Conducting a "Key Driver" analysis in RapidMiner

RSinclair
edited December 2018 in Help
I am looking for instructions or a tutorial on how to conduct a "Key Driver" analysis in RapidMiner. RapidMiner team members at Wisdom 2018 told me it could easily be done, but I cannot find any documentation on the website that explains in detail how to perform this type of analysis in RapidMiner. Any help is much appreciated.



Answers

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst)
    Hi @RSinclair,

    I just looked up on the web what a "Key Driver" analysis is.
    If I understood correctly, it is a calculation of the correlation of the different attributes with your target variable...
    So I suggest using RapidMiner's Correlation Matrix operator.
    If I misunderstood, please correct me and explain in more detail what your data look like and what you want to obtain.
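    To make the correlation idea concrete outside RapidMiner, here is a minimal Python sketch of what "correlate every attribute with the target and rank the results" amounts to. It is not how the Correlation Matrix operator is implemented; the function names `pearson` and `correlation_drivers` and the survey-style data are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_drivers(attributes, target):
    """Correlate every attribute with the target and rank by absolute correlation."""
    scores = {name: pearson(values, target) for name, values in attributes.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical survey data: three candidate drivers and an overall satisfaction score.
    drivers = {
        "price": [3, 4, 2, 5, 4, 1],
        "service": [4, 5, 3, 5, 4, 2],
        "wait_time": [2, 1, 4, 1, 2, 5],
    }
    satisfaction = [4, 5, 3, 4, 4, 2]
    for name, r in correlation_drivers(drivers, satisfaction):
        print(f"{name:10s} r = {r:+.3f}")
```

    Ranking by the absolute value keeps strong negative drivers (such as a long wait time) near the top, while the sign still tells you the direction of the relationship.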

    Regards,

    Lionel

  • RSinclair
    Hi Lionel,

    I apologize for the late reply; I must have missed the email notifying me that someone had answered my post. To answer your question: a Key Driver analysis involves a bit more than a correlation matrix.

    Multiple linear regression is the most common technique for computing a Key Driver Analysis (KDA). It is one of the “workhorses” of multivariate analysis. It finds the linear combination of the independent variables that best predicts the outcome variable (typically by least squares), taking the correlations among the independent variables into account so that each coefficient reflects one variable's contribution while the others are held constant. It also provides a measure of model “fit”, R-squared, which tells you how well the independent variables predict the dependent variable. For example, an R-squared of 0.50 means the independent variables explain 50% of the variance in the dependent variable.
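    As an illustration of the regression-based KDA described above, here is a minimal Python sketch (not RapidMiner code). It standardizes every variable, fits ordinary least squares through the normal equations, and reports the standardized coefficients ("betas") as driver weights, together with R-squared = 1 - SS_res / SS_tot. The helper names and the data are hypothetical, and a real analysis would also check for collinearity and significance.

```python
from math import sqrt


def standardize(values):
    """Rescale a sequence to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def key_driver_regression(attributes, target):
    """Fit OLS on standardized data; return standardized betas and R-squared."""
    names = list(attributes)
    cols = [standardize(attributes[k]) for k in names]
    y = standardize(target)
    n = len(y)
    # Normal equations (X^T X) b = X^T y; centred data needs no intercept term.
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    betas = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(betas, cols)) for t in range(n)]
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum(yt ** 2 for yt in y)  # y has mean 0 after standardizing
    return dict(zip(names, betas)), 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical survey data, same shape as a typical KDA input table.
    drivers = {
        "price": [3, 4, 2, 5, 4, 1],
        "service": [4, 5, 3, 5, 4, 2],
        "wait_time": [2, 1, 4, 1, 2, 5],
    }
    satisfaction = [4, 5, 3, 4, 4, 2]
    betas, r2 = key_driver_regression(drivers, satisfaction)
    for name, beta in sorted(betas.items(), key=lambda kv: abs(kv[1]), reverse=True):
        print(f"{name:10s} beta = {beta:+.3f}")
    print(f"R-squared = {r2:.3f}")
```

    Standardizing first puts all drivers on the same scale, so the betas can be compared directly; because the data are centred, the intercept drops out and R-squared is the same as for the unstandardized fit.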

    Thanks for the response,
    Robert

  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, University Professor, RM Founder)
    Hi,
    This is VERY close to what the operator "Explain Predictions" does. To get those results, you could first train a linear model and then apply Explain Predictions to that model; it uses local correlations to show how much each independent variable contributes to the prediction of the dependent variable. The process below shows a simple example of this. If you use a more complex or non-linear model, you will actually see more interesting results, but the concept is the same...
    Hope this helps,
    Ingo

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="9.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="179" y="34">
            <parameter key="family" value="AUTO"/>
            <parameter key="link" value="family_default"/>
            <parameter key="solver" value="AUTO"/>
            <parameter key="reproducible" value="false"/>
            <parameter key="maximum_number_of_threads" value="4"/>
            <parameter key="use_regularization" value="true"/>
            <parameter key="lambda_search" value="false"/>
            <parameter key="number_of_lambdas" value="0"/>
            <parameter key="lambda_min_ratio" value="0.0"/>
            <parameter key="early_stopping" value="true"/>
            <parameter key="stopping_rounds" value="3"/>
            <parameter key="stopping_tolerance" value="0.001"/>
            <parameter key="standardize" value="true"/>
            <parameter key="non-negative_coefficients" value="false"/>
            <parameter key="add_intercept" value="true"/>
            <parameter key="compute_p-values" value="false"/>
            <parameter key="remove_collinear_columns" value="false"/>
            <parameter key="missing_values_handling" value="MeanImputation"/>
            <parameter key="max_iterations" value="0"/>
            <parameter key="specify_beta_constraints" value="false"/>
            <list key="beta_constraints"/>
            <parameter key="max_runtime_seconds" value="0"/>
            <list key="expert_parameters"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.000" expanded="true" height="103" name="Multiply" width="90" x="313" y="85"/>
          <operator activated="true" class="model_simulator:explain_predictions" compatibility="9.2.000" expanded="true" height="103" name="Explain Predictions" width="90" x="447" y="34">
            <parameter key="maximal explaining attributes" value="3"/>
            <parameter key="local sample size" value="500"/>
            <parameter key="only create predictions" value="false"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_op="Explain Predictions" to_port="model"/>
          <connect from_op="Generalized Linear Model" from_port="exampleSet" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Explain Predictions" to_port="training data"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Explain Predictions" to_port="test data"/>
          <connect from_op="Explain Predictions" from_port="visualization output" to_port="result 1"/>
          <connect from_op="Explain Predictions" from_port="importances output" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
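    To illustrate the idea of "local correlations" in code, here is a minimal LIME-style sketch in Python. It is an assumption about the general approach, not RapidMiner's actual implementation of Explain Predictions: around one example it draws a random neighbourhood, asks the model for predictions, and ranks attributes by how strongly they correlate with those predictions. The defaults `sample_size=500`, `max_attributes=3` and `seed=2001` only mirror the "local sample size", "maximal explaining attributes" and random seed values in the process above; the function names and the toy model are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def explain_locally(model, example, scales, sample_size=500, max_attributes=3, seed=2001):
    """Rank attributes by their correlation with the model's prediction
    in a random neighbourhood around a single example."""
    rng = random.Random(seed)
    names = list(example)
    # Perturb every attribute of the example with Gaussian noise of the given scale.
    neighbourhood = [
        {k: example[k] + rng.gauss(0.0, scales[k]) for k in names}
        for _ in range(sample_size)
    ]
    predictions = [model(point) for point in neighbourhood]
    # Positive correlation: the attribute pushes the prediction up near this example;
    # negative correlation: it pushes the prediction down.
    scores = {k: pearson([p[k] for p in neighbourhood], predictions) for k in names}
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:max_attributes]


if __name__ == "__main__":
    # Hypothetical non-linear model: the effect of "a" depends on where you look.
    def model(x):
        return x["a"] ** 2 - x["b"]

    scales = {"a": 1.0, "b": 1.0}
    for a in (0.0, 2.0):
        print(a, explain_locally(model, {"a": a, "b": 0.0}, scales))
```

    For a linear model every example gets roughly the same ranking, which is why a linear model plus local explanations comes close to a classic key driver table. For the non-linear toy model you would expect "b" to come out on top near a = 0 and "a" to dominate near a = 2, which is the kind of example-specific result mentioned above.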


    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA
