Explain Predictions table - Coloring explained

RSinclair Member Posts: 4 Learner I
I cannot find an explanation of the color coding RM applies to the table under the "Explain Predictions" tab.

I would appreciate it if someone could point me to an explanation of what these shades of red and green mean.

Best Answers

  • kypexin Posts: 247   Unicorn
    edited February 13 Solution Accepted
    Hi @RSinclair

    As per the operator's help section:

    "operator takes a model and an ExampleSet as input, and generates a table highlighting the attributes that most strongly support (green) or contradict (red) each prediction."

    So in your case, for example, CardType = '36 Credit Mag' supports prediction = No, while CardType = '65 Mag' contradicts it. This gives you a sense of which features and values play the largest role in each prediction.
  • mschmitz Posts: 1,952  RM Data Scientist
    Solution Accepted
    Hi all,

    Please keep in mind that deep red means that an increase of the value will (strongly) decrease your target variable.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • varunm1 Member Posts: 199   Unicorn
    edited February 24
    @mschmitz ; @kypexin

    In the case of classification, if my original label is 1 and the predicted value is 0 (an incorrect prediction), are the supporting predictors (green) related to the predicted value 0? In other words, are these supporting predictors what makes the algorithm predict the wrong class in my scenario?

    Thanks
    Regards,
    Varun
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,536  RM Founder
    In the case of classification, if my original label is 1 and the predicted value is 0 (an incorrect prediction), are the supporting predictors (green) related to the predicted value 0?

    Yes, that is exactly right.  The terms "support" and "contradict" are always relative to the model's prediction, regardless of whether that prediction is correct or wrong.  This way, the explanations can be created even if the true class is not known at all.

    Hope this helps,
    Ingo

  • varunm1 Member Posts: 199   Unicorn
    Hi @mschmitz

    Are the local correlations calculated by the Explain Predictions operator based on Pearson correlation?
    Regards,
    Varun
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,952  RM Data Scientist
    For numerical attributes, yes. Otherwise - @IngoRM ?

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,536  RM Founder
    edited February 28
    For nominal attributes it is also the Pearson correlation: nominal values are recoded to 1 if they are the same value as the original and to 0 otherwise.
    However, we use a couple of small tweaks to improve the robustness of the calculation, so the values sometimes differ a bit from the standard Pearson calculation.
    From the docs in the code:
    * Calculate the correlation between the given attribute and the predictions.  Make sure that
    * the predictions are set in a one-vs-all fashion for multiclass problems. It uses the confidence
    * for the class to correlate with.
    *
    * For nominal attributes we just recode into 1 (same value as the one predicted) vs. 0 (different value).
    *
    * Please note that this method artificially sets the standard deviation to a small value in case
    * of all labels being the same (which can happen if the model is really confident in certain
    * areas). First, we artificially change one random label in case they are all the same.
    * Then we also capture the case that the standard deviations are still 0 by replacing it
    * by a small value then.
    *
    * These small changes will avoid that all correlations for all attributes would be NaN
    * otherwise. Because of those changes, this method should not be used for calculating
    * regular correlations.
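
The guards described in that comment can be sketched roughly as follows. This is a minimal, standard-library Python illustration and not RapidMiner's actual code: the function names, the `epsilon` value, and the exact way one label is "changed" (here: nudged by `epsilon`) are assumptions made for the example.

```python
import math
import random


def recode_nominal(values, reference):
    """Recode nominal values as 1.0 (equal to the reference value) or 0.0 (different)."""
    return [1.0 if value == reference else 0.0 for value in values]


def robust_correlation(xs, confidences, epsilon=1e-6, rng=None):
    """Pearson correlation with guards against a zero standard deviation.

    Because of the guards, this should not be used as a regular correlation.
    """
    if len(xs) != len(confidences) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    ys = list(confidences)
    n = len(ys)

    # Guard 1 (assumed form): if all targets are identical, nudge one random entry.
    if all(y == ys[0] for y in ys):
        ys[rng.randrange(n)] += epsilon

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    # Guard 2: a standard deviation that is still 0 is replaced by a small value,
    # so the result is never NaN.
    std_x = std_x or epsilon
    std_y = std_y or epsilon
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Hypothetical local sample: a nominal attribute and the confidence for one class.
    sex = recode_nominal(["Female", "Male", "Female", "Male"], "Female")
    print(robust_correlation(sex, [0.9, 0.2, 0.8, 0.1]))
```

With ordinary inputs this reduces to the standard Pearson formula; the two guards only change the result in the degenerate cases the comment mentions.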

  • varunm1 Member Posts: 199   Unicorn
    Great, Thanks @IngoRM and @mschmitz
    Regards,
    Varun
  • varunm1 Member Posts: 199   Unicorn
    Hello @IngoRM

    In the Explain Predictions operator, is there a way to check the top 3 supporting predictors for correct predictions? Right now I am downloading the data and doing some Excel operations to find how many samples were predicted correctly and to see which predictor supported most of the correct predictions. I would like to do the same for the incorrect predictions as well.

    Thanks for your support.
    Regards,
    Varun
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,536  RM Founder
    Not sure if I got you right there, but something like the process below?  I just delivered the text-based data from the Explain Predictions operator (second port) and added an additional Filter Examples afterwards.  You can also do more sophisticated stuff with the results from the third port and some Join magic...
    Let me know if that is what you had in mind.
    Cheers,
    Ingo
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="split_data" compatibility="9.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="187">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="313" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <operator activated="true" class="model_simulator:explain_predictions" compatibility="9.2.000" expanded="true" height="103" name="Explain Predictions" width="90" x="514" y="187">
            <parameter key="maximal explaining attributes" value="3"/>
            <parameter key="local sample size" value="500"/>
            <parameter key="only create predictions" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="648" y="187">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="correct_predictions"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Explain Predictions" to_port="test data"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Explain Predictions" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Explain Predictions" to_port="training data"/>
          <connect from_op="Explain Predictions" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
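
For readers who export the explained table instead of filtering it inside RapidMiner, the effect of Filter Examples with condition_class = correct_predictions can be approximated outside the tool. The sketch below is plain Python under stated assumptions: the rows are dictionaries, and the column names `Survived` and `prediction(Survived)` are guesses for the Titanic sample and may differ in your export.

```python
def split_by_correctness(rows, label="Survived", prediction="prediction(Survived)"):
    """Split rows into (correct, wrong) by comparing the label with the prediction.

    The default column names are assumptions for the Titanic sample data.
    """
    correct = [row for row in rows if row[label] == row[prediction]]
    wrong = [row for row in rows if row[label] != row[prediction]]
    return correct, wrong


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the data.
    rows = [
        {"Survived": "Yes", "prediction(Survived)": "Yes", "Sex": "Female"},
        {"Survived": "No", "prediction(Survived)": "Yes", "Sex": "Female"},
        {"Survived": "No", "prediction(Survived)": "No", "Sex": "Male"},
    ]
    correct, wrong = split_by_correctness(rows)
    print(len(correct), "correct,", len(wrong), "wrong")
```

The `wrong` list corresponds to running the same filter with invert_filter set to true, which covers the incorrect predictions asked about above.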

  • varunm1 Member Posts: 199   Unicorn
    edited March 14
    @IngoRM thanks for this process. Is it possible to provide a ranking to supporting predictors over all the correct predictions?

    In the example you provided, I observe that 'Sex' is the major supporting predictor in all of these correct predictions, so I can rank it as the best supporting predictor in this data set for this algorithm. I would like to find the second and third best in the same way, over all correctly predicted samples rather than for an individual sample.

    Sorry if it's confusing.

    Regards,
    Varun
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,536  RM Founder
    edited March 14
    Sure, that's possible as well.  The process is below.  It would be possible to build a simpler process for this but this version produces a nicer output :p
    Hope this helps,
    Ingo
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="split_data" compatibility="9.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="187">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="313" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <operator activated="true" class="model_simulator:explain_predictions" compatibility="9.2.000" expanded="true" height="103" name="Explain Predictions" width="90" x="514" y="238">
            <parameter key="maximal explaining attributes" value="3"/>
            <parameter key="local sample size" value="500"/>
            <parameter key="only create predictions" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="136">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="782" y="136">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="correct_predictions"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="916" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000" expanded="true" height="82" name="Join" width="90" x="1050" y="238">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="left"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="Row No"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1184" y="238">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="id|Name|Importance"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.2.000" expanded="true" height="82" name="Pivot" width="90" x="1318" y="238">
            <parameter key="group_by_attributes" value="id"/>
            <parameter key="column_grouping_attribute" value="Name"/>
            <list key="aggregation_attributes">
              <parameter key="Importance" value="average"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.2.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="1452" y="238">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="average\(Importance\)_(.*)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.2.000" expanded="true" height="82" name="Loop Attributes" width="90" x="1586" y="238">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="attribute_name_macro" value="loop_attribute"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="aggregate" compatibility="9.2.000" expanded="true" height="82" name="Aggregate" width="90" x="45" y="34">
                <parameter key="use_default_aggregation" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default_aggregation_function" value="average"/>
                <list key="aggregation_attributes">
                  <parameter key="%{loop_attribute}" value="average"/>
                </list>
                <parameter key="group_by_attributes" value=""/>
                <parameter key="count_all_combinations" value="false"/>
                <parameter key="only_distinct" value="false"/>
                <parameter key="ignore_missings" value="true"/>
              </operator>
              <operator activated="true" class="rename" compatibility="9.2.000" expanded="true" height="82" name="Rename" width="90" x="179" y="34">
                <parameter key="old_name" value="average(%{loop_attribute})"/>
                <parameter key="new_name" value="%{loop_attribute}"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
              <connect from_port="input 1" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Transpose" to_port="example set input"/>
              <connect from_op="Transpose" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="9.2.000" expanded="true" height="82" name="Append" width="90" x="1720" y="238">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000" expanded="true" height="82" name="Rename (2)" width="90" x="1854" y="238">
            <parameter key="old_name" value="id"/>
            <parameter key="new_name" value="Attribute"/>
            <list key="rename_additional_attributes">
              <parameter key="att_1" value="Avg Importance"/>
            </list>
          </operator>
          <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort" width="90" x="1988" y="238">
            <parameter key="attribute_name" value="Avg Importance"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Explain Predictions" to_port="test data"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Explain Predictions" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Explain Predictions" to_port="training data"/>
          <connect from_op="Explain Predictions" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Explain Predictions" from_port="importances output" to_op="Join" to_port="right"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
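
The core of the process above (keep only the correctly predicted rows, join them with the importances table, average per attribute, sort in decreasing order) can be sketched in a few lines of plain Python. The column names `Row No`, `Name` and `Importance` are taken from the process XML; the function name and the sample data are hypothetical, and the sketch is not a drop-in replacement for the operators.

```python
from collections import defaultdict


def rank_supporting_attributes(importances, correct_ids):
    """Rank attributes by average importance over the correctly predicted rows.

    importances: iterable of dicts with the keys "Row No", "Name", "Importance"
    correct_ids: set of row ids whose prediction matched the label
    """
    # Step 1 (like Pivot): average the importance per (row, attribute) pair.
    per_row = defaultdict(list)
    for record in importances:
        if record["Row No"] in correct_ids:
            per_row[(record["Row No"], record["Name"])].append(record["Importance"])

    # Step 2 (like Aggregate): average per attribute, ignoring rows where it is missing.
    per_attribute = defaultdict(list)
    for (_, name), values in per_row.items():
        per_attribute[name].append(sum(values) / len(values))

    # Step 3 (like Sort): highest average importance first.
    ranking = [(name, sum(v) / len(v)) for name, v in per_attribute.items()]
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


if __name__ == "__main__":
    # Made-up importances, only to show the shape of the data.
    importances = [
        {"Row No": 0, "Name": "Sex", "Importance": 0.8},
        {"Row No": 0, "Name": "Age", "Importance": 0.2},
        {"Row No": 1, "Name": "Sex", "Importance": 0.6},
        {"Row No": 2, "Name": "Sex", "Importance": -0.4},
    ]
    for name, avg in rank_supporting_attributes(importances, correct_ids={0, 1}):
        print(name, round(avg, 3))
```

Passing the ids of the incorrectly predicted rows instead gives the same ranking for the wrong predictions.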


  • varunm1 Member Posts: 199   Unicorn
    Thanks a lot @IngoRM this helps a lot. I need to play on different datasets now.
    Regards,
    Varun