RapidMiner

The new Get Local Interpretation Operator...

RM Certified Analyst

Re: The new Get Local Interpretation Operator...

Hallo Martin:

 

Thank you for your latest message - I will look at the process you shared.  I checked my private messages (in my profile) regarding the GLI operator; there is one private message from you, but it is from last summer - perhaps I am looking in the wrong place for the new message?

 

Please let me know where else I should look when you get a chance.

 

Best regards, and thanks,

 

Michael ;-)


RM Certified Analyst

Re: The new Get Local Interpretation Operator...

Hello again, Martin:

 

Thank you for your process, which I have looked at - and yes, there is no problem with negative values.  I need to look back at the process I built to make sure I am doing things correctly - I may very well have screwed it up.

 

I look forward to working with the new GLI operator, as it is very important and meets a real need.  Thanks for all the time you spent developing it.   ;-)

 

Best regards, Michael

RM Certified Analyst

Re: The new Get Local Interpretation Operator....

Colleagues:

 

I am now using version 0.5.1 of the Get Local Interpretation operator, which allows Grouped Models to be input to the incoming mod port of this operator - works great.  I have put an Optimize Parameters (Grid) operator inside the GLI operator in order to optimize the model that generates the Interpretations.

 

The parameters I set up for a Decision Tree within Optimize Parameters (Grid) involve about 5,900 combinations of parameter settings.  When I run the process, I notice that the Optimize Parameters operator within the GLI operator runs multiple times - as if it's running within a loop.  The result is that Optimize Parameters takes several hours to run before GLI can finally deliver the Interpretation.  I'm now at 6 iterations of the Optimize Parameters operator and counting - which adds up to roughly 35,000 combinations being tested instead of the original (approximately) 5,900.

 

Can anyone explain why this is happening?  Is there a parameter I can set that controls how many times Optimize Parameters (Grid) will loop within a Get Local Interpretation operator?

 

Thanks for considering this, and best wishes, Michael

 

RM Staff

Re: The new Get Local Interpretation Operator....

Michael,

 

The reason is that GLI is a loop. In order to get a Local Interpretation, we build one model for every data point. Think of a local model as a local approximation, similar to a Taylor expansion around a point. This local model needs to be calculated for every data point in your example set. That's why you end up calculating #examples*#optimization_steps decision trees.
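As a rough sketch (not the operator's actual code), the loop structure described above looks like this - any optimization nested inside re-runs once per example, so the total number of model fits is #examples * #optimization_steps:

```python
# Sketch only (not the GLI operator's real implementation): one surrogate
# model is trained per example, so a nested parameter search re-runs for
# every single data point.

def fit_surrogate(point, params):
    """Placeholder for training one local model around `point`."""
    return ("model", point, params)

def get_local_interpretation(data, param_grid):
    fits = 0
    interpretations = []
    for point in data:                 # outer loop: one pass per example
        best = None
        for params in param_grid:      # nested optimization re-runs here
            best = fit_surrogate(point, params)
            fits += 1
        interpretations.append(best)
    return interpretations, fits

# 10 examples x 4 parameter combinations -> 40 surrogate fits
interps, total_fits = get_local_interpretation(
    list(range(10)), [{"maximal_depth": d} for d in (2, 3, 4, 5)]
)
print(total_fits)  # 40
```

With ~5,900 parameter combinations per example, as in Michael's process, this multiplication is what stretches the run over many hours.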

 

There are two features I would like to implement to make it faster:

  1. The ability to run it on small samples. At the moment you need to connect a bigger example set to GLI so that it can normalize correctly; I am thinking of fixing this as the next feature.
  2. Parallel computation. For that I would need some help from our dev department.

I will share a slide deck with more details privately; I think that will help. I need to turn this into a blog post / video at some point.

 

Best,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
RM Certified Analyst

Re: The new Get Local Interpretation Operator....

Hi Martin - thank you for your reply, which clears up some questions - and thanks for sending along the slides when you have a chance.  I do have one remaining question at this point.  I am feeding the GLI operator a grouped model.  The grouped model contains a Normalize operator.  I am generating predictions for data the model hasn't seen yet, and then feeding the model and the predictions to the GLI operator.

 

Since the grouped model contains a normalize operation, the (labeled) data coming into the GLI operator has already been normalized - which I noticed when I looked at the decision tree path generated by the GLI operator (the run took 7 hours and 43 minutes).

 

Should I feed a copy of the new data that has not yet been normalized into the GLI operator (i.e., the same data that was fed into the model to generate predictions), along with the grouped model?

 

I ask because when I look at the decision tree path, the values in the text string describing the path do not map to the values of the data fields the path is describing.  I am probably doing something wrong.

 

Attached are three files that show the process:  01_GLI_Top_Level_Process.jpg shows the process at the top level, 02_GLI_Process_Middle_Level.jpg shows it at the middle level, and 03_GLI_Process_Inner_Level_.jpg shows the inner layer.

 

You will see in the "middle level" screenshot that I used the Remember operator to store the optimal parameter settings from Optimize Parameters (Grid) and then Recall them, as none of the three outgoing ports of the GLI operator would accept the parameter set.  If I am mistaken on this point, please correct me.

 

Thanks for considering this when you have a chance, and I look forward to seeing the slide deck you mentioned.  The GLI operator is a really important one, and I look forward to using it often - I just need to use it correctly!

 

Best wishes, Michael

RM Certified Analyst

Re: The new Get Local Interpretation Operator....

Hi again, Martin:   I just realized how silly my suggestion about reading in the new data a second time was (I haven't had my first cup of coffee yet) - I think the GLI operator needs the predictions made by the grouped model in order to correlate field values to the prediction for each data row.  It therefore looks like I have to run a query that merges the predictions made by the grouped model with the non-normalized data the grouped model used to make the predictions, and then feed that dataset into the GLI operator.  Does this make sense?  Will try it and let you know.    Best regards, Michael

RM Staff

Re: The new Get Local Interpretation Operator....

Hi,

 

I think you found a bug in my quick fix. I guess I apply the grouped model in every iteration, so in the second iteration you might have a twice-normalized data set.
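As a hypothetical illustration of that bug (the mu/sigma values are made up, not taken from the actual process): re-applying a stored z-score normalization model to data that is already normalized shifts the values instead of leaving them alone.

```python
# Illustrative statistics only: a z-score model learned on the raw data.
mu, sigma = 50.0, 10.0

def z_apply(x):
    # Apply the *stored* normalization model (always uses original mu/sigma).
    return (x - mu) / sigma

raw = 60.0
once = z_apply(raw)      # (60 - 50) / 10 = 1.0   -> correct normalized value
twice = z_apply(once)    # (1.0 - 50) / 10 = -4.9 -> distorted by re-applying
print(once, twice)       # 1.0 -4.9
```

This would explain why the tree's split values no longer map back to the original field values.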

 

I am out of office next week, so this sadly needs to wait a week.

 

Cheers,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
RM Certified Analyst

Re: The new Get Local Interpretation Operator....

Hello Martin:

 

No problem - I am fundamentally a patient person..... ;-)

In the meantime, I rigged up a process that feeds the GLI operator the grouped model's predictions together with the de-normalized original data that was fed into the grouped model to generate those predictions, and am checking the results - we can discuss after you're back and settled.

 

Thanks for all of the correspondence; I will review the slides carefully, and wish you a great week (on vacation, I hope!).

 

Best regards,

 

Michael ;-)

RM Staff

Re: The new Get Local Interpretation Operator....

Hi @M_Martin,

 

I've checked the issue. If you use a grouped model which includes a normalization, the trees will always have their cuts in this normalized space.

That is roughly what I would expect in this case - it's kind of hard to avoid.

 

If you want the tree on a de-normalized ExampleSet, you can of course pass the de-normalization model into GLI and then de-normalize right in front of the Decision Tree. See the attached process.
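Numerically, the trick above can be sketched like this (illustrative mu/sigma, assuming a z-transform Normalize): the de-normalization model is simply the inverse transform, so applying it right before the Decision Tree puts the splits back into original units.

```python
# Illustrative statistics for a single attribute; a real Normalize model
# stores one mu/sigma pair per attribute.
mu, sigma = 50.0, 10.0

def normalize(x):
    return (x - mu) / sigma

def denormalize(z):
    # Inverse transform - the role the De-Normalize model plays inside GLI,
    # applied right in front of the Decision Tree.
    return z * sigma + mu

z = normalize(60.0)        # 1.0 in normalized space
print(denormalize(z))      # 60.0 -> tree cuts land in original units
```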

 

I could simply add a new input port for additional data, to avoid the Remember/Recall shenanigans. Would that work for you?

 

Best,

Martin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize" width="90" x="179" y="34"/>
      <operator activated="true" class="denormalize" compatibility="7.6.001" expanded="true" height="82" name="De-Normalize" width="90" x="246" y="238"/>
      <operator activated="true" class="remember" compatibility="7.6.001" expanded="true" height="68" name="Remember" width="90" x="380" y="187">
        <parameter key="name" value="deNorm"/>
        <parameter key="io_object" value="Model"/>
      </operator>
      <operator activated="true" class="h2o:gradient_boosted_trees" compatibility="7.6.001" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="380" y="34">
        <parameter key="reproducible" value="true"/>
        <list key="expert_parameters"/>
      </operator>
      <operator activated="true" class="group_models" compatibility="7.6.001" expanded="true" height="124" name="Group Models" width="90" x="581" y="238"/>
      <operator activated="true" class="operator_toolbox:get_interpretation_subprocess" compatibility="0.5.001" expanded="true" height="124" name="Get Local Interpretation" width="90" x="782" y="34">
        <process expanded="true">
          <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="45" y="34"/>
          <operator activated="true" class="weight_by_gini_index" compatibility="7.6.001" expanded="true" height="82" name="Weight by Gini Index (2)" width="90" x="447" y="34"/>
          <operator activated="true" class="recall" compatibility="7.6.001" expanded="true" height="68" name="Recall" width="90" x="45" y="187">
            <parameter key="name" value="deNorm"/>
            <parameter key="io_object" value="Model"/>
            <parameter key="remove_from_store" value="false"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="246" y="187">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.6.001" expanded="true" height="82" name="Decision Tree" width="90" x="447" y="187"/>
          <connect from_port="training set" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Weight by Gini Index (2)" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Weight by Gini Index (2)" from_port="weights" to_port="Weight Vector"/>
          <connect from_op="Recall" from_port="result" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="Prediction Model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_Weight Vector" spacing="0"/>
          <portSpacing port="sink_Prediction Model" spacing="0"/>
          <portSpacing port="sink_Performance Vector" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Gradient Boosted Trees" to_port="training set"/>
      <connect from_op="Normalize" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
      <connect from_op="De-Normalize" from_port="model output" to_op="Remember" to_port="store"/>
      <connect from_op="De-Normalize" from_port="original model output" to_op="Group Models" to_port="models in 1"/>
      <connect from_op="Gradient Boosted Trees" from_port="model" to_op="Group Models" to_port="models in 2"/>
      <connect from_op="Gradient Boosted Trees" from_port="exampleSet" to_op="Get Local Interpretation" to_port="exa"/>
      <connect from_op="Group Models" from_port="model out" to_op="Get Local Interpretation" to_port="mod"/>
      <connect from_op="Get Local Interpretation" from_port="exa" to_port="result 1"/>
      <connect from_op="Get Local Interpretation" from_port="mod" to_port="result 2"/>
      <connect from_op="Get Local Interpretation" from_port="wei" to_port="result 3"/>
      <connect from_op="Get Local Interpretation" from_port="loc" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>
--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner