RapidMiner

Regular Contributor

Sensitivity Analysis for Predictive Models

Many of the commercial tools have sensitivity analysis features for a variety of predictive model types. For instance, IBM SPSS Modeler has it for all predictive models. The purpose is to identify the relative contribution of each independent variable. It usually applies to machine learning techniques. Is there an add-on that does this?

When, if ever, will we have sensitivity analysis capabilities in RapidMiner? SA is the only way to shed some light into the black box of machine learning.

Any advice?

DDelen
12 REPLIES
Super Contributor

Re: Sensitivity Analysis for Predictive Models

You can e.g. use the Loop Attributes operator and perform an X-Validation on each attribute.
There are also some operators that use statistical methods to estimate the predictive power of a feature. Search for "Weight by" in the operator tree to find all methods.

Best regards,
Marius
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

Hi Marius,

Your suggestion "...use the Loop Attributes operator and perform an X-Validation..." is intriguing... Have you ever done it? I tried to set it up with the Iris data set, but I could not make it work. I would appreciate it if you could help me set it up... You can either post it here or send me an email at ddelen@gmail.com. What would help more than anything else is a process that uses Iris to do just that. THANK YOU!

-Delen
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

I can make it work if you tell me your goal more clearly.

I'm familiar with the SPSS and SAS tools, so you can describe it by relating it to their features.

I'm afraid you will end up with a simple correlation score between all features.
A better idea would be small trees built on small feature subsets.
For example, if you have an XOR relation, you will have 0 correlation.
I.e. if y = a XOR b, then correlation(a, y) = 0.
If you instead measure accuracy(y = tree(a, b)), you get much better results.
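
To make the XOR point concrete, here is a quick plain-Python/scikit-learn sketch (outside RapidMiner, purely illustrative; the generated data and the tree classifier are arbitrary choices):

# A feature can have ~zero correlation with the label and still be
# indispensable for accurate classification, e.g. y = a XOR b.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
b = rng.integers(0, 2, 1000)
y = a ^ b  # XOR relation

print("corr(a, y) =", round(np.corrcoef(a, y)[0, 1], 3))  # approximately 0
print("corr(b, y) =", round(np.corrcoef(b, y)[0, 1], 3))  # approximately 0

X = np.column_stack([a, b])
tree = DecisionTreeClassifier().fit(X, y)
print("accuracy of tree(a, b):", tree.score(X, y))  # 1.0 on the training data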

So my recommendation is: go with feature subset selection, and measure the performance of all subsets.
I can build this for you if this is what you want.
But maybe you want something completely different.

Best regards,

Wessel
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

Thank you for replying.

What I am trying to do is called Sensitivity Analysis in SPSS and SAS. It applies to machine learning techniques where the contribution of each independent variable is assessed (and eventually rank-ordered). The mechanics of it are as follows: independent variables are excluded from the input variable list one at a time, and the accuracy of the classifier for each of these reduced models is measured. Based on the degradation of prediction accuracy corresponding to the exclusion of each variable, its contribution to the classifier is judged. Once this is done for every variable, their rank-ordered (relative) importance values are tabulated.
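
In plain Python/scikit-learn terms (outside RapidMiner, just to illustrate the scheme; the decision tree, 10-fold cross-validation, and the Iris data are arbitrary choices), the procedure looks roughly like this:

# Minimal sketch of leave-one-variable-out sensitivity analysis (illustration only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y, names = data.data, data.target, data.feature_names

baseline = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10).mean()

importance = {}
for i, name in enumerate(names):
    X_reduced = np.delete(X, i, axis=1)  # exclude one independent variable
    acc = cross_val_score(DecisionTreeClassifier(random_state=0), X_reduced, y, cv=10).mean()
    importance[name] = baseline - acc    # accuracy degradation = estimated contribution

# tabulate the variables in rank order of their contribution
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {drop:+.3f}")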

I hope this helps.

Delen
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

Result:

http://i.snag.gy/MAr2g.jpg


Process in next post.
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
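      <!-- Overview: retrieve the Iris sample data, run Forward Selection (which wraps a
           cross-validated W-J48 decision tree and logs each subset), keep only the selected
           attributes, cross-validate the reduced attribute set, and convert the log to data. -->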
      <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="optimize_selection_forward" compatibility="5.3.013" expanded="true" height="94" name="Forward Selection" width="90" x="180" y="30">
        <parameter key="maximal_number_of_attributes" value="33"/>
        <parameter key="speculative_rounds" value="55"/>
        <process expanded="true">
          <operator activated="true" class="x_validation" compatibility="5.3.013" expanded="true" height="112" name="InsV" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="weka:W-J48" compatibility="5.3.001" expanded="true" height="76" name="W-J48" width="90" x="45" y="165"/>
              <connect from_port="training" to_op="W-J48" to_port="training set"/>
              <connect from_op="W-J48" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="InsA" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="5.3.013" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="120">
                <parameter key="accuracy" value="false"/>
                <parameter key="kappa" value="true"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="InsA" to_port="model"/>
              <connect from_port="test set" to_op="InsA" to_port="unlabelled data"/>
              <connect from_op="InsA" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log" compatibility="5.3.013" expanded="true" height="76" name="Log" width="90" x="180" y="30">
            <list key="log">
              <parameter key="f" value="operator.Forward Selection.value.feature_names"/>
              <parameter key="p" value="operator.InsV.value.performance"/>
              <parameter key="d" value="operator.InsV.value.deviation"/>
              <parameter key="c" value="operator.InsV.value.cpu-execution-time"/>
              <parameter key="a" value="operator.InsV.value.applycount"/>
              <parameter key="n" value="operator.Forward Selection.value.number of attributes"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="InsV" to_port="training"/>
          <connect from_op="InsV" from_port="averagable 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_by_weights" compatibility="5.3.013" expanded="true" height="94" name="Select by Weights" width="90" x="315" y="30"/>
      <operator activated="true" class="x_validation" compatibility="5.3.013" expanded="true" height="112" name="Validation" width="90" x="450" y="30">
        <process expanded="true">
          <operator activated="true" class="weka:W-J48" compatibility="5.3.001" expanded="true" height="76" name="W-J48 (2)" width="90" x="45" y="30"/>
          <connect from_port="training" to_op="W-J48 (2)" to_port="training set"/>
          <connect from_op="W-J48 (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.3.013" expanded="true" height="76" name="Performance (3)" width="90" x="179" y="165">
            <parameter key="accuracy" value="false"/>
            <parameter key="kappa" value="true"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
          <connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log_to_data" compatibility="5.3.013" expanded="true" height="94" name="Log to Data" width="90" x="447" y="210">
        <parameter key="log_name" value="Log"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Forward Selection" to_port="example set"/>
      <connect from_op="Forward Selection" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="Forward Selection" from_port="attribute weights" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Select by Weights" from_port="original" to_op="Log to Data" to_port="through 1"/>
      <connect from_op="Select by Weights" from_port="weights" to_port="result 3"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Log to Data" from_port="exampleSet" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

This is great! Thank you. Can you please answer a few questions regarding your process:
1. What are the exact meanings of p, d, c, a, and n?
2. Why did you not stop the process at the Forward Selection node? Wouldn't you get everything you need at that point?
3. Can we do the same thing with Loop Attributes (as was suggested by the moderator in response to my initial inquiry)?

I really appreciate you helping me on this.

-Delen
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

Q1

p = performance
d = deviation of performance
c = run time in milliseconds
a = iteration counter (not very informative)
n = number of attributes

Look at the Log operator; there you can see how these values are defined.

Q3: yes, you can do this with Loop Attributes.

Q2: because looking at single attributes is not informative.
You must always look at small attribute subsets.

Like I said before, it's possible for an attribute to have 0 correlation and still be required for accurate classification.
E.g. some non-linear response effect: averaged over the entire population there is no effect, and only after including another attribute, which splits off the specific group where there is an effect, do you see increased performance.
Relations like this are extremely common, especially for data with a large number of attributes.
Regular Contributor

Re: Sensitivity Analysis for Predictive Models

If you want something that is very fast, take a look at:

Performance (CFS) (Weka)

Synopsis
Calculates a performance measure based on the Correlation (filter evaluation).

Description
CFS attribute subset evaluator. For more information see: Hall, M. A. (1998). Correlation-based Feature Subset Selection for Machine Learning. Thesis submitted in partial fulfilment of the requirements of the degree of Doctor of Philosophy at the University of Waikato.
This operator creates a filter based performance measure for a feature subset. It evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.
This operator can be applied on both numerical and nominal data sets.

Input
example set: expects: ExampleSet

Output
performance:
example set:

Parameters
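
For intuition, Hall's CFS merit of a subset of k features is k * (mean feature-class correlation) / sqrt(k + k*(k-1) * (mean feature-feature correlation)). A rough plain-Python sketch of that score, using Pearson correlation as a simple stand-in for the measures Weka uses internally (so the numbers will not match the operator exactly):

# Rough sketch of Hall's CFS merit for a feature subset (illustration only).
# Assumes a numeric class label; Pearson correlation stands in for the
# measures Weka actually uses, so values differ from the Weka operator.
import numpy as np

def cfs_merit(X, y):
    k = X.shape[1]
    # average correlation between each feature and the class
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        return r_cf
    # average pairwise correlation among the features themselves
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i in range(k) for j in range(i + 1, k)])
    # high class correlation with low feature redundancy scores best
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)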