Sensitivity Analysis for Predictive Models

DDelen · September 2013

Many commercial tools offer sensitivity analysis for a variety of predictive model types. For instance, IBM SPSS Modeler has it for all predictive models. The purpose is to identify the relative contribution of each independent variable. It usually applies to machine learning techniques. Is there an add-on that does this?

When, if ever, will we have sensitivity analysis capabilities in RapidMiner? SA is the only way to shed some light into the black box of machine learning.

Any advice?

DDelen

MariusHelf · September 2013

You can, e.g., use the Loop Attributes operator and perform an X-Validation on each attribute.
There are also some operators that use statistical methods to estimate the predictive power of a feature. Search for "Weight by" in the operator tree to find all of them.

Best regards,
Marius

DDelen · September 2013

Hi Marius,

Your suggestion "...use the Loop Attributes operator and perform a X-Validation.." is intriguing. Have you ever done it? I tried to set it up with the Iris data set but could not make it work. I would appreciate it if you could help me set it up. You can either post it here or send me an email at ddelen@gmail.com. What would help more than anything else is a process that uses Iris to do just that. THANK YOU!

-Delen

wessel · September 2013

I can make it work if you tell me more clearly your goal.

I'm familiar with SPSS and SAS tools, so you can describe by relating to their features.

I'm afraid you will otherwise end up with a simple correlation score between each feature and the label.
A better idea is to fit small trees on small feature subsets.
For example, if you have an XOR relation, you will have 0 correlation.
I.e. y = a XOR b, then correlation(a,y) = 0.
If you measure accuracy(y=tree(a,b)) you get better results.
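
The XOR point can be checked with a few lines of plain Python. This is only a sketch: the majority-vote lookup table below is a stand-in for a small tree (it is not J48), and the helper names `pearson` and `lookup_accuracy` are made up for this illustration.

```python
from collections import Counter, defaultdict
from itertools import product
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def lookup_accuracy(rows, labels, columns):
    """Training accuracy of a majority-vote lookup table on the given columns."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row[c] for c in columns)][label] += 1
    correct = sum(counter.most_common(1)[0][1] for counter in votes.values())
    return correct / len(rows)

rows = list(product([0, 1], repeat=2))   # all four (a, b) combinations
y = [a ^ b for a, b in rows]             # y = a XOR b

corr_a = pearson([a for a, _ in rows], y)   # a alone, linear view
acc_a = lookup_accuracy(rows, y, [0])       # a alone, best lookup rule
acc_ab = lookup_accuracy(rows, y, [0, 1])   # a and b together
print(corr_a, acc_a, acc_ab)
```

On the four-row truth table, `a` alone has zero correlation with `y` and does no better than chance, while the pair `(a, b)` classifies every row correctly.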

So my recommendation is, go with feature subset selection, and measure performance of all subsets.
I can make this for you if this is what you want.
But maybe you want something completely different.

Best regards,

Wessel

DDelen · September 2013

Thank you for replying.

What I am trying to do is called Sensitivity Analysis in SPSS and SAS. It applies to machine learning techniques, where the contribution of each independent variable is assessed (and eventually rank-ordered). The mechanics are as follows. Each independent variable, one at a time, is excluded from the input variable list, and the accuracy of the classifier is measured for each of the resulting models. Based on the degradation of prediction accuracy caused by excluding a variable, its contribution to the classifier is judged. Once this is done for each variable, their rank-ordered (relative) importance values are tabulated.
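
The procedure described above can be sketched in plain Python. Everything here is an assumption made for illustration: the nearest-centroid classifier stands in for whatever learner is actually used, the toy data and the feature names `signal`, `noise`, and `constant` are invented, and the fold assignment is fixed by hand so the result is reproducible. It is not how SPSS, SAS, or RapidMiner implement the feature.

```python
from collections import defaultdict

def nearest_centroid_fit(X, y):
    """Return {label: centroid} computed from the training rows."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def nearest_centroid_predict(centroids, row):
    """Predict the label with the closest centroid (ties go to the smaller label)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=sq_dist)

def cv_accuracy(X, y, folds, columns):
    """Cross-validated accuracy using only the given feature columns."""
    correct = 0
    for fold in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != fold]
        test = [i for i, f in enumerate(folds) if f == fold]
        project = lambda i: [X[i][c] for c in columns]
        model = nearest_centroid_fit([project(i) for i in train],
                                     [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, project(i)) == y[i]
                       for i in test)
    return correct / len(y)

def sensitivity(X, y, folds, names):
    """Rank features by the accuracy lost when each one is left out."""
    all_cols = list(range(len(names)))
    baseline = cv_accuracy(X, y, folds, all_cols)
    drops = {}
    for c in all_cols:
        kept = [k for k in all_cols if k != c]
        drops[names[c]] = baseline - cv_accuracy(X, y, folds, kept)
    ranking = sorted(drops.items(), key=lambda item: item[1], reverse=True)
    return baseline, ranking

# Toy data: "signal" determines the label; the other two columns carry no information.
y = [i % 2 for i in range(40)]
X = [[100.0 * (i % 2), float((i // 2) % 5), 1.0] for i in range(40)]
folds = [(i // 2) % 5 for i in range(40)]   # hand-made 5-fold assignment

baseline, ranking = sensitivity(X, y, folds, ["signal", "noise", "constant"])
print(baseline)
for name, drop in ranking:
    print(name, drop)
```

The ranking lists each variable with the accuracy drop observed when it alone is excluded, which is the table the sensitivity analysis is meant to produce.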

I hope this helps.

Delen

wessel · September 2013

Result:

http://i.snag.gy/MAr2g.jpg

Process in next post.

wessel · September 2013

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="optimize_selection_forward" compatibility="5.3.013" expanded="true" height="94" name="Forward Selection" width="90" x="180" y="30">
<parameter key="maximal_number_of_attributes" value="33"/>
<parameter key="speculative_rounds" value="55"/>
<process expanded="true">
<operator activated="true" class="x_validation" compatibility="5.3.013" expanded="true" height="112" name="InsV" width="90" x="45" y="30">
<process expanded="true">
<operator activated="true" class="weka:W-J48" compatibility="5.3.001" expanded="true" height="76" name="W-J48" width="90" x="45" y="165"/>
<connect from_port="training" to_op="W-J48" to_port="training set"/>
<connect from_op="W-J48" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="InsA" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.013" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="120">
<parameter key="accuracy" value="false"/>
<parameter key="kappa" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="InsA" to_port="model"/>
<connect from_port="test set" to_op="InsA" to_port="unlabelled data"/>
<connect from_op="InsA" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="5.3.013" expanded="true" height="76" name="Log" width="90" x="180" y="30">
<list key="log">
<parameter key="f" value="operator.Forward Selection.value.feature_names"/>
<parameter key="p" value="operator.InsV.value.performance"/>
<parameter key="d" value="operator.InsV.value.deviation"/>
<parameter key="c" value="operator.InsV.value.cpu-execution-time"/>
<parameter key="a" value="operator.InsV.value.applycount"/>
<parameter key="n" value="operator.Forward Selection.value.number of attributes"/>
</list>
</operator>
<connect from_port="example set" to_op="InsV" to_port="training"/>
<connect from_op="InsV" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select_by_weights" compatibility="5.3.013" expanded="true" height="94" name="Select by Weights" width="90" x="315" y="30"/>
<operator activated="true" class="x_validation" compatibility="5.3.013" expanded="true" height="112" name="Validation" width="90" x="450" y="30">
<process expanded="true">
<operator activated="true" class="weka:W-J48" compatibility="5.3.001" expanded="true" height="76" name="W-J48 (2)" width="90" x="45" y="30"/>
<connect from_port="training" to_op="W-J48 (2)" to_port="training set"/>
<connect from_op="W-J48 (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.013" expanded="true" height="76" name="Performance (3)" width="90" x="179" y="165">
<parameter key="accuracy" value="false"/>
<parameter key="kappa" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
<connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="5.3.013" expanded="true" height="94" name="Log to Data" width="90" x="447" y="210">
<parameter key="log_name" value="Log"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Forward Selection" to_port="example set"/>
<connect from_op="Forward Selection" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
<connect from_op="Forward Selection" from_port="attribute weights" to_op="Select by Weights" to_port="weights"/>
<connect from_op="Select by Weights" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Select by Weights" from_port="original" to_op="Log to Data" to_port="through 1"/>
<connect from_op="Select by Weights" from_port="weights" to_port="result 3"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Log to Data" from_port="exampleSet" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>

DDelen · October 2013

This is great! Thank you. Can you please answer a few questions regarding your process:
1. What is the exact meaning of p, d, c, a, and n?
2. Why did you not stop the process at Forward selection node? Wouldn't you get everything you need at that point?
3. Can we do the same thing with Loop Attribute Subset? (as was suggested by the moderator in response to my initial inquiry).

I really appreciate you helping me on this.

-Delen

wessel · October 2013

Q1

p = performance
d = deviation of performance
c = run time in milliseconds
a = iteration counter (not very informative)
n = number of attributes

Look at the log operator, then you can see how this gets created.

Q3, yes you can do this with loop attributes

Q2, because looking at single attributes is not informative.
You must always look at small attribute subsets.

Like I said before, it's possible for an attribute to have 0 correlation and still be required for accurate classification.
E.g. a nonlinear response effect: averaged over the entire population there is no effect, but once you include another attribute that splits off a specific group in which there is an effect, you see increased performance.
Relations like this are extremely common, especially for data with a large number of attributes.

wessel · October 2013

If you want something that is very fast, take a look at:

Performance (CFS) (Weka)

Synopsis
Calculates a performance measure based on the Correlation (filter evaluation).

Description
CFS attribute subset evaluator. For more information see: Hall, M. A. (1998). Correlation-based Feature Subset Selection for Machine Learning. Thesis submitted in partial fulfilment of the requirements of the degree of Doctor of Philosophy at the University of Waikato.
This operator creates a filter based performance measure for a feature subset. It evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.
This operator can be applied on both numerical and nominal data sets.
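
As a rough illustration of the trade-off described above, the merit heuristic from Hall's thesis can be written as k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below assumes the correlations are already given as plain numbers; how the Weka operator actually estimates them is not covered here, and `cfs_merit` is a name chosen for this example.

```python
from math import sqrt

def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset from its correlation values.

    feature_class_corr: one correlation with the class per feature.
    feature_feature_corr: one correlation per pair of features (may be empty).
    """
    k = len(feature_class_corr)
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0   # a single feature has no pairs
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Two equally predictive features: highly redundant versus uncorrelated.
redundant = cfs_merit([0.6, 0.6], [0.9])
independent = cfs_merit([0.6, 0.6], [0.0])
single = cfs_merit([0.6], [])
print(redundant, independent, single)
```

With these made-up numbers, the uncorrelated pair scores higher than the redundant pair, and adding a nearly duplicate feature barely improves on using one feature alone.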

Input
example set: expects: ExampleSet

Output
performance:
example set:

Parameters

DDelen · October 2013

Thank you again for your time and knowledge. In sensitivity analysis, all I need is the performance of the system when one of the variables is left out. And do this for each variable. The process output only gives me a subset of those combinations. For instance, in the output that your process created, I would only be interested in the three variable combinations. That is,

If I leave a1 out and train the model on a2, a3, a4, I would get the contribution of a1
If I leave a2 out and train the model on a1, a3, a4, I would get the contribution of a2
If I leave a3 out and train the model on a1, a2, a4, I would get the contribution of a3
If I leave a4 out and train the model on a1, a2, a3, I would get the contribution of a4

The output only gives me two of these four. How can I get the other two? I think the procedure only built on the "good" subsets as it keeps including variables. Is there a way around it?
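
Why only two of the four three-variable combinations show up can be seen from a small sketch of greedy forward selection. It assumes the search runs every round without early stopping, and it uses an invented additive score (`weights`) in place of a cross-validated performance; neither is taken from the posted process.

```python
from itertools import combinations

def forward_selection_log(names, score):
    """Greedy forward selection that records every subset it evaluates."""
    selected, evaluated = [], []
    while len(selected) < len(names):
        # Try adding each unused attribute to the current selection.
        candidates = [selected + [n] for n in names if n not in selected]
        evaluated.extend(frozenset(c) for c in candidates)
        # Keep only the best extension and continue from there.
        selected = max(candidates, key=score)
    return evaluated

names = ["a1", "a2", "a3", "a4"]
weights = {"a1": 1, "a2": 2, "a3": 4, "a4": 3}   # hypothetical scores
score = lambda subset: sum(weights[n] for n in subset)

evaluated = forward_selection_log(names, score)
leave_one_out = {frozenset(c) for c in combinations(names, len(names) - 1)}
visited = leave_one_out & set(evaluated)
missing = leave_one_out - set(evaluated)

print(len(evaluated))
print(sorted(sorted(s) for s in visited))
print(sorted(sorted(s) for s in missing))
```

By the time the search reaches three attributes it has already fixed two of them, so it only tries the two three-attribute subsets that contain that pair. The other two leave-one-out subsets are never evaluated.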

Thank you!

-Delen

wessel · October 2013

Replace the Forward Selection operator with the Backward Elimination operator.

"Backward Elimination"

From HELP:

Synopsis
This operator selects the most relevant attributes of the given ExampleSet through an efficient implementation of the backward elimination scheme.
Description
The Backward Elimination operator is a nested operator i.e. it has a subprocess. The subprocess of the Backward Elimination operator must always return a performance vector. For more information regarding subprocesses please study the Subprocess operator.
The Backward Elimination operator starts with the full set of attributes and, in each round, it removes each remaining attribute of the given ExampleSet. For each removed attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the least decrease of performance is finally removed from the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory used originally for storing the data and the memory which might be needed for applying the inner operators. The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:
with decrease: The iteration runs as long as there is any increase in performance.
with decrease of more than: The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.
with significant decrease: The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.
The speculative rounds parameter defines how many rounds will be performed in a row, after the first time the stopping criterion is fulfilled. If the performance increases again during the speculative rounds, the elimination will be continued. Otherwise all additionally eliminated attributes will be restored, as if no speculative rounds had executed. This might help avoiding getting stuck in local optima.
Feature selection i.e. the question for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods have been integrated into RapidMiner including evolutionary algorithms. For all search methods we need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.
Differentiation
Forward Selection
The Forward Selection operator starts with an empty selection of attributes and, in each round, it adds each unused attribute of the given ExampleSet. For each added attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase of performance is added to the selection. Then a new round is started with the modified selection.
Input

example set (Data Table)
This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.
Output

example set (Data Table)
The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

attribute weights (Attribute Weights)
The attribute weights are delivered through this port.

performance (Performance Vector)
This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.
Parameters
maximal number of eliminations
This parameter specifies the maximal number of backward eliminations. Range: integer
speculative rounds
This parameter specifies the number of times, the stopping criterion might be consecutively ignored before the elimination is actually stopped. A number higher than one might help avoiding getting stuck in local optima. Range: integer
stopping behavior
The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:
with_decrease: The iteration runs as long as there is any increase in performance.
with_decrease_of_more_than: The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.
with_significant_decrease: The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.
Range: selection
use relative decrease
This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than'. If the use relative decrease parameter is set to true the maximal relative decrease parameter will be used otherwise the maximal absolute decrease parameter. Range: boolean
maximal absolute decrease
This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to false. If the absolute performance decrease to the last step exceeds this threshold, the elimination will be stopped. Range: real
maximal relative decrease
This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to true. If the relative performance decrease to the last step exceeds this threshold, the elimination will be stopped. Range: real
alpha
This parameter is only available when the stopping behavior parameter is set to 'with significant decrease'. This parameter specifies the probability threshold which determines if differences are considered as significant. Range: real
Tutorial Process
Feature reduction of the Polynomial data set
The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Backward Elimination operator is applied on the ExampleSet which is a nested operator i.e. it has a subprocess. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying feature reduction algorithm. Have a look at the subprocess of this operator. The X-Validation operator is used there which itself is a nested operator. Have a look at the subprocesses of the X-Validation operator. The K-NN operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 3 attributes.
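
Going by the help text above, the first round of backward elimination removes each attribute in turn from the full set, so that round alone covers every leave-one-out subset. The following plain-Python sketch illustrates the idea under that reading; it is not RapidMiner's implementation, it ignores the stopping behavior and speculative rounds, and the additive score built from `weights` is invented.

```python
def backward_elimination_log(names, score, max_eliminations=None):
    """Greedy backward elimination; returns (round, removed, score) records."""
    selected = list(names)
    limit = len(names) - 1 if max_eliminations is None else max_eliminations
    log = []
    for round_no in range(1, limit + 1):
        # Try removing each remaining attribute from the current selection.
        trials = [(n, [k for k in selected if k != n]) for n in selected]
        for removed, subset in trials:
            log.append((round_no, removed, score(subset)))
        # Drop the attribute whose removal hurts the score least.
        _, selected = max(trials, key=lambda t: score(t[1]))
    return log

weights = {"a1": 1, "a2": 2, "a3": 4, "a4": 3}   # hypothetical scores
score = lambda subset: sum(weights[n] for n in subset)
full = score(list(weights))

log = backward_elimination_log(list(weights), score)
# Performance drop per attribute, taken from round 1 only.
first_round = {removed: full - s for rnd, removed, s in log if rnd == 1}
print(first_round)
```

If only the leave-one-out table is wanted, `max_eliminations=1` in this sketch stops after the first round; later rounds start from a reduced attribute set and no longer measure contributions relative to the full model.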

DDelen · October 2013

I replaced the Forward Selection operator with Backward Elimination in your previous process, hoping that it would do the trick. It did not. I still don't know how to do such a seemingly simple experiment. Any idea where I am missing the point? Thank you very much.

-Delen
