"How to get the attribute relevance using SVM"
Hello,
I have an SVM model (LibSVM, because it is a multiclass problem) inside an X-Validation, and I want to know which attributes are most relevant. I have already run a grid search to find the best combination of gamma and C.
Can I get the attribute relevance with an RBF kernel, or do I have to use a linear kernel?
How can I get this information, and where would I have to put an additional operator?
The process code is attached. Thanks for your help.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
        <parameter key="method" value="range transformation"/>
        <parameter key="min" value="-1.0"/>
      </operator>
      <operator activated="true" class="split_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
        <parameter key="split_ratio" value="0.8"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" height="76" name="SVM" width="90" x="150" y="30">
            <parameter key="gamma" value="0.03087"/>
            <parameter key="C" value="898910.0"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers
What kind of relevance information are you searching for? The mean weights?
Dortmund, Germany
I want to get a table where I can see the relevance of each attribute for that specific classification (for example, scaled between 0 and 1).
I want to discuss which attributes (in my case the attributes represent different measurement methods) have a great influence on the classification.
For a linear SVM you can use the Polynominal by Binominal Classification operator and determine the weights for each class separately.
Furthermore, there are several operators that provide a weighting by SVM:
1. Weight by SVM in RapidMiner core
2. W-SVMAttributeEval in WEKA
3. Select by Recursive Feature Elimination with SVM (part of the Feature Selection extension)
But all of them use linear SVMs.
Alternatively, you can do a Forward Selection with your SVM inside. That way you can produce a ranking of your attributes.
In your case it might also be worth looking at the other "Weight by ..." operators. If you want to rank your measurement methods, you might have a look at tree-based importance, for example.
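To make the forward selection idea a bit more concrete, here is a minimal Python sketch of the procedure. It is not the RapidMiner operator: `forward_selection_ranking`, `score_subset` and the toy scoring function are hypothetical names, and the invented usefulness values only stand in for "train the SVM on this attribute subset and return its validated accuracy".

```python
from typing import Callable, List, Sequence, Tuple


def forward_selection_ranking(
    attributes: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Greedily add the attribute that gives the best score.

    Returns (attribute, score after adding it) in order of selection,
    which can be read as a ranking of the attributes.
    """
    selected: List[str] = []
    remaining = list(attributes)
    ranking: List[Tuple[str, float]] = []
    while remaining:
        # Score every possible one-attribute extension of the current subset.
        scored = [(score_subset(selected + [a]), a) for a in remaining]
        best_score, best_attr = max(scored, key=lambda pair: pair[0])
        selected.append(best_attr)
        remaining.remove(best_attr)
        ranking.append((best_attr, best_score))
    return ranking


# Assumption: toy stand-in for "validated accuracy of the SVM on this subset".
# The numbers are invented purely for illustration.
TOY_USEFULNESS = {"a1": 0.30, "a2": 0.05, "a3": 0.15}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(TOY_USEFULNESS[a] for a in subset)


ranking = forward_selection_ranking(["a1", "a2", "a3"], toy_score)
print([name for name, _ in ranking])
```

Unlike a typical forward selection, this sketch does not stop when the score no longer improves. It keeps going until every attribute has been added so that the selection order covers all attributes.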
Edit: An additional idea came to my mind. You could train machines on n-1 attributes each (that is, on all attributes but one) and look at the decrease of your performance value (accuracy, AUC, ...). Then you can use this decrease as a feature (un)importance.
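The "all attributes but one" idea can be sketched in a few lines of Python. Again, this is only an illustration under assumptions: `drop_one_importance` and `score_subset` are hypothetical names, and the toy scoring function replaces the real step of training and validating the SVM on the reduced attribute set.

```python
from typing import Callable, Dict, List, Sequence


def drop_one_importance(
    attributes: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Importance of an attribute = score with all attributes minus score without it."""
    all_attrs = list(attributes)
    baseline = score_subset(all_attrs)
    importance: Dict[str, float] = {}
    for attr in all_attrs:
        # Train and score on every attribute except the current one.
        reduced = [a for a in all_attrs if a != attr]
        importance[attr] = baseline - score_subset(reduced)
    return importance


# Assumption: toy stand-in for "validated accuracy of the SVM on this subset".
# The numbers are invented purely for illustration.
TOY_USEFULNESS = {"a1": 0.30, "a2": 0.05, "a3": 0.15}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(TOY_USEFULNESS[a] for a in subset)


importance = drop_one_importance(["a1", "a2", "a3"], toy_score)
ordered = sorted(importance, key=importance.get, reverse=True)
print(ordered)
```

A large decrease means the model loses a lot without that attribute. A decrease near zero (or a negative one) suggests the attribute is redundant or unhelpful given the others.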
- Poly by ...
  - X-Validation
    - SVM
    - Apply Model, Performance
Is that right?
As a result I get 3 weight tables (due to 3 classes): 1 vs. all, 2 vs. all and 3 vs. all.
Each contains the attributes and their weights. If I understand it correctly, the weights represent the direction of the hyperplane (its normal vector) and tell you how important an attribute is in relation to the others. Does this mean that the greater the absolute value, whether positive or negative, the greater the influence?
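For a linear SVM on attributes that were normalized to the same range (as the Normalize operator in the process does), a larger absolute weight generally means a larger influence on that one-vs-all decision, and the sign only gives the direction. One possible way to turn the three weight tables into a single table scaled between 0 and 1 is sketched below. The function name `relevance_from_weights` and the weight values are invented for illustration, and averaging the absolute weights over the classes is just one reasonable choice.

```python
from typing import Dict, List


def relevance_from_weights(
    weights_per_class: Dict[str, List[float]],
    attribute_names: List[str],
) -> Dict[str, float]:
    """Mean absolute weight per attribute, rescaled so the largest value is 1.0."""
    n_models = len(weights_per_class)
    mean_abs = [
        sum(abs(w[i]) for w in weights_per_class.values()) / n_models
        for i in range(len(attribute_names))
    ]
    largest = max(mean_abs)
    if largest == 0:
        # All weights are zero: no attribute carries any relevance.
        return {name: 0.0 for name in attribute_names}
    return {name: m / largest for name, m in zip(attribute_names, mean_abs)}


# Assumption: invented weights of three one-vs-all linear models, three attributes.
weights = {
    "1 vs. all": [2.0, -0.5, 0.0],
    "2 vs. all": [-1.0, 0.5, 0.5],
    "3 vs. all": [3.0, -0.5, 2.5],
}
relevance = relevance_from_weights(weights, ["a1", "a2", "a3"])
print(relevance)
```

Keeping the three tables separate is equally valid if you want to discuss which attribute matters for which class.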
Edit: When I use the Polynominal by Binominal Classification operator, I can only get the model as an output, but not the performance from the X-Validation (see code).
In my eyes the one-vs-all operator should be inside the cross-validation, because I am building a "meta model" which needs to be validated.
I built a process which collects the individual weight vectors on Iris; this should work on your data as well. The XML is attached.
After doing so I realized two things:
1. You are doing a split validation with 0.8 as the split ratio. That means your second SVM is trained on 20%. I would definitely switch to an X-Validation (maybe with 2 folds).
2. For a linear SVM, the weight vectors are the same as the vectors in the model. If you click on "Model description" you get the weights. This works for a radial SVM as well. That should solve your problem.
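Regarding point 1: the difference to a single split is that in a cross-validation every example ends up in the test set exactly once. The following Python sketch shows a simple stratified fold assignment to illustrate this. It is not RapidMiner's implementation, the function names are hypothetical, and for brevity it does not shuffle the examples first.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Assign example indices to k folds, spreading each class evenly."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the examples of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(
    labels: Sequence[str], k: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train indices, test indices) once per fold."""
    folds = stratified_folds(labels, k)
    for test_fold in folds:
        test = sorted(test_fold)
        train = sorted(i for fold in folds if fold is not test_fold for i in fold)
        yield train, test


# Assumption: a tiny invented label column with three classes.
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
splits = list(train_test_splits(labels, k=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

With 2 folds each model is trained on one half and tested on the other, so the performance estimate is based on all examples rather than on a single 20% hold-out.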