"How to get the attribute relevance using SVM"

niha Member Posts: 4 Contributor I
edited June 9 in Help
Hello,

I have an SVM model (LibSVM, because it is a multiclass problem) inside an X-Validation, and I want to know which attributes are most relevant. I already ran a grid search to find the best combination of gamma and C.
Can I get the attribute relevance with an RBF kernel, or do I have to use a linear kernel?
How can I get this information, and where would I have to put an additional operator?

The process code is attached. Thanks for your help.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
        <parameter key="method" value="range transformation"/>
        <parameter key="min" value="-1.0"/>
      </operator>
      <operator activated="true" class="split_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
        <parameter key="split_ratio" value="0.8"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" height="76" name="SVM" width="90" x="150" y="30">
            <parameter key="gamma" value="0.03087"/>
            <parameter key="C" value="898910.0"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,078  RM Data Scientist
    LibSVM internally decomposes a multiclass problem into binary problems (one-vs-one, according to the LIBSVM documentation) and afterwards combines the models.

    What kind of relevance information are you searching for? The mean weights?
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • niha Member Posts: 4 Contributor I
    Thanks for the response!

    I want to get a table showing the relevance of each attribute for that specific classification (for example, scaled between 0 and 1).
    I want to discuss which attributes (in my case they represent different measurement methods) have a great influence on the classification.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,078  RM Data Scientist
    I am not aware of a way to do it with a radial SVM.

    For a linear SVM you can use the Polynominal by Binominal Classification operator and determine the weights for each class separately.

    Furthermore, there are several operators that provide attribute weights based on an SVM:

    1. Weight by SVM in RapidMiner Core
    2. W-SVMAttributeEval in WEKA
    3. Select by Recursive Feature Elimination with SVM (part of the feature selection extension)

    But all of them use linear SVMs.

    Alternatively, you can do a Forward Selection with your SVM inside. That way you can produce a ranking of your attributes.
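
The forward selection idea can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration under loud assumptions: the toy data is invented, and a leave-one-out nearest-centroid classifier stands in for the SVM inside the validation. In a real process the tuned SVM would take its place.

```python
def centroid_accuracy(rows, labels, attrs):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in
    for the SVM) that only looks at the attribute indices in `attrs`."""
    correct = 0
    for i in range(len(rows)):
        # Build the class centroids from every example except example i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(attrs))
            for k, a in enumerate(attrs):
                acc[k] += row[a]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            return sum((rows[i][a] - sums[label][k] / counts[label]) ** 2
                       for k, a in enumerate(attrs))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels):
    """Greedily add the attribute that gives the best accuracy.
    Returns (attribute index, accuracy after adding it) in selection order."""
    remaining = list(range(len(rows[0])))
    selected, history = [], []
    while remaining:
        best = max(remaining,
                   key=lambda a: centroid_accuracy(rows, labels, selected + [a]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, centroid_accuracy(rows, labels, selected)))
    return history


# Toy data (invented): attribute 0 separates all three classes, attribute 1
# only separates class "a", attribute 2 has the same pattern in every class.
rows = [
    (0.0, 0.0, 0.5), (0.1, 0.1, 0.4), (0.2, 0.0, 0.6), (0.1, 0.1, 0.5),
    (1.0, 1.0, 0.5), (0.9, 1.1, 0.4), (1.1, 1.0, 0.6), (1.0, 1.1, 0.5),
    (2.0, 1.0, 0.5), (2.1, 1.1, 0.4), (1.9, 1.0, 0.6), (2.0, 1.1, 0.5),
]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

history = forward_selection(rows, labels)
for attribute, accuracy in history:
    print(f"added attribute {attribute}, accuracy now {accuracy:.2f}")
```

The order in which attributes are added is the ranking. Once the accuracy stops improving, the remaining order is decided by ties and should not be over-interpreted.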

    In your case it would also be worth looking at the "Weight by ..." operators. If you want to rank your measurement methods, you might have a look at tree-based importance, for example.


    Edit: An additional idea came to my mind. You could train machines on n-1 attributes each (that is, on all attributes but one) and look at the decrease of your performance value (accuracy, AUC, ...). Then you can use this decrease as a feature (un)importance.
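
A minimal sketch of the "all attributes but one" idea in plain Python. The data is invented, and a leave-one-out 1-nearest-neighbour classifier is used as a stand-in for the validated SVM, so the numbers only illustrate the mechanics: measure a baseline, drop one attribute at a time, and record how far the accuracy falls.

```python
def loo_accuracy(rows, labels, attrs):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    (stand-in for the SVM) restricted to the attribute indices in `attrs`."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[a] - rows[j][a]) ** 2 for a in attrs),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def drop_one_importance(rows, labels):
    """Importance of attribute a = baseline accuracy minus the accuracy
    obtained when a is left out. Larger values mean the attribute matters more."""
    all_attrs = list(range(len(rows[0])))
    baseline = loo_accuracy(rows, labels, all_attrs)
    return {
        a: baseline - loo_accuracy(rows, labels, [b for b in all_attrs if b != a])
        for a in all_attrs
    }


# Toy data (invented): attribute 0 carries the class information,
# attribute 1 is noise, attribute 2 is constant.
rows = [
    (0.0, 0.0, 0.5), (0.1, 0.3, 0.5),
    (1.0, 0.3, 0.5), (1.1, 0.0, 0.5),
    (2.0, 0.0, 0.5), (2.1, 0.3, 0.5),
]
labels = ["a", "a", "b", "b", "c", "c"]

importance = drop_one_importance(rows, labels)
for attribute, drop in importance.items():
    print(f"attribute {attribute}: accuracy drop {drop:.2f}")
```

A drop near zero (or a negative drop) suggests the attribute can be removed without hurting the model. Correlated attributes can hide each other's importance with this method, because the model can fall back on the remaining one.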
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • niha Member Posts: 4 Contributor I
    How would I combine the Polynominal by Binominal Classification operator with X-Validation?
    - Poly by ...
        -X-Validation
                 -SVM
                 -apply model, performance

    Is that right?

    As a result I get 3 weight tables (due to 3 classes): 1 vs. all, 2 vs. all, and 3 vs. all.
    Each contains the attributes and their weights. If I understand it correctly, the weights represent the direction of the hyperplane (its normal vector) and tell you how important an attribute is in relation to the others. Does this mean that the greater the absolute value, positive or negative, the greater the influence?
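
For a linear SVM the decision function is f(x) = w·x + b, so when all attributes are on the same scale (as after the range normalization to [-1, 1]), a larger |w_i| means attribute i moves the decision value more, and the sign only gives the direction. A hedged sketch of how the three one-vs-all tables could be turned into one relevance table scaled between 0 and 1 follows. The attribute names and weight values are invented for illustration, and this only makes sense for linear-kernel weights.

```python
def relevance_table(weights_per_class):
    """Scale |weight| to [0, 1] within each one-vs-all table, then average
    the scaled values across the tables."""
    names = list(next(iter(weights_per_class.values())))
    scaled = {}
    for cls, weights in weights_per_class.items():
        # Guard against an all-zero weight vector.
        largest = max(abs(w) for w in weights.values()) or 1.0
        scaled[cls] = {n: abs(weights[n]) / largest for n in names}
    mean = {n: sum(scaled[c][n] for c in scaled) / len(scaled) for n in names}
    return scaled, mean


# Hypothetical weight tables, one per one-vs-all model.
weights_per_class = {
    "1 vs all": {"method_A": 2.0, "method_B": -4.0, "method_C": 1.0},
    "2 vs all": {"method_A": -1.0, "method_B": 0.5, "method_C": 2.0},
    "3 vs all": {"method_A": 4.0, "method_B": 4.0, "method_C": -1.0},
}

scaled, mean = relevance_table(weights_per_class)
for name in sorted(mean, key=mean.get, reverse=True):
    print(f"{name}: mean relevance {mean[name]:.2f}")
```

Keeping the per-class tables next to the mean is useful, because an attribute can matter a lot for separating one class and very little for the others.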

    Edit: When I use the Polynominal by Binominal Classification operator, I can only get the model as an output, but not the performance from the X-Validation (see code).

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
            <parameter key="method" value="range transformation"/>
            <parameter key="min" value="-1.0"/>
          </operator>
          <operator activated="true" class="polynomial_by_binomial_classification" compatibility="5.3.015" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="380" y="75">
            <process expanded="true">
              <operator activated="true" class="split_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="380" y="30">
                <parameter key="split_ratio" value="0.8"/>
                <parameter key="sampling_type" value="stratified sampling"/>
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine" compatibility="5.3.015" expanded="true" height="112" name="SVM (2)" width="90" x="45" y="30">
                    <parameter key="kernel_type" value="radial"/>
                    <parameter key="kernel_gamma" value="0.03"/>
                    <parameter key="C" value="380000.0"/>
                  </operator>
                  <connect from_port="training" to_op="SVM (2)" to_port="training set"/>
                  <connect from_op="SVM (2)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training set" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
          <connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
                 
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,078  RM Data Scientist
    Hi again,

    In my eyes the one-vs-all operator should be inside the cross-validation, because it builds a "meta model" which needs to be validated.

    I built a process that collects the individual weight vectors on Iris; this should work on your data as well. The XML is attached.
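
For readers without RapidMiner at hand, the idea of collecting one weight vector per one-vs-all model can be sketched in plain Python. Everything here is an assumption made for illustration: the toy data is invented, and the tiny trainer (batch sub-gradient descent on the regularised hinge loss, with hypothetical parameters `lam`, `epochs`, and `lr`) is a stand-in for a linear SVM, not RapidMiner's implementation.

```python
def train_linear_svm(rows, targets, lam=0.01, epochs=200, lr=0.1):
    """Batch sub-gradient descent on the L2-regularised hinge loss.
    `targets` must be +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(rows, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # example is inside the margin or misclassified
                for k, xi in enumerate(x):
                    grad_w[k] -= y * xi / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def one_vs_all_weights(rows, labels):
    """Train one binary model per class (class vs. rest) and collect
    the weight vector of each model."""
    collected = {}
    for cls in sorted(set(labels)):
        targets = [1 if label == cls else -1 for label in labels]
        w, _ = train_linear_svm(rows, targets)
        collected[f"{cls} vs all"] = w
    return collected


# Toy data (invented), already scaled to [-1, 1]: attribute 0 orders the
# classes a < b < c, attribute 1 is constant and therefore uninformative.
rows = [(-1.0, 0.0), (-0.8, 0.0), (0.0, 0.0), (0.0, 0.0), (0.9, 0.0), (1.0, 0.0)]
labels = ["a", "a", "b", "b", "c", "c"]

weights = one_vs_all_weights(rows, labels)
for name, w in weights.items():
    print(name, [round(v, 3) for v in w])
```

The constant attribute keeps a weight of zero in every model, while attribute 0 gets a negative weight for "a vs all" and a positive one for "c vs all". The middle class "b" cannot be separated from the rest by a single linear boundary on this data, which is a reminder that one-vs-all linear weights are only as meaningful as the linear model is accurate.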

    After doing so I realized two things:

    1. You are doing a split validation with 0.8 as the split ratio. That means your second SVM is trained on 20%. I would definitely switch to an X-Validation (maybe with 2 folds).

    2. For a linear SVM, the weight vectors are the same as the vectors in the model. If you click on "Model description" you get the weights. This works for a radial SVM as well. That should solve your problem.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Generate Empty Weight vector">
       <process expanded="true">
         <operator activated="true" class="subprocess" compatibility="6.1.000" expanded="true" height="76" name="Subprocess" width="90" x="45" y="300">
           <process expanded="true">
             <operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
               <parameter key="number_examples" value="1"/>
               <parameter key="number_of_attributes" value="1"/>
             </operator>
             <operator activated="true" class="select_attributes" compatibility="6.1.000" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="30">
               <parameter key="invert_selection" value="true"/>
             </operator>
             <operator activated="true" class="weight_by_user_specification" compatibility="6.1.000" expanded="true" height="76" name="Weight by User Specification" width="90" x="380" y="30">
               <list key="name_regex_to_weights"/>
             </operator>
             <operator activated="true" class="collect" compatibility="6.1.000" expanded="true" height="76" name="Collect (2)" width="90" x="514" y="30"/>
             <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
             <connect from_op="Select Attributes" from_port="example set output" to_op="Weight by User Specification" to_port="example set"/>
             <connect from_op="Weight by User Specification" from_port="weights" to_op="Collect (2)" to_port="input 1"/>
             <connect from_op="Collect (2)" from_port="collection" to_port="out 1"/>
             <portSpacing port="source_in 1" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember" width="90" x="179" y="300">
           <parameter key="name" value="Weights"/>
           <parameter key="io_object" value="IOObjectCollection"/>
         </operator>
         <operator activated="false" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Master Excelliste_Gefügebezeichnung_3 klassen" width="90" x="45" y="120">
           <parameter key="repository_entry" value="../../Data/Master Excelliste_Gefügebezeichnung_3 klassen"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Iris"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="6.1.000" expanded="true" height="94" name="Normalize" width="90" x="246" y="30">
           <parameter key="method" value="range transformation"/>
           <parameter key="min" value="-1.0"/>
         </operator>
         <operator activated="true" class="split_validation" compatibility="6.1.000" expanded="true" height="112" name="Validation" width="90" x="581" y="30">
           <parameter key="split_ratio" value="0.8"/>
           <parameter key="sampling_type" value="stratified sampling"/>
           <process expanded="true">
             <operator activated="true" class="polynomial_by_binomial_classification" compatibility="6.1.000" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="112" y="30">
               <process expanded="true">
                 <operator activated="true" class="support_vector_machine" compatibility="6.1.000" expanded="true" height="112" name="SVM (2)" width="90" x="246" y="30">
                   <parameter key="kernel_gamma" value="0.03"/>
                   <parameter key="C" value="380000.0"/>
                 </operator>
                 <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall" width="90" x="112" y="255">
                   <parameter key="name" value="Weights"/>
                   <parameter key="io_object" value="IOObjectCollection"/>
                 </operator>
                 <operator activated="true" class="flatten_collection" compatibility="6.1.000" expanded="true" height="60" name="Flatten Collection" width="90" x="246" y="255"/>
                 <operator activated="true" class="collect" compatibility="6.1.000" expanded="true" height="94" name="Collect" width="90" x="447" y="165"/>
                 <operator activated="true" class="remember" compatibility="6.1.000" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="165">
                   <parameter key="name" value="Weights"/>
                   <parameter key="io_object" value="IOObjectCollection"/>
                 </operator>
                 <connect from_port="training set" to_op="SVM (2)" to_port="training set"/>
                 <connect from_op="SVM (2)" from_port="model" to_port="model"/>
                 <connect from_op="SVM (2)" from_port="weights" to_op="Collect" to_port="input 2"/>
                 <connect from_op="Recall" from_port="result" to_op="Flatten Collection" to_port="collection"/>
                 <connect from_op="Flatten Collection" from_port="flat" to_op="Collect" to_port="input 1"/>
                 <connect from_op="Collect" from_port="collection" to_op="Remember (2)" to_port="store"/>
                 <portSpacing port="source_training set" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
               </process>
             </operator>
             <connect from_port="training" to_op="Polynominal by Binominal Classification" to_port="training set"/>
             <connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true">
             <operator activated="true" class="apply_model" compatibility="6.1.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance" compatibility="6.1.000" expanded="true" height="76" name="Performance" width="90" x="248" y="30"/>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="recall" compatibility="6.1.000" expanded="true" height="60" name="Recall (2)" width="90" x="715" y="165">
           <parameter key="name" value="Weights"/>
           <parameter key="io_object" value="IOObjectCollection"/>
         </operator>
         <operator activated="true" class="flatten_collection" compatibility="6.1.000" expanded="true" height="60" name="Flatten Collection (2)" width="90" x="849" y="165"/>
         <connect from_op="Subprocess" from_port="out 1" to_op="Remember" to_port="store"/>
         <connect from_op="Retrieve Iris" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="model" to_port="result 1"/>
         <connect from_op="Recall (2)" from_port="result" to_op="Flatten Collection (2)" to_port="collection"/>
         <connect from_op="Flatten Collection (2)" from_port="flat" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany