[Solved] SVM Support Vector Machine - Performance, Weights and Parameters

qwertz Member Posts: 130  Maven
edited October 12 in Help
Dear all,

After struggling with neural nets (see post http://rapid-i.com/rapidforum/index.php/topic,5356.0.html), I decided to follow the advice to try SVMs.
To do so, I created a sample process with test data.


My observations:
1) The parameter set output shows a prediction trend accuracy of 0.487, while the log shows 0.579 as the best performance. This surprises me, as I thought the optimizer would return the best combination.
2) It is written that the SVM operator can handle weights. It might be a stupid question, but where can I connect the weights that come from, e.g., the correlation matrix?

By the way:
- Is there any further documentation that helps to understand the SVM's parameters?
- The neural net offers internal normalization. Is that also true for SVMs, or do I need a separate operator for this?



Thank you very much for sharing your ideas...
Sachs



Sample set: http://datahost.bplaced.net/sample3.xls

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
   <process expanded="true" height="469" width="701">
     <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
       <parameter key="excel_file" value="C:\sample3.xls"/>
       <parameter key="imported_cell_range" value="A1:AM74"/>
       <parameter key="first_row_as_names" value="false"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <parameter key="date_format" value="dd.MM.yyyy"/>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="id.true.integer.id"/>
         <parameter key="1" value="a.true.real.attribute"/>
         <parameter key="2" value="b.true.real.attribute"/>
         <parameter key="3" value="c.true.real.attribute"/>
         <parameter key="4" value="d.true.real.attribute"/>
         <parameter key="5" value="e.true.real.attribute"/>
         <parameter key="6" value="f.true.real.attribute"/>
         <parameter key="7" value="g.true.real.attribute"/>
         <parameter key="8" value="h.true.real.attribute"/>
         <parameter key="9" value="i.true.real.attribute"/>
         <parameter key="10" value="label.true.real.attribute"/>
         <parameter key="11" value="j.true.real.attribute"/>
         <parameter key="12" value="k.true.real.attribute"/>
         <parameter key="13" value="l.true.real.attribute"/>
         <parameter key="14" value="m.true.real.attribute"/>
         <parameter key="15" value="n.true.real.attribute"/>
         <parameter key="16" value="o.true.real.attribute"/>
         <parameter key="17" value="p.true.real.attribute"/>
         <parameter key="18" value="q.true.real.attribute"/>
         <parameter key="19" value="r.true.real.attribute"/>
         <parameter key="20" value="s.true.real.attribute"/>
         <parameter key="21" value="t.true.real.attribute"/>
         <parameter key="22" value="u.true.real.attribute"/>
         <parameter key="23" value="v.true.real.attribute"/>
         <parameter key="24" value="w.true.real.attribute"/>
         <parameter key="25" value="x.true.real.attribute"/>
         <parameter key="26" value="y.true.real.attribute"/>
         <parameter key="27" value="z.true.real.attribute"/>
         <parameter key="28" value="aa.true.real.attribute"/>
         <parameter key="29" value="ab.true.real.attribute"/>
         <parameter key="30" value="ac.true.real.attribute"/>
         <parameter key="31" value="ad.true.real.attribute"/>
         <parameter key="32" value="ae.true.real.attribute"/>
         <parameter key="33" value="af.true.real.attribute"/>
         <parameter key="34" value="ag.true.real.attribute"/>
         <parameter key="35" value="ah.true.real.attribute"/>
         <parameter key="36" value="ai.true.real.attribute"/>
         <parameter key="37" value="aj.true.real.attribute"/>
         <parameter key="38" value="ak.true.real.attribute"/>
       </list>
     </operator>
     <operator activated="true" class="multiply" compatibility="5.2.003" expanded="true" height="112" name="Multiply" width="90" x="179" y="120"/>
     <operator activated="true" class="correlation_matrix" compatibility="5.2.003" expanded="true" height="94" name="Correlation Matrix" width="90" x="313" y="345"/>
     <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="313" y="30">
       <parameter key="horizon" value="1"/>
       <parameter key="window_size" value="1"/>
       <parameter key="create_label" value="true"/>
       <parameter key="label_attribute" value="label"/>
     </operator>
     <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.003" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="447" y="30">
       <list key="parameters">
         <parameter key="SVM (Linear).C" value="[0.0001;250;5;logarithmic]"/>
         <parameter key="SVM (Linear).convergence_epsilon" value="[0.001;0.1;5;logarithmic]"/>
       </list>
       <process expanded="true" height="446" width="622">
         <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
           <parameter key="training_window_width" value="20"/>
           <parameter key="training_window_step_size" value="10"/>
           <parameter key="test_window_width" value="20"/>
           <process expanded="true" height="447" width="232">
             <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.003" expanded="true" height="76" name="SVM (Linear)" width="90" x="112" y="30">
               <parameter key="C" value="250.0"/>
               <parameter key="convergence_epsilon" value="0.1"/>
             </operator>
             <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
             <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="447" width="299">
             <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
               <parameter key="horizon" value="1"/>
             </operator>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="log" compatibility="5.2.003" expanded="true" height="76" name="Log" width="90" x="380" y="120">
           <list key="log">
             <parameter key="c" value="operator.SVM (Linear).parameter.C"/>
             <parameter key="epsilon" value="operator.SVM (Linear).parameter.convergence_epsilon"/>
             <parameter key="prediction_trend_accuracy" value="operator.Performance.value.prediction_trend_accuracy"/>
           </list>
         </operator>
         <connect from_port="input 1" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="model" to_port="result 1"/>
         <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
         <connect from_op="Log" from_port="through 1" to_port="performance"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="source_input 2" spacing="0"/>
         <portSpacing port="sink_performance" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (2)" width="90" x="313" y="210">
       <parameter key="window_size" value="1"/>
       <parameter key="create_label" value="true"/>
       <parameter key="label_attribute" value="label"/>
     </operator>
     <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="210">
       <list key="application_parameters"/>
     </operator>
     <connect from_op="Read Excel" from_port="output" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Windowing" to_port="example set input"/>
     <connect from_op="Multiply" from_port="output 2" to_op="Windowing (2)" to_port="example set input"/>
     <connect from_op="Multiply" from_port="output 3" to_op="Correlation Matrix" to_port="example set"/>
     <connect from_op="Windowing" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
     <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 1"/>
     <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
     <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
     <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="18"/>
     <portSpacing port="sink_result 2" spacing="144"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • Nils_Woehler Member Posts: 463  Guru
    Hi,

    1) You are logging the wrong performance. The value you are logging is the performance of the validation's last iteration, not the average over all iterations. Change the last entry of the Log operator so that it logs the performance value of the Validation operator (operator.Validation.value.performance instead of operator.Performance.value.prediction_trend_accuracy) and you will get the correct results.

    2) As far as I know, the SVM can't handle weights.

    If you want more information, a good start is to look at Wikipedia (http://en.wikipedia.org/wiki/Support_vector_machine) or to search the web for other articles on SVMs.
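To illustrate point 1), here is a minimal Python sketch (not RapidMiner code) of why the two numbers differ. The per-window accuracies are made-up values, and the variable names are hypothetical. It only shows that a value read from the inner Performance operator reflects the last window, while the validation result is an average over all windows.

```python
from statistics import mean

# Hypothetical accuracies of the individual test windows of one
# sliding-window validation run (made-up numbers for illustration).
window_accuracies = [0.40, 0.55, 0.60, 0.45, 0.58]

# A value read from the inner Performance operator after the run
# only reflects the window that was evaluated last.
last_window_value = window_accuracies[-1]

# The validation itself delivers the average over all windows,
# which is the figure the optimizer compares between parameter sets.
averaged_value = mean(window_accuracies)

print(f"last window: {last_window_value:.3f}")
print(f"average:     {averaged_value:.3f}")
```

Logging the first value next to the parameters therefore produces a log whose "best" row need not match the combination the optimizer selects.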

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
       <process expanded="true" height="469" width="701">
         <operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
           <parameter key="imported_cell_range" value="A1:AM74"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="date_format" value="dd.MM.yyyy"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="id.true.integer.id"/>
             <parameter key="1" value="a.true.real.attribute"/>
             <parameter key="2" value="b.true.real.attribute"/>
             <parameter key="3" value="c.true.real.attribute"/>
             <parameter key="4" value="d.true.real.attribute"/>
             <parameter key="5" value="e.true.real.attribute"/>
             <parameter key="6" value="f.true.real.attribute"/>
             <parameter key="7" value="g.true.real.attribute"/>
             <parameter key="8" value="h.true.real.attribute"/>
             <parameter key="9" value="i.true.real.attribute"/>
             <parameter key="10" value="label.true.real.attribute"/>
             <parameter key="11" value="j.true.real.attribute"/>
             <parameter key="12" value="k.true.real.attribute"/>
             <parameter key="13" value="l.true.real.attribute"/>
             <parameter key="14" value="m.true.real.attribute"/>
             <parameter key="15" value="n.true.real.attribute"/>
             <parameter key="16" value="o.true.real.attribute"/>
             <parameter key="17" value="p.true.real.attribute"/>
             <parameter key="18" value="q.true.real.attribute"/>
             <parameter key="19" value="r.true.real.attribute"/>
             <parameter key="20" value="s.true.real.attribute"/>
             <parameter key="21" value="t.true.real.attribute"/>
             <parameter key="22" value="u.true.real.attribute"/>
             <parameter key="23" value="v.true.real.attribute"/>
             <parameter key="24" value="w.true.real.attribute"/>
             <parameter key="25" value="x.true.real.attribute"/>
             <parameter key="26" value="y.true.real.attribute"/>
             <parameter key="27" value="z.true.real.attribute"/>
             <parameter key="28" value="aa.true.real.attribute"/>
             <parameter key="29" value="ab.true.real.attribute"/>
             <parameter key="30" value="ac.true.real.attribute"/>
             <parameter key="31" value="ad.true.real.attribute"/>
             <parameter key="32" value="ae.true.real.attribute"/>
             <parameter key="33" value="af.true.real.attribute"/>
             <parameter key="34" value="ag.true.real.attribute"/>
             <parameter key="35" value="ah.true.real.attribute"/>
             <parameter key="36" value="ai.true.real.attribute"/>
             <parameter key="37" value="aj.true.real.attribute"/>
             <parameter key="38" value="ak.true.real.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="112" name="Multiply" width="90" x="179" y="120"/>
         <operator activated="true" class="correlation_matrix" compatibility="5.2.008" expanded="true" height="94" name="Correlation Matrix" width="90" x="313" y="345"/>
         <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="313" y="30">
           <parameter key="horizon" value="1"/>
           <parameter key="window_size" value="1"/>
           <parameter key="create_label" value="true"/>
           <parameter key="label_attribute" value="label"/>
         </operator>
         <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="447" y="30">
           <list key="parameters">
             <parameter key="SVM (Linear).C" value="[0.0001;250;5;logarithmic]"/>
             <parameter key="SVM (Linear).convergence_epsilon" value="[0.001;0.1;5;logarithmic]"/>
           </list>
           <process expanded="true" height="446" width="622">
             <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
               <parameter key="training_window_width" value="20"/>
               <parameter key="training_window_step_size" value="10"/>
               <parameter key="test_window_width" value="20"/>
               <process expanded="true" height="447" width="232">
                 <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="112" y="30">
                   <parameter key="C" value="250.0"/>
                   <parameter key="convergence_epsilon" value="0.1"/>
                 </operator>
                 <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
                 <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true" height="447" width="299">
                 <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                 </operator>
                 <operator activated="true" class="series:forecasting_performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                   <parameter key="horizon" value="1"/>
                 </operator>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="380" y="120">
               <list key="log">
                 <parameter key="c" value="operator.SVM (Linear).parameter.C"/>
                 <parameter key="epsilon" value="operator.SVM (Linear).parameter.convergence_epsilon"/>
                 <parameter key="prediction_trend_accuracy" value="operator.Validation.value.performance"/>
               </list>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="model" to_port="result 1"/>
             <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
             <connect from_op="Log" from_port="through 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing (2)" width="90" x="313" y="210">
           <parameter key="window_size" value="1"/>
           <parameter key="create_label" value="true"/>
           <parameter key="label_attribute" value="label"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="210">
           <list key="application_parameters"/>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Windowing" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Windowing (2)" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 3" to_op="Correlation Matrix" to_port="example set"/>
         <connect from_op="Windowing" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
         <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 1"/>
         <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
         <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
         <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="18"/>
         <portSpacing port="sink_result 2" spacing="144"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    Best,
    Nils
  • qwertz Member Posts: 130  Maven


    Ahhh, that's it. The nested performance within the validation operator gets overwritten with each iteration... of course.

    I thought that the SVM could handle weighted examples because, when selecting the operator and pressing F1, this is listed under capabilities...


    Thank you Nils!
  • qwertz Member Posts: 130  Maven


    Coming back to the question of whether to normalize the data before feeding it into the SVM:

    Inspired by this article http://rapid-i.com/rapidforum/index.php/topic,83.0.html, I ran the process twice: once with normalization and once without.

    The results show that normalization makes a difference. In my process, normalization led to a deterioration in performance. However, I assume this could be because the SVM's parameters are not optimized for the normalized data, or because the Z-transformation is possibly not the right method.

    As I am still a beginner, I am not sure about that, but at least it is obvious now that normalization has an influence.
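For readers unfamiliar with the Z-transformation mentioned above, here is a minimal Python sketch (not RapidMiner code). The function names and the sample values are hypothetical, and the population standard deviation is used as an assumption; the Normalize operator may compute it differently. The sketch estimates mean and standard deviation on the training values and reuses them for the test values.

```python
from statistics import mean, pstdev


def fit_z_transform(values):
    """Estimate mean and standard deviation from the training values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # constant attribute: avoid division by zero
    return mu, sigma


def apply_z_transform(values, mu, sigma):
    """Shift and scale values with previously estimated parameters."""
    return [(v - mu) / sigma for v in values]


# Made-up values of a single attribute.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [20.0, 22.0]

mu, sigma = fit_z_transform(train)
train_z = apply_z_transform(train, mu, sigma)
test_z = apply_z_transform(test, mu, sigma)

print(train_z)
print(test_z)
```

Because the transformation changes the scale of every attribute, a value of C that worked on the raw data does not necessarily suit the normalized data, so the grid may need to be re-run after normalizing.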


    All the best
    Sachs


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="449" width="1016">
          <operator activated="true" class="read_excel" compatibility="5.2.003" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
            <parameter key="excel_file" value="C:\sample3.xls"/>
            <parameter key="imported_cell_range" value="A1:AM74"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="date_format" value="dd.MM.yyyy"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="id.true.integer.id"/>
              <parameter key="1" value="a.true.real.attribute"/>
              <parameter key="2" value="b.true.real.attribute"/>
              <parameter key="3" value="c.true.real.attribute"/>
              <parameter key="4" value="d.true.real.attribute"/>
              <parameter key="5" value="e.true.real.attribute"/>
              <parameter key="6" value="f.true.real.attribute"/>
              <parameter key="7" value="g.true.real.attribute"/>
              <parameter key="8" value="h.true.real.attribute"/>
              <parameter key="9" value="i.true.real.attribute"/>
              <parameter key="10" value="label.true.real.attribute"/>
              <parameter key="11" value="j.true.real.attribute"/>
              <parameter key="12" value="k.true.real.attribute"/>
              <parameter key="13" value="l.true.real.attribute"/>
              <parameter key="14" value="m.true.real.attribute"/>
              <parameter key="15" value="n.true.real.attribute"/>
              <parameter key="16" value="o.true.real.attribute"/>
              <parameter key="17" value="p.true.real.attribute"/>
              <parameter key="18" value="q.true.real.attribute"/>
              <parameter key="19" value="r.true.real.attribute"/>
              <parameter key="20" value="s.true.real.attribute"/>
              <parameter key="21" value="t.true.real.attribute"/>
              <parameter key="22" value="u.true.real.attribute"/>
              <parameter key="23" value="v.true.real.attribute"/>
              <parameter key="24" value="w.true.real.attribute"/>
              <parameter key="25" value="x.true.real.attribute"/>
              <parameter key="26" value="y.true.real.attribute"/>
              <parameter key="27" value="z.true.real.attribute"/>
              <parameter key="28" value="aa.true.real.attribute"/>
              <parameter key="29" value="ab.true.real.attribute"/>
              <parameter key="30" value="ac.true.real.attribute"/>
              <parameter key="31" value="ad.true.real.attribute"/>
              <parameter key="32" value="ae.true.real.attribute"/>
              <parameter key="33" value="af.true.real.attribute"/>
              <parameter key="34" value="ag.true.real.attribute"/>
              <parameter key="35" value="ah.true.real.attribute"/>
              <parameter key="36" value="ai.true.real.attribute"/>
              <parameter key="37" value="aj.true.real.attribute"/>
              <parameter key="38" value="ak.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="false" class="normalize" compatibility="5.2.003" expanded="true" height="94" name="Normalize" width="90" x="179" y="165"/>
          <operator activated="true" class="multiply" compatibility="5.2.003" expanded="true" height="94" name="Multiply" width="90" x="313" y="120"/>
          <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="447" y="30">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="label"/>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (2)" width="90" x="447" y="210">
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="label"/>
          </operator>
          <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="581" y="30">
            <parameter key="training_window_width" value="20"/>
            <parameter key="training_window_step_size" value="10"/>
            <parameter key="test_window_width" value="20"/>
            <process expanded="true" height="447" width="295">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.003" expanded="true" height="76" name="SVM (Linear)" width="90" x="102" y="30">
                <parameter key="C" value="250.0"/>
                <parameter key="convergence_epsilon" value="0.1"/>
              </operator>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="447" width="295">
              <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="170" y="30">
                <parameter key="horizon" value="1"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model (2)" width="90" x="715" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="5.2.003" expanded="true" height="76" name="Performance (4)" width="90" x="849" y="210">
            <parameter key="root_mean_squared_error" value="false"/>
            <parameter key="relative_error" value="true"/>
            <parameter key="correlation" value="true"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Windowing (2)" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (4)" to_port="labelled data"/>
          <connect from_op="Performance (4)" from_port="performance" to_port="result 1"/>
          <connect from_op="Performance (4)" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="180"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • qwertz Member Posts: 130  Maven

    Since the SVM does not seem to have an input for weights (although the documentation says it does *still puzzled*), a workaround could be to apply the "Scale by Weights" operator before feeding the data into the SVM.
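As a rough illustration of that workaround, here is a minimal Python sketch (not RapidMiner code) of scaling attributes by weights. The function name, attribute names, and weight values are all hypothetical, and it assumes that scaling simply multiplies each attribute value by the weight of its attribute. Note that these are attribute weights, which is a different thing from weighting individual examples.

```python
def scale_by_weights(rows, weights):
    """Multiply every attribute value by the weight of its attribute.

    Attributes without an entry in `weights` keep their value (weight 1.0).
    """
    return [
        {name: value * weights.get(name, 1.0) for name, value in row.items()}
        for row in rows
    ]


# Hypothetical attribute weights, e.g. absolute correlations with the label.
weights = {"a": 0.8, "b": 0.1}

# Two made-up examples with three attributes each.
rows = [
    {"a": 2.0, "b": 10.0, "c": 3.0},
    {"a": -1.0, "b": 5.0, "c": 4.0},
]

scaled = scale_by_weights(rows, weights)
for row in scaled:
    print(row)
```

An attribute with a small weight then contributes less to the linear SVM's inner products, which is the intended effect of the workaround.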
  • Nils_Woehler Member Posts: 463  Guru
    Hi,

    The SVM does support weights, but not through a 'weights' input port as most operators do. Instead, it uses an attribute that has the role 'weight'.

    Here is an example:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
        <process expanded="true" height="422" width="681">
          <operator activated="true" class="retrieve" compatibility="5.2.009" expanded="true" height="60" name="Retrieve" width="90" x="112" y="30">
            <parameter key="repository_entry" value="//Samples/data/Weighting"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="5.2.009" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
            <process expanded="true" height="509" width="455">
              <operator activated="true" class="normalize" compatibility="5.2.009" expanded="true" height="94" name="Normalize" width="90" x="45" y="30"/>
              <operator activated="true" class="support_vector_machine" compatibility="5.2.009" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
              <connect from_port="training" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Normalize" from_port="example set output" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <connect from_op="SVM" from_port="weights" to_port="through 1"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <portSpacing port="sink_through 2" spacing="0"/>
            </process>
            <process expanded="true" height="509" width="455">
              <operator activated="true" class="normalize" compatibility="5.2.009" expanded="true" height="94" name="Normalize (2)" width="90" x="45" y="75"/>
              <operator activated="true" class="apply_model" compatibility="5.2.009" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.009" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Normalize (2)" to_port="example set input"/>
              <connect from_op="Normalize (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="source_through 2" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.2.009" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="165">
            <parameter key="repository_entry" value="//Samples/data/Weighting"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.009" expanded="true" height="76" name="Rename" width="90" x="179" y="165">
            <parameter key="old_name" value="weighting.dat (7)"/>
            <parameter key="new_name" value="label"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.009" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="165">
            <list key="function_descriptions">
              <parameter key="weight" value="if(equals(label,&quot;positive&quot;),10,1)"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.009" expanded="true" height="76" name="Set Role" width="90" x="447" y="165">
            <parameter key="name" value="weight"/>
            <parameter key="target_role" value="weight"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="5.2.009" expanded="true" height="112" name="Validation (2)" width="90" x="581" y="165">
            <process expanded="true" height="509" width="455">
              <operator activated="true" class="normalize" compatibility="5.2.009" expanded="true" height="94" name="Normalize (3)" width="90" x="45" y="30"/>
              <operator activated="true" class="support_vector_machine" compatibility="5.2.009" expanded="true" height="112" name="SVM (2)" width="90" x="250" y="30"/>
              <connect from_port="training" to_op="Normalize (3)" to_port="example set input"/>
              <connect from_op="Normalize (3)" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
              <connect from_op="SVM (2)" from_port="model" to_port="model"/>
              <connect from_op="SVM (2)" from_port="weights" to_port="through 1"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <portSpacing port="sink_through 2" spacing="0"/>
            </process>
            <process expanded="true" height="509" width="455">
              <operator activated="true" class="normalize" compatibility="5.2.009" expanded="true" height="94" name="Normalize (4)" width="90" x="45" y="30"/>
              <operator activated="true" class="apply_model" compatibility="5.2.009" expanded="true" height="76" name="Apply Model (2)" width="90" x="180" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.009" expanded="true" height="76" name="Performance (2)" width="90" x="317" y="30"/>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Normalize (4)" to_port="example set input"/>
              <connect from_op="Normalize (4)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="source_through 2" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Validation (2)" to_port="training"/>
          <connect from_op="Validation (2)" from_port="averagable 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Best,
    Nils
  • qwertz Member Posts: 130  Maven


    Ah, thank you Nils. So that is what is meant by "SVM can handle weights". Now I get it.


    What caught my attention:
    In your sample process there is a separate attribute holding the weights, with one weight for each example.
    When using the weights port, by contrast, I assign one weight to each attribute.

    Maybe obvious to the professional but surprising to me :)
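
The difference between the two kinds of weights can be sketched outside RapidMiner. The following is only a minimal Python illustration on made-up data. The function names are hypothetical and not part of any RapidMiner API, and it is an assumption that example weights act on each row's contribution to the error while attribute weights act by scaling columns (as "Scale by Weights" is understood here).

```python
# Hypothetical toy data: 3 examples (rows) x 2 attributes (columns).
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
errors = [0.5, 1.0, 2.0]  # made-up per-example errors of some model


def weighted_error(errors, example_weights):
    """Example weights: one weight per row, used to average the errors."""
    total = sum(w * e for w, e in zip(example_weights, errors))
    return total / sum(example_weights)


def scale_by_attribute_weights(rows, attribute_weights):
    """Attribute weights: one weight per column, applied by scaling values."""
    return [[w * x for w, x in zip(attribute_weights, row)] for row in rows]


example_weights = [10.0, 1.0, 1.0]   # length == number of rows
attribute_weights = [0.5, 2.0]       # length == number of columns

avg_error = weighted_error(errors, example_weights)
scaled_rows = scale_by_attribute_weights(rows, attribute_weights)
```

The lengths already show the distinction: the example weights list has one entry per row, the attribute weights list one entry per column.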



    By the way, I wonder why you put a Normalize operator into both the training and the testing subprocess of the validation. Wouldn't this affect the result? Imagine data like 1,1,1,5,9,7 that the validation splits into 1,1,1 for training and 5,9,7 for testing: each part would then be normalized with its own parameters. I would therefore assume that normalization has to happen before the validation.
    Is there any special reason why you do it like that?
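
The effect described in this question can be reproduced with the 1,1,1 / 5,9,7 split. This is a minimal sketch in Python, not RapidMiner code. The helper names `fit_z` and `apply_z` are hypothetical, and the use of the sample standard deviation is an assumption (the Normalize operator may compute it differently).

```python
from statistics import mean, stdev


def fit_z(values):
    """Return the (mean, sample standard deviation) for a z-transformation."""
    return mean(values), stdev(values)


def apply_z(values, mu, sigma):
    """Apply a z-transformation using previously fitted parameters."""
    return [(v - mu) / sigma for v in values]


train, test = [1, 1, 1], [5, 9, 7]

# Variant A: the test split is normalized on its own,
# as a separate Normalize inside the testing subprocess would do.
test_separate = apply_z(test, *fit_z(test))

# Variant B: the parameters are fitted on all six values before the split
# and then reused for the test values.
mu_all, sigma_all = fit_z(train + test)
test_global = apply_z(test, mu_all, sigma_all)
```

In variant A the value 5 ends up below zero because it is the smallest value of its split, whereas in variant B it ends up above zero because it lies above the overall mean of 4. Note also that a z-transformation fitted on the training split 1,1,1 alone would divide by a standard deviation of zero.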

    PS: I'm still not sure whether z-transformation is a good way to normalize for SVMs. I have read a couple of articles but have not found an answer yet. When comparing time series such as stock prices, I think range transformation (0 to 1) or proportional transformation could be better, as z-transformation influences the relative distance between two values.
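
One way to look into this concern is to compare how both transformations treat the gaps within a single attribute. The sketch below is plain Python on a made-up price series. The function names are hypothetical, and using the sample standard deviation for the z-transformation is an assumption.

```python
from statistics import mean, stdev


def z_transform(values):
    """Shift by the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def range_transform(values, lo=0.0, hi=1.0):
    """Map the minimum to lo and the maximum to hi, linearly in between."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def distance_ratio(values):
    """Gap between the first two values divided by the gap between the last two."""
    return (values[1] - values[0]) / (values[-1] - values[-2])


prices = [100.0, 102.0, 101.0, 105.0]  # hypothetical price series

ratio_raw = distance_ratio(prices)
ratio_z = distance_ratio(z_transform(prices))
ratio_range = distance_ratio(range_transform(prices))
```

Within one attribute, both transformations only shift the values and multiply them by a single constant, so the three ratios agree up to rounding. The methods differ in which constant they use, which matters when several attributes or series with different spreads are compared with each other.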


    Have a nice day
    Sachs