
"grid search met 'root_mean_squared_error: unknown'"

beijinghe2008 Member Posts: 13 Contributor II
edited June 2019 in Help
Hi all,

To find the optimal values of SVM.C and SVM.gamma, I used the 'parameter optimized (grid)' operator to tune the SVM operator. However, I often get NaN values, as shown below:

Performance:
PerformanceVector [
-----root_mean_squared_error: unknown
-----squared_error: unknown
]
SVM.gamma = NaN
SVM.C = NaN


Does anyone know the root cause?

Below is the process file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <parameter key="logverbose" value="all"/>
    <process expanded="true" height="521" width="859">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//test/data/LRO1"/>
      </operator>
      <operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="45" y="165">
        <parameter key="old_name" value="试油结论"/>
        <parameter key="new_name" value="syjl"/>
      </operator>
      <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="165">
        <parameter key="default" value="minimum"/>
        <list key="columns">
          <parameter key="LLD" value="average"/>
        </list>
      </operator>
      <operator activated="true" class="weight_by_relief" expanded="true" height="76" name="Weight by Relief" width="90" x="313" y="165"/>
      <operator activated="true" class="select_by_weights" expanded="true" height="94" name="Select by Weights" width="90" x="447" y="165">
        <parameter key="weight" value="0.6"/>
      </operator>
      <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="parameter optimized (grid)" width="90" x="581" y="165">
        <list key="parameters">
          <parameter key="SVM.gamma" value="[0.0;Infinity;10;linear]"/>
          <parameter key="SVM.C" value="[0.0;Infinity;10;linear]"/>
        </list>
        <process expanded="true" height="521" width="859">
          <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation (2)" width="90" x="112" y="75">
            <process expanded="true" height="521" width="404">
              <operator activated="true" class="discretize_by_size" expanded="true" height="94" name="Discretize" width="90" x="44" y="30">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="syjl"/>
                <parameter key="include_special_attributes" value="true"/>
                <parameter key="size_of_bins" value="4"/>
              </operator>
              <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="179" y="30">
                <parameter key="gamma" value="Infinity"/>
                <parameter key="C" value="Infinity"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="Discretize" to_port="example set input"/>
              <connect from_op="Discretize" from_port="example set output" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="521" width="404">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="applymodle (2)" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="extract_performance" expanded="true" height="76" name="Performance (3)" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="applymodle (2)" to_port="model"/>
              <connect from_port="test set" to_op="applymodle (2)" to_port="unlabelled data"/>
              <connect from_op="applymodle (2)" from_port="labelled data" to_op="Performance (3)" to_port="example set"/>
              <connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="input 1" to_op="Validation (2)" to_port="training"/>
          <connect from_op="Validation (2)" from_port="model" to_port="result 1"/>
          <connect from_op="Validation (2)" from_port="averagable 1" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="write_parameters" expanded="true" height="60" name="Write Parameters" width="90" x="581" y="345">
        <parameter key="parameter_file" value="D:\workspace_dataminer\testRepos\svmParameters.par"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Weight by Relief" to_port="example set"/>
      <connect from_op="Weight by Relief" from_port="weights" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="Weight by Relief" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="parameter optimized (grid)" to_port="input 1"/>
      <connect from_op="parameter optimized (grid)" from_port="parameter" to_op="Write Parameters" to_port="input"/>
      <connect from_op="Write Parameters" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


Thanks in advance!

Regards,
Xu He

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    Without the data it is difficult to be sure, but I think you should at least put the "discretize" operator somewhere else. As it is, numerical attributes are converted to nominal in the training examples, but not in the test examples of the validation, which is pretty radical. Try putting it before the Grid optimisation?


    Good luck!

    PS. I'm not sure why you've used the "extract_performance" operator either.
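The suggestion to move Discretize in front of the optimisation can be sketched outside RapidMiner. This is only an illustration under assumptions: the label values, the bin edges, and the helper `to_bins` are made up, and the binning rule is not the one Discretize uses. The point is that the label is converted once, before the data is split, so training and test examples share the same nominal type.

```python
from bisect import bisect_right

def to_bins(values, edges):
    """Map each number to a nominal bin name such as 'range2' using fixed edges."""
    return ["range%d" % (bisect_right(edges, v) + 1) for v in values]

# Made-up numeric label column and made-up bin boundaries (assumptions).
all_labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edges = [2.5, 4.5]

# Discretize once, before any train/test split is made.
binned = to_bins(all_labels, edges)

# A simple split standing in for one validation fold.
train, test = binned[:4], binned[4:]

print(train)
print(test)
```

If the conversion were done only on the training part, the test labels would still be numbers while the model predicts bin names, and no meaningful error could be computed between the two.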
  • beijinghe2008 Member Posts: 13 Contributor II
    haddock,

    First, thanks for your quick answer!
    I solved the problem by setting initial values for C and gamma. I wonder why I cannot use the default values?
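One possible explanation, offered as an assumption and not as a description of RapidMiner's internals: the grid in the first process runs from 0.0 to Infinity in 10 linear steps. If the steps are computed as lower + i * (upper - lower) / steps, floating-point arithmetic gives NaN for the first value and Infinity for the rest. The helper `linear_grid` below is hypothetical and only reproduces that arithmetic.

```python
import math

def linear_grid(lower, upper, steps):
    """Return steps + 1 evenly spaced values from lower to upper."""
    width = (upper - lower) / steps
    return [lower + i * width for i in range(steps + 1)]

# Range taken from the first process: [0.0;Infinity;10;linear]
bad = linear_grid(0.0, math.inf, 10)

# A finite range, chosen only for illustration.
good = linear_grid(0.01, 100.0, 10)

print(bad)   # first entry is nan (0 * inf), every other entry is inf
print(good)  # eleven finite values from 0.01 up to about 100.0
```

With finite bounds the same formula yields usable values, which would be consistent with the problem going away once concrete values are set.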

    Now I have the following two problems:
    1. My goal is to use grid search to find the optimized parameters and then use them in the final SVM operator. Currently, I can write C and gamma to a parameter file and read them back with readParameter, but how do I set them on the final SVM? I plan to use macros, but how do I read parameters from a file and set them as macros?

    If this cannot be done automatically, I will have to do it manually in two separate steps :-(
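As a fallback for the manual route, the written values can be read back with a few lines of script. This is a minimal sketch that assumes the parameter file contains lines of the form `Operator.parameter = value`, as in the output printed in the first post. The real file layout may differ, and the sample values and the helper `parse_parameter_lines` are made up.

```python
def parse_parameter_lines(text):
    """Parse lines like 'Operator.parameter = value' into a nested dict."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        name, value = (part.strip() for part in line.split("=", 1))
        # Split at the last dot so operator names such as 'SVM (2)' stay intact.
        operator, _, parameter = name.rpartition(".")
        if not operator:
            continue  # no 'Operator.' prefix, so not a parameter assignment
        # Assumes numeric values; float() raises ValueError otherwise.
        result.setdefault(operator, {})[parameter] = float(value)
    return result

# Made-up sample content in the assumed format.
sample = "SVM.gamma = 0.5\nSVM.C = 100.0\n"
params = parse_parameter_lines(sample)
print(params)
```

The resulting dictionary can then be copied into the final SVM operator by hand, or used to fill in values wherever the process is generated.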

    2. As you mentioned, I can use Extract Performance. Could you please let me know how to use it? Can you give me a sample? Thanks in advance.

    Regards,
    Xu He
  • beijinghe2008 Member Posts: 13 Contributor II
    Sorry, I made a mistake. It works only if svm_type is set to nu-SVC rather than C-SVC. Once I set it to C-SVC, the following message is reported in the log file:

    Apr 6, 2010 8:10:18 PM INFO: Kernel Model: The learned model does not support parameter

    I simplified my process. Some operators are disabled: Optimize Parameters (Evolutionary) is disabled, and Optimize Parameters (Grid) runs but does not give the expected result, so it is disabled as well.
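A side note on the grid ranges used in the process that follows (C from 0.01 to 32768 in 10 linear steps): with linear spacing the second grid point is already above 3000, so small values of C are barely sampled. The sketch below compares linear and logarithmic spacing over the same range. The helpers `linear_steps` and `log_steps` are hypothetical, and whether logarithmic spacing gives better results on this data is not known.

```python
def linear_steps(lower, upper, steps):
    """Evenly spaced values: constant difference between neighbours."""
    width = (upper - lower) / steps
    return [lower + i * width for i in range(steps + 1)]

def log_steps(lower, upper, steps):
    """Geometrically spaced values: constant ratio between neighbours."""
    ratio = (upper / lower) ** (1.0 / steps)
    return [lower * ratio ** i for i in range(steps + 1)]

# Range for C taken from the process: [0.01;32768.00000;10;linear]
c_linear = linear_steps(0.01, 32768.0, 10)
c_log = log_steps(0.01, 32768.0, 10)

print(["%.2f" % v for v in c_linear])
print(["%.2f" % v for v in c_log])
```

Both lists have eleven entries and share the same end points; only the distribution of the points in between differs.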

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <parameter key="logverbosity" value="error"/>
       <parameter key="encoding" value="GB2312"/>
       <process expanded="true" height="557" width="815">
         <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
           <parameter key="repository_entry" value="//logWell/LRO"/>
         </operator>
         <operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="45" y="165">
           <parameter key="old_name" value="LLD"/>
           <parameter key="new_name" value="电阻率"/>
         </operator>
         <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="165">
           <parameter key="default" value="minimum"/>
           <list key="columns">
             <parameter key="LLD" value="average"/>
           </list>
         </operator>
         <operator activated="true" class="weight_by_relief" expanded="true" height="76" name="Weight by Relief" width="90" x="313" y="165"/>
         <operator activated="true" class="select_by_weights" expanded="true" height="94" name="Select by Weights" width="90" x="447" y="165">
           <parameter key="weight" value="0.6"/>
         </operator>
         <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimize Parameters (2)" width="90" x="447" y="300">
           <list key="parameters">
             <parameter key="SVM (2).gamma" value="[1.0E-6;8;10;linear]"/>
             <parameter key="SVM (2).C" value="[0.01;32768.00000;10;linear]"/>
           </list>
           <process expanded="true" height="417" width="734">
             <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation (3)" width="90" x="331" y="39">
               <process expanded="true" height="435" width="351">
                 <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM (4)" width="90" x="130" y="30">
                   <parameter key="gamma" value="8.0"/>
                   <parameter key="C" value="32768.0"/>
                   <list key="class_weights"/>
                   <parameter key="calculate_confidences" value="true"/>
                 </operator>
                 <connect from_port="training" to_op="SVM (4)" to_port="training set"/>
                 <connect from_op="SVM (4)" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true" height="435" width="351">
                 <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (3)" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                   <parameter key="create_view" value="true"/>
                 </operator>
                 <operator activated="true" class="performance" expanded="true" height="76" name="Performance (3)" width="90" x="198" y="30"/>
                 <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
                 <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
                 <connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Validation (3)" to_port="training"/>
             <connect from_op="Validation (3)" from_port="model" to_port="result 1"/>
             <connect from_op="Validation (3)" from_port="averagable 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="false" class="parallel:optimize_parameters_evolutionary_parallel" expanded="true" height="112" name="Optimize Parameters (Evolutionary)" width="90" x="45" y="300">
           <list key="parameters">
             <parameter key="SVM (3).gamma" value="[0.0;Infinity]"/>
             <parameter key="SVM (3).C" value="[0.0;Infinity]"/>
           </list>
           <process expanded="true" height="417" width="734">
             <operator activated="false" class="x_validation" expanded="true" height="112" name="Validation (2)" width="90" x="112" y="75">
               <process expanded="true" height="435" width="351">
                 <operator activated="false" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM (3)" width="90" x="130" y="30">
                   <parameter key="gamma" value="Infinity"/>
                   <parameter key="C" value="Infinity"/>
                   <list key="class_weights"/>
                 </operator>
                 <connect from_port="training" to_op="SVM (3)" to_port="training set"/>
                 <connect from_op="SVM (3)" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true" height="435" width="351">
                 <operator activated="false" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                   <parameter key="create_view" value="true"/>
                 </operator>
                 <operator activated="false" class="performance" expanded="true" height="76" name="Performance (2)" width="90" x="198" y="30"/>
                 <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
                 <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                 <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Validation (2)" to_port="training"/>
             <connect from_op="Validation (2)" from_port="model" to_port="result 1"/>
             <connect from_op="Validation (2)" from_port="averagable 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="false" class="optimize_parameters_grid" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="246" y="300">
           <list key="parameters">
             <parameter key="SVM (2).gamma" value="[1.0E-6;8;10;linear]"/>
             <parameter key="SVM (2).C" value="[0.01;32768.00000;10;linear]"/>
           </list>
           <process expanded="true" height="417" width="734">
             <operator activated="false" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="131" y="150">
               <process expanded="true" height="435" width="351">
                 <operator activated="false" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM (2)" width="90" x="112" y="30">
                   <parameter key="svm_type" value="nu-SVC"/>
                   <parameter key="gamma" value="1.0E-6"/>
                   <parameter key="C" value="0.01"/>
                   <list key="class_weights"/>
                   <parameter key="calculate_confidences" value="true"/>
                 </operator>
                 <connect from_port="training" to_op="SVM (2)" to_port="training set"/>
                 <connect from_op="SVM (2)" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true" height="435" width="351">
                 <operator activated="false" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                   <parameter key="create_view" value="true"/>
                 </operator>
                 <operator activated="false" class="performance" expanded="true" height="76" name="Performance" width="90" x="198" y="30"/>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="model" to_port="result 1"/>
             <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="write_parameters" expanded="true" height="60" name="Write Parameters" width="90" x="581" y="390">
           <parameter key="parameter_file" value="myOpt.txt.par"/>
           <parameter key="encoding" value="GB2312"/>
         </operator>
         <operator activated="false" class="x_validation" expanded="true" height="112" name="Validation (4)" width="90" x="246" y="435">
           <process expanded="true" height="417" width="342">
             <operator activated="false" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM (5)" width="90" x="130" y="30">
               <parameter key="gamma" value="1.0E-6"/>
               <parameter key="C" value="32768.0"/>
               <parameter key="nu" value="0.01"/>
               <list key="class_weights"/>
               <parameter key="calculate_confidences" value="true"/>
             </operator>
             <connect from_port="training" to_op="SVM (5)" to_port="training set"/>
             <connect from_op="SVM (5)" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="417" width="342">
             <operator activated="false" class="apply_model" expanded="true" height="76" name="Apply Model (4)" width="90" x="45" y="30">
               <list key="application_parameters"/>
               <parameter key="create_view" value="true"/>
             </operator>
             <operator activated="false" class="performance" expanded="true" height="76" name="Performance (4)" width="90" x="198" y="30"/>
             <connect from_port="model" to_op="Apply Model (4)" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model (4)" to_port="unlabelled data"/>
             <connect from_op="Apply Model (4)" from_port="labelled data" to_op="Performance (4)" to_port="labelled data"/>
             <connect from_op="Performance (4)" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Retrieve" from_port="output" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
         <connect from_op="Replace Missing Values" from_port="example set output" to_op="Weight by Relief" to_port="example set"/>
         <connect from_op="Weight by Relief" from_port="weights" to_op="Select by Weights" to_port="weights"/>
         <connect from_op="Weight by Relief" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
         <connect from_op="Select by Weights" from_port="example set output" to_op="Optimize Parameters (2)" to_port="input 1"/>
         <connect from_op="Optimize Parameters (2)" from_port="performance" to_port="result 2"/>
         <connect from_op="Optimize Parameters (2)" from_port="parameter" to_op="Write Parameters" to_port="input"/>
         <connect from_op="Write Parameters" from_port="through" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>






  • beijinghe2008 Member Posts: 13 Contributor II
    Here is part of my data in Excel format; you can import it into your repository and try it:

    ID CLU (label) LLD LLS ILD ILM DEN CNL AC GR POR PERM RQI SO SH CEC Swi Qv
    1 oilwithlowRES 6.20 5.85 3.76 4.50 2.58 16.60 269.90 89.86 28.09 1.62 2.40 29.67 31.63 6.92 33.62 0.67
    2 oilwithlowRES 7.59 5.11 4.81 4.73 2.50 21.15 272.24 83.27 29.25 1.24 2.06 29.84 27.69 6.11 30.94 0.59
    3 oilwithlowRES 7.00 7.15 4.36 4.73 2.48 17.50 265.35 67.10 28.90 0.99 1.85 29.86 20.09 6.13 24.45 0.60
    4 oilwithlowRES 5.22 3.53 3.65 3.67 2.48 25.14 297.68 93.94 31.50 0.98 1.77 29.90 34.36 8.98 35.31 0.52
    5 oilwithlowRES 7.56 7.43 5.38 5.78 2.47 19.00 267.94 73.25 29.24 1.04 1.89 29.88 22.67 6.05 26.90 0.59
    6 oilwithlowRES 4.61 4.49 3.81 3.92 2.44 24.90 296.64 90.64 31.95 0.82 1.60 29.92 32.13 8.97 33.95 0.51
    7 oilwithlowRES 6.67 6.94 4.39 4.57 2.42 22.50 265.12 81.70 29.68 1.12 1.94 29.86 26.83 6.28 30.31 0.59
    8 oilwithlowRES 8.28 8.00 4.41 5.38 2.41 23.40 292.07 83.61 31.97 0.68 1.45 30.09 27.88 8.25 31.09 0.67
    9 oilwithlowRES 6.41 4.26 3.73 3.75 2.39 30.07 289.46 98.70 31.98 0.98 1.75 29.97 37.85 9.85 37.27 0.56
    10 oilwithlowRES 7.15 6.41 4.23 4.61 2.38 22.00 285.25 85.94 31.72 0.78 1.56 30.02 29.22 8.30 32.04 0.67
    11 oilwithlowRES 5.89 6.49 4.20 2.40 2.38 22.32 300.99 122.39 32.97 1.25 1.95 29.96 31.72 13.12 33.69 0.71
    21 oilwithlowRES 4.57 3.82 4.57 4.87 2.10 31.53 299.56 71.54 36.42 0.51 1.18 30.32 21.92 10.40 26.23 0.68
    22 oilwithlowRES 5.35 4.81 5.49 5.47 2.07 33.14 301.08 75.87 36.86 0.51 1.18 30.37 23.88 11.19 27.97 0.51
    23 waterLevel 54.08 52.01 12.25 17.01 2.61 18.10 196.97 56.50 21.91 2.25 3.21 29.72 16.35 3.63 20.26 0.34
    24 waterLevel 12.03 11.61 5.00 6.05 2.53 25.53 305.37 116.70 31.43 1.46 2.16 30.03 34.84 11.33 35.58 0.66
    25 waterLevel 7.22 6.56 5.61 6.05 2.52 19.20 265.35 86.91 28.48 1.48 2.28 29.75 29.80 5.91 32.42 0.39
    26 waterLevel 8.01 7.74 4.70 5.51 2.51 19.80 255.82 68.35 27.85 1.24 2.11 29.80 20.58 3.46 24.94 0.24
    31 waterLevel 12.01 10.81 6.84 7.13 2.40 25.43 331.26 94.91 35.10 0.24 0.83 30.38 35.04 11.85 35.73 0.58
    42 waterLevel 18.67 18.73 8.60 11.06 2.35 19.00 265.26 63.26 30.62 0.55 1.34 30.27 18.63 5.05 22.94 0.30
    43 waterLevel 12.17 9.68 6.82 7.77 2.34 23.69 298.90 73.62 33.33 0.19 0.75 30.33 22.84 8.23 27.07 0.44
    44 waterLevel 20.44 13.18 6.61 7.85 2.32 28.68 338.21 105.02 36.65 0.12 0.57 30.58 33.09 14.12 34.57 0.65
    45 waterLevel 8.03 7.83 4.92 5.50 2.32 22.90 289.13 82.23 32.85 0.46 1.18 30.16 27.12 8.78 30.54 0.48
    46 waterLevel 8.45 7.97 5.34 5.82 2.31 22.09 307.90 92.05 34.41 0.33 0.98 30.25 33.06 11.02 34.55 0.56
    47 waterLevel 7.23 6.79 5.34 5.59 2.31 26.08 313.95 89.78 34.88 0.19 0.73 30.25 31.57 11.14 33.62 0.55
    53 waterLevel 19.63 13.79 6.61 7.85 2.29 26.47 310.05 68.37 34.88 0.25 0.84 30.58 20.59 8.87 24.97 0.44
    54 waterLevel 8.71 8.31 6.01 6.29 2.28 23.70 291.88 76.80 33.54 0.21 0.78 30.25 24.33 8.73 28.35 0.46
    59 oilLevel 26.77 24.26 13.30 15.85 2.80 21.57 173.80 76.47 17.79 3.53 4.45 28.92 24.17 2.57 28.14 0.32
    60 oilLevel 22.80 23.39 11.60 12.89 2.66 23.80 249.29 61.20 25.43 1.60 2.51 29.86 17.90 6.84 22.11 0.53
    61 oilLevel 23.59 22.45 7.12 7.86 2.58 15.06 239.27 63.88 25.60 1.62 2.52 29.87 18.86 7.25 23.16 0.56
    62 oilLevel 7.94 7.78 5.22 5.46 2.53 15.60 248.68 77.30 27.03 1.59 2.43 29.68 24.57 3.77 28.52 0.27
    63 oilLevel 10.71 9.18 12.81 13.46 2.52 14.26 249.71 79.41 27.27 1.58 2.41 29.77 25.63 4.18 29.37 0.30
    64 oilLevel 9.39 7.60 6.54 7.62 2.51 24.85 312.03 95.86 32.21 0.87 1.65 30.09 35.73 9.73 36.10 0.54
    69 oilLevel 7.55 7.29 5.88 6.38 2.49 18.70 265.33 68.80 28.80 1.05 1.91 29.86 20.77 4.24 25.13 0.28
    70 oilLevel 6.75 6.18 4.99 5.35 2.49 19.70 272.35 92.93 29.40 1.41 2.19 29.79 33.66 7.25 34.88 0.46
    71 oilLevel 15.29 11.60 8.76 8.26 2.48 19.00 257.98 73.05 28.31 1.24 2.09 29.97 22.58 4.31 26.81 0.29
    72 oilLevel 8.05 6.23 4.26 4.02 2.48 26.48 266.30 86.49 28.97 1.37 2.17 29.82 29.54 6.24 32.25 0.41
    73 oilLevel 13.63 12.25 6.31 7.42 2.48 16.60 245.83 82.40 27.44 1.61 2.42 29.83 27.21 4.63 30.58 0.32
    74 oilLevel 11.06 11.50 7.14 7.40 2.48 14.90 239.58 49.61 26.97 1.04 1.97 29.90 14.34 6.80 17.64 0.49
    75 oilLevel 13.04 11.04 5.65 6.98 2.47 16.70 256.95 87.16 28.36 1.51 2.31 29.88 29.95 5.84 32.52 0.39
    76 oilLevel 9.78 9.33 5.85 6.30 2.47 17.80 266.58 63.85 29.13 0.87 1.73 29.98 18.85 3.97 23.17 0.26
    77 oilLevel 21.31 18.52 6.69 6.98 2.47 17.45 241.40 81.00 27.18 1.64 2.45 29.92 26.46 4.28 30.01 0.30
    78 oilLevel 13.84 14.12 5.85 7.05 2.47 25.30 303.90 66.06 32.08 0.30 0.96 30.30 19.68 6.47 24.06 0.36
    79 oilLevel 23.14 14.07 6.62 5.73 2.47 20.30 255.02 68.62 28.28 1.15 2.02 30.09 20.69 3.82 25.05 0.26
    80 oilLevel 23.63 24.74 9.80 9.76 2.46 18.30 261.29 56.75 28.82 0.80 1.66 30.21 16.43 2.98 20.39 0.20
    81 oilLevel 50.95 47.82 7.29 2.46 2.46 10.40 277.49 40.58 30.10 0.20 0.81 30.61 11.08 2.25 12.45 0.14
    82 oilLevel 8.38 8.62 4.90 5.24 2.46 24.90 326.54 65.67 33.95 0.11 0.56 30.32 19.53 7.87 23.91 0.41
    83 oilLevel 9.32 9.31 6.36 7.43 2.46 16.30 258.20 72.85 28.63 1.16 2.02 29.88 22.49 4.54 26.74 0.30
    84 oilLevel 8.99 8.33 4.22 4.60 2.46 21.20 281.57 75.50 30.51 0.82 1.64 30.02 23.71 6.26 27.81 0.38
    85 oilLevel 8.63 8.49 5.45 5.80 2.46 27.00 293.04 64.53 31.40 0.41 1.14 30.14 19.10 5.79 23.45 0.34
    86 oilLevel 15.12 10.89 7.12 7.85 2.45 18.15 263.49 78.40 29.12 1.17 2.01 30.02 25.12 5.50 28.97 0.35
    87 oilLevel 10.81 9.57 6.90 7.53 2.45 22.00 269.15 76.95 29.57 1.05 1.88 29.98 24.40 5.70 28.39 0.36
    88 oilLevel 4.81 4.45 3.72 3.83 2.45 0.22 286.36 85.60 30.93 0.94 1.74 29.87 29.02 7.66 31.90 0.45
    89 oilLevel 16.72 16.25 15.36 16.98 2.45 18.86 266.85 49.42 29.43 0.52 1.33 30.22 14.29 2.67 17.58 0.17
    90 oilLevel 9.99 8.79 5.16 6.05 2.45 20.58 271.05 75.15 29.76 0.97 1.81 29.99 23.54 5.65 27.67 0.35
    91 oilLevel 13.22 11.76 5.67 6.66 2.45 18.80 276.86 61.66 30.21 0.60 1.41 30.16 18.06 4.57 22.31 0.28
    92 oilLevel 10.15 14.89 8.06 8.15 2.45 25.60 331.64 60.53 34.50 0.33 0.97 30.43 17.67 7.75 21.88 0.39
    93 oilLevel 8.70 9.67 5.69 6.27 2.45 18.10 262.60 61.45 29.13 0.83 1.69 29.96 17.99 3.71 22.23 0.24
    94 oilLevel 24.69 26.01 12.91 16.16 2.45 12.40 242.69 52.38 27.59 0.97 1.87 30.13 15.11 7.57 18.70 0.53
    95 oilLevel 10.78 1.00 4.50 5.56 2.44 15.63 245.27 84.36 27.82 1.57 2.37 29.80 28.30 5.13 31.38 0.35
    96 oilLevel 16.87 16.02 9.13 10.59 2.44 19.10 234.98 81.03 27.07 1.66 2.48 29.85 26.47 4.20 30.02 0.30
    97 oilLevel 6.71 5.65 4.15 4.67 2.44 18.80 284.71 90.55 30.96 1.03 1.83 29.93 32.07 8.20 33.92 0.48
    98 oilLevel 12.81 13.00 7.06 7.17 2.44 19.20 264.12 64.48 29.36 0.84 1.69 30.06 19.08 4.21 23.42 0.27
    99 oilLevel 17.15 15.49 17.69 9.60 2.44 21.30 257.55 75.11 28.86 1.16 2.01 30.04 23.53 4.95 27.65 0.32
    100 oilLevel 10.23 9.14 6.70 6.94 2.44 19.00 263.50 79.69 29.32 1.16 1.99 29.94 25.77 5.79 29.49 0.37
    101 oilLevel 27.81 27.16 4.96 5.62 2.44 43.20 322.06 56.36 33.94 0.29 0.93 30.66 16.31 6.88 20.26 0.35
    102 oilLevel 12.62 11.22 7.54 8.38 2.43 29.10 275.59 66.25 30.33 0.67 1.49 30.13 19.75 5.15 24.12 0.31
    103 oilLevel 10.98 9.58 5.77 6.14 2.43 20.43 293.35 84.65 31.76 0.74 1.53 30.14 28.47 8.19 31.51 0.47
    104 oilLevel 47.99 52.72 5.30 6.99 2.43 13.40 247.01 77.88 28.15 1.37 2.20 30.22 24.86 4.70 28.76 0.32
    105 oilLevel 13.88 13.76 7.71 7.75 2.43 16.40 257.82 81.56 28.99 1.26 2.09 29.98 26.75 5.74 30.24 0.37
    106 oilLevel 9.63 8.54 6.05 6.79 2.43 19.70 282.25 76.73 30.90 0.76 1.57 30.07 24.29 6.70 28.30 0.40
    107 oilLevel 8.29 9.37 4.85 5.23 2.43 18.66 292.54 123.50 31.71 1.54 2.20 29.94 33.15 12.26 34.58 0.70
    108 oilLevel 8.02 5.78 3.68 4.09 2.43 22.61 275.87 93.93 30.42 1.21 2.00 29.92 34.35 8.14 35.30 0.49
    109 oilLevel 15.62 11.54 11.66 10.68 2.43 16.57 238.30 79.07 27.49 1.53 2.36 29.88 25.45 4.32 29.23 0.30
    110 oilLevel 14.72 12.12 7.15 6.77 2.43 20.29 266.35 57.94 29.68 0.64 1.47 30.16 16.81 3.77 20.86 0.24
    111 oilLevel 24.60 22.89 8.94 11.22 2.43 24.00 289.51 61.61 31.49 0.33 1.02 30.41 18.04 5.55 22.30 0.32
    112 oilLevel 11.36 7.64 9.74 9.35 2.43 19.93 287.12 74.23 31.32 0.62 1.41 30.15 23.12 6.75 27.30 0.39
    113 oilLevel 9.24 7.71 6.60 5.63 2.43 23.51 275.32 79.25 30.41 0.92 1.74 30.01 25.55 6.58 29.32 0.40
    114 oilLevel 10.70 9.53 5.72 7.00 2.42 18.20 265.77 85.85 29.68 1.21 2.02 29.95 29.16 6.72 31.99 0.42
    115 oilLevel 7.38 8.07 6.15 6.21 2.42 21.87 288.64 124.04 31.46 1.60 2.26 29.89 33.87 12.13 35.01 0.70
    116 oilLevel 8.07 7.51 5.33 5.59 2.42 30.60 342.63 72.59 35.69 0.33 0.97 30.41 22.38 9.94 26.66 0.47
    117 oilLevel 8.16 7.84 4.96 4.53 2.42 21.10 294.49 101.90 32.01 1.04 1.80 30.02 30.42 10.21 32.84 0.57
    118 oilLevel 12.64 15.18 7.56 8.42 2.42 10.87 282.84 119.21 31.12 1.58 2.25 30.01 37.77 11.36 37.21 0.67
    119 oilLevel 8.66 7.24 5.47 5.89 2.41 19.80 274.52 84.99 30.50 1.01 1.82 29.98 28.66 7.26 31.64 0.44
    120 oilLevel 10.04 8.57 5.28 5.86 2.41 24.20 285.66 92.36 31.37 0.98 1.77 30.06 33.27 8.71 34.66 0.50
    121 oilLevel 14.95 12.22 5.57 7.08 2.41 13.50 249.75 70.20 28.57 1.12 1.98 30.01 21.35 4.21 25.68 0.28
    122 oilLevel 18.55 19.04 40.69 12.65 2.41 19.90 258.07 59.52 29.23 0.77 1.62 30.17 17.33 3.58 21.47 0.23
    123 oilLevel 16.55 14.18 8.59 8.76 2.41 22.56 294.23 108.09 32.08 1.15 1.89 30.18 35.90 10.92 36.19 0.61
    124 oilLevel 7.46 6.39 5.43 5.55 2.41 13.00 286.22 79.36 31.47 0.70 1.49 30.04 25.60 7.41 29.37 0.43
    125 oilLevel 12.68 13.81 7.63 8.93 2.41 21.30 277.69 58.59 30.81 0.41 1.16 30.21 17.02 4.71 21.12 0.28
    126 oilLevel 36.08 32.50 5.06 5.78 2.41 13.10 228.62 54.51 27.00 1.14 2.05 30.16 15.74 7.34 19.52 0.53
    127 oilLevel 7.93 8.33 5.73 5.94 2.41 19.70 279.00 88.88 30.94 1.00 1.80 29.98 31.00 8.01 33.23 0.47
    128 oilLevel 6.76 5.65 3.60 3.24 2.40 26.90 321.74 97.79 34.31 0.47 1.17 30.17 37.16 11.54 36.91 0.59
    129 oilLevel 23.33 23.00 10.62 14.80 2.40 22.40 273.00 59.86 30.54 0.50 1.27 30.33 17.44 4.64 21.61 0.28
    130 oilLevel 11.84 9.76 5.41 6.19 2.40 18.65 279.56 67.88 31.06 0.55 1.33 30.17 20.39 5.88 24.77 0.35
    131 oilLevel 7.03 7.03 4.50 3.68 2.40 28.29 286.86 91.26 31.63 0.90 1.69 29.99 32.54 8.79 34.21 0.50
    132 oilLevel 2.67 2.87 2.43 2.65 2.40 28.45 385.63 122.50 39.35 0.10 0.49 30.22 31.85 18.04 33.81 0.74
    133 oilLevel 8.20 7.81 4.74 5.20 2.39 24.90 306.42 83.12 33.25 0.40 1.09 30.19 27.60 9.18 30.90 0.49
    134 oilLevel 14.46 15.12 12.33 12.73 2.39 18.45 238.11 76.62 27.94 1.39 2.23 29.91 24.24 4.41 28.25 0.30
    135 oilLevel 21.36 21.14 9.29 9.47 2.39 21.20 298.25 55.73 32.67 0.04 0.34 30.51 16.11 5.84 20.02 0.32
    136 oilLevel 13.84 12.13 5.89 6.44 2.38 20.30 274.65 71.43 30.90 0.65 1.46 30.18 21.87 6.13 26.18 0.36
    137 oilLevel 7.43 8.45 5.57 6.26 2.38 22.30 285.16 77.65 31.72 0.61 1.38 30.07 24.74 7.42 28.68 0.42
    138 oilLevel 62.85 49.25 10.62 14.83 2.37 24.47 267.71 59.58 30.48 0.50 1.28 30.57 17.35 4.56 21.50 0.28
    139 oilLevel 11.53 12.92 7.32 8.13 2.37 18.82 295.18 123.04 32.64 1.33 2.02 30.10 32.56 12.94 34.22 0.71
    140 oilLevel 8.45 7.29 4.60 5.27 2.37 20.30 294.31 95.37 32.59 0.78 1.55 30.10 35.37 9.97 35.90 0.55
    141 oilLevel 9.60 8.23 6.04 5.42 2.37 20.30 297.30 67.52 32.82 0.17 0.72 30.26 20.25 7.20 24.64 0.39
    142 oilLevel 18.18 16.89 7.96 9.58 2.37 17.80 249.80 68.26 29.13 0.96 1.82 30.11 20.55 4.44 24.91 0.29
    143 oilLevel 8.39 6.08 6.84 6.55 2.37 23.46 298.50 79.63 32.94 0.39 1.09 30.18 25.74 8.57 29.48 0.46
    144 oilLevel 10.16 9.87 7.96 8.01 2.36 18.80 267.56 77.06 30.61 0.83 1.65 30.06 24.45 6.51 28.44 0.39
    145 oilLevel 17.71 12.75 8.68 9.85 2.36 24.05 316.70 99.46 34.47 0.47 1.17 30.41 38.45 11.84 37.60 0.60
    146 oilLevel 17.16 17.48 8.13 9.50 2.36 20.10 267.75 61.61 30.65 0.51 1.29 30.26 18.05 4.90 22.30 0.29
    147 oilLevel 6.03 5.54 3.31 3.74 2.36 22.30 315.72 93.37 34.40 0.36 1.02 30.16 33.96 11.15 35.09 0.56
    148 oilLevel 9.81 8.91 4.46 5.21 2.36 33.60 259.06 59.86 29.98 0.61 1.43 30.08 17.44 4.21 21.61 0.26
    149 oilLevel 6.07 5.85 3.87 2.36 2.36 20.30 296.92 78.69 32.96 0.37 1.06 30.11 25.26 8.49 29.10 0.46
    150 oilLevel 10.10 10.64 8.01 8.06 2.35 21.90 267.52 99.19 30.70 1.26 2.03 29.98 38.24 8.91 37.47 0.53
    151 oilLevel 7.34 6.83 6.20 6.27 2.35 24.40 292.28 77.78 32.65 0.41 1.13 30.14 24.81 8.15 28.74 0.45
    152 oilLevel 13.31 11.64 10.56 8.68 2.35 21.88 288.03 70.47 32.33 0.33 1.02 30.29 21.46 7.13 25.81 0.40
    153 oilLevel 18.03 16.78 8.75 10.28 2.35 18.50 280.09 55.19 31.73 0.15 0.69 30.39 15.95 5.05 19.81 0.29
    154 oilLevel 8.89 7.47 4.52 5.48 2.35 22.34 294.14 68.83 32.83 0.19 0.77 30.24 20.78 7.34 25.16 0.40
    155 oilLevel 19.08 26.10 20.23 13.03 2.35 31.03 313.25 64.42 34.32 0.21 0.78 30.56 19.06 8.03 23.41 0.41
    160 oilLevel 10.74 9.54 5.38 5.93 2.34 19.40 276.77 100.76 31.56 1.11 1.88 30.06 39.49 9.74 38.12 0.56
    161 oilLevel 11.54 7.04 6.70 6.92 2.34 21.05 286.31 79.63 32.31 0.52 1.27 30.21 25.74 8.08 29.48 0.45
    162 oilLevel 15.89 15.13 11.06 11.53 2.34 21.50 270.20 83.84 31.06 0.87 1.68 30.17 28.01 7.57 31.18 0.45
    163 oilLevel 9.94 8.64 4.59 5.38 2.34 23.50 316.00 94.69 34.66 0.33 0.98 30.30 34.88 11.49 35.63 0.57
    164 oilLevel 13.59 14.28 11.29 11.93 2.34 20.45 252.06 86.58 29.72 1.21 2.02 30.01 29.60 6.82 32.29 0.43
    165 oilLevel 9.60 9.15 5.33 2.33 2.33 22.80 294.12 74.18 33.03 0.26 0.89 30.25 23.10 8.06 27.30 0.43
    166 oilLevel 9.51 8.19 4.82 5.69 2.32 20.40 285.52 87.70 32.58 0.63 1.39 30.16 30.27 9.15 32.76 0.50
    167 oilLevel 11.43 10.22 7.22 8.03 2.31 24.10 296.43 86.85 33.49 0.42 1.12 30.27 29.76 9.76 32.41 0.51
    168 oilLevel 27.71 26.05 16.13 16.89 2.31 22.60 295.14 51.43 33.40 0.28 0.91 30.65 14.84 5.94 18.35 0.31
    169 oilLevel 20.83 19.92 9.42 13.35 2.30 17.40 233.97 57.96 28.70 0.85 1.72 30.16 16.82 3.02 20.86 0.20
    170 oilLevel 7.28 6.50 3.32 3.75 2.30 22.90 299.02 110.77 33.79 0.84 1.58 30.11 38.51 12.52 37.62 0.65
    171 oilLevel 10.23 9.09 4.55 5.39 2.30 21.60 287.11 87.82 32.87 0.57 1.32 30.20 30.35 9.39 32.81 0.51
    172 oilLevel 7.72 8.14 3.23 3.86 2.30 24.90 312.41 84.68 34.86 0.09 0.50 30.29 28.49 10.59 31.55 0.52
    173 oilLevel 8.89 7.92 5.58 6.17 2.29 24.50 296.58 78.88 33.80 0.19 0.75 30.27 25.36 9.16 29.19 0.48
    174 oilLevel 12.91 11.06 5.64 7.04 2.26 20.00 278.40 79.29 32.72 0.43 1.15 30.27 25.57 8.37 29.35 0.46
    178 oilLevel 26.73 31.04 19.00 20.07 2.24 16.80 263.94 62.02 31.88 0.26 0.90 30.46 18.19 5.89 22.47 0.33
    179 oilLevel 16.58 15.30 15.45 17.00 2.21 24.41 319.51 64.15 36.63 0.70 1.39 30.69 18.96 9.78 23.30 0.45
    180 oilLevel 8.08 7.29 4.38 4.29 2.20 23.80 284.02 74.19 33.94 0.07 0.45 30.27 23.10 8.77 27.31 0.45
    181 oilLevel 14.81 15.42 10.52 11.40 2.10 26.43 268.49 54.73 33.94 0.33 0.98 30.52 15.80 6.71 19.62 0.35
    182 oilLevel 7.18 6.86 7.51 8.01 2.07 33.27 296.31 62.69 36.58 0.72 1.41 30.49 18.43 9.58 22.72 0.44
    183 oilAndWater 34.92 33.89 6.27 8.00 2.56 11.50 209.16 50.59 23.55 1.79 2.75 29.83 14.61 4.27 18.00 0.37
    184 oilAndWater 20.25 20.37 10.68 12.05 2.54 12.10 239.38 46.34 26.21 1.14 2.08 29.99 13.48 5.87 16.39 0.44
    185 oilAndWater 19.22 19.07 6.90 9.37 2.50 12.00 200.21 59.82 23.54 1.98 2.90 29.63 17.43 5.24 21.56 0.45
    186 oilAndWater 14.85 10.12 6.06 7.64 2.47 18.07 227.54 54.85 26.14 1.32 2.25 29.86 15.84 6.72 19.65 0.50
    187 oilAndWater 7.45 6.68 4.71 4.96 2.43 22.00 327.71 116.37 34.44 0.82 1.54 30.15 34.46 13.62 35.38 0.69
    188 oilAndWater 14.49 13.18 4.99 5.78 2.43 16.50 257.77 65.38 28.99 0.94 1.80 30.06 19.42 4.02 23.77 0.26
    189 oilAndWater 11.76 11.29 6.82 7.52 2.41 26.20 296.73 65.03 32.25 0.24 0.86 30.28 19.29 6.49 23.65 0.36
    193 oilAndWater 7.45 6.68 4.66 4.95 2.40 19.30 301.34 87.02 32.77 0.58 1.32 30.11 29.86 9.22 32.48 0.50
    194 oilAndWater 10.16 8.14 5.65 5.82 2.38 22.35 286.06 109.39 31.82 1.23 1.97 30.04 37.15 10.86 36.88 0.62
    195 oilAndWater 8.91 7.92 4.61 5.50 2.38 19.00 263.39 94.85 30.10 1.30 2.08 29.91 35.00 7.99 35.68 0.49
    196 oilAndWater 32.26 27.10 15.81 22.65 2.37 22.83 285.74 89.88 31.92 0.81 1.60 30.39 31.64 8.87 33.65 0.50
    197 oilAndWater 10.79 10.29 5.82 7.15 2.37 20.20 233.54 81.93 27.85 1.51 2.33 29.81 26.96 4.90 30.39 0.34
    198 oilAndWater 13.55 13.56 11.62 11.22 2.35 21.02 236.22 76.20 28.30 1.30 2.14 29.93 24.04 4.64 28.08 0.31
    199 oilAndWater 7.46 6.88 6.91 6.44 2.35 19.65 267.28 86.29 30.73 0.99 1.80 29.95 29.42 7.57 32.17 0.45
    200 oilAndWater 6.25 5.61 7.10 6.17 2.35 23.55 287.47 91.74 32.31 0.77 1.54 30.02 32.85 9.37 34.41 0.52
    201 oilAndWater 13.67 12.73 11.45 11.23 2.35 25.25 296.49 106.92 33.01 0.93 1.68 30.21 34.81 11.51 35.57 0.62
    202 oilAndWater 16.43 12.82 4.85 5.85 2.35 20.55 301.43 98.09 33.40 0.67 1.41 30.32 37.39 10.88 37.03 0.57
    203 oilAndWater 15.16 13.19 5.45 6.65 2.35 28.45 313.77 112.63 34.36 0.76 1.49 30.32 30.41 13.16 32.84 0.67
    204 oilAndWater 21.65 17.43 11.52 14.33 2.33 20.60 275.11 67.55 31.59 0.43 1.17 30.36 20.26 6.26 24.64 0.36
    205 oilAndWater 17.04 14.91 4.84 5.62 2.31 24.40 292.85 67.28 33.25 0.07 0.47 30.43 20.16 7.50 24.55 0.40
    206 oilAndWater 36.84 35.53 25.05 34.33 2.27 30.51 310.12 99.51 35.09 0.34 0.98 30.63 38.49 12.33 37.62 0.60
    207 oilAndWater 15.35 11.92 5.65 6.45 2.23 25.65 305.91 55.39 35.30 0.60 1.31 30.62 16.00 7.82 19.87 0.38




    Thanks in advance!

    Regards,
    Xu He
  • haddock Member Posts: 849 Maven
    Hi again,

    I once went up to Dortmund to sample the beer and a course given by the blessed Klink. A much repeated message was that simple is good! If I edit your data to take out "(label)", and save the data as .CSV, like this ...
    ID   CLU (label)   LLD   LLS   ILD   ILM   DEN   CNL   AC   GR   POR   PERM   RQI   SO   SH   CEC   Swi   Qv
    1    oilwithlowRES   6.20    5.85    3.76    4.50    2.58    16.60    269.90    89.86    28.09    1.62    2.40    29.67    31.63    6.92    33.62    0.67
    I can then split the data 90% training, 10% test, and use the 90% to find values for C and Gamma, which can then be used to learn and predict on the test set, like this...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <parameter key="logverbosity" value="error"/>
       <parameter key="encoding" value="GB2312"/>
       <process expanded="true" height="391" width="915">
         <operator activated="true" class="subprocess" expanded="true" height="94" name="Train/Test Sets" width="90" x="45" y="120">
           <process expanded="true">
             <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="165">
               <description>Removed "(label)"  from "CLU (label)"</description>
               <parameter key="file_name" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\R5 Forum\HU.csv"/>
             </operator>
             <operator activated="true" class="set_role" expanded="true" height="76" name="Set ID" width="90" x="179" y="165">
               <parameter key="name" value="ID"/>
               <parameter key="target_role" value="id"/>
             </operator>
             <operator activated="true" class="set_role" expanded="true" height="76" name="Set Label" width="90" x="313" y="165">
               <parameter key="name" value="CLU"/>
               <parameter key="target_role" value="label"/>
             </operator>
             <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="447" y="165"/>
             <operator activated="true" class="split_data" expanded="true" height="94" name="Split Data 90-10" width="90" x="581" y="165">
               <enumeration key="partitions">
                 <parameter key="ratio" value="0.9"/>
                 <parameter key="ratio" value="0.1"/>
               </enumeration>
               <parameter key="sampling_type" value="stratified sampling"/>
             </operator>
             <connect from_op="Read CSV" from_port="output" to_op="Set ID" to_port="example set input"/>
             <connect from_op="Set ID" from_port="example set output" to_op="Set Label" to_port="example set input"/>
             <connect from_op="Set Label" from_port="example set output" to_op="Normalize" to_port="example set input"/>
             <connect from_op="Normalize" from_port="example set output" to_op="Split Data 90-10" to_port="example set"/>
             <connect from_op="Split Data 90-10" from_port="partition 1" to_port="out 1"/>
             <connect from_op="Split Data 90-10" from_port="partition 2" to_port="out 2"/>
             <portSpacing port="source_in 1" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
             <portSpacing port="sink_out 3" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="246" y="30">
           <list key="parameters">
             <parameter key="SVM.C" value="[0.0;1000;10;linear]"/>
             <parameter key="SVM.gamma" value="[0.0;1;10;linear]"/>
           </list>
           <process expanded="true">
             <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
               <process expanded="true">
                 <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                   <parameter key="gamma" value="1.0"/>
                   <parameter key="C" value="1000.0"/>
                   <list key="class_weights"/>
                   <parameter key="shrinking" value="false"/>
                 </operator>
                 <connect from_port="training" to_op="SVM" to_port="training set"/>
                 <connect from_op="SVM" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true">
                 <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                   <parameter key="create_view" value="true"/>
                 </operator>
                 <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                   <parameter key="use_example_weights" value="false"/>
                 </operator>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="model" to_port="result 1"/>
             <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="447" y="75">
           <list key="name_map">
             <parameter key="SVM" value="SVM2"/>
           </list>
         </operator>
         <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="246" y="165">
           <parameter key="gamma" value="0.8"/>
           <parameter key="C" value="800.0"/>
           <list key="class_weights"/>
         </operator>
         <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="447" y="165">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="581" y="165">
           <list key="class_weights"/>
         </operator>
         <connect from_op="Train/Test Sets" from_port="out 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
         <connect from_op="Train/Test Sets" from_port="out 2" to_op="SVM2" to_port="training set"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Set SVM2 parameters" to_port="parameter set"/>
         <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
         <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
         <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
         <connect from_op="Apply SVM2 on test set" from_port="model" to_port="result 2"/>
         <connect from_op="Test Results" from_port="performance" to_port="result 3"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
       </process>
     </operator>
    </process>
    Rumour has it that SVMs prefer normalised data for breakfast, so I've chucked that in, and we appear to get reasonable results despite the skewed distribution of the label values. I have no idea if this is what you were after, but let the wisdom of the Klink guide you - keep it simple!
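
    For readers who would rather see the idea outside RapidMiner, here is a rough Python sketch of the same steps: normalise, make a stratified 90/10 split, then try a grid of C and gamma values. It assumes z-score normalisation, and it reads "[0.0;1000;10;linear]" as min;max;steps;scale, giving steps + 1 values; both are assumptions rather than confirmed RapidMiner behaviour. All function names are made up for illustration, and `score` is a hypothetical stand-in for the cross-validated performance returned by the inner validation.

```python
import random
import statistics
from collections import defaultdict
from itertools import product


def z_normalize(rows):
    """Scale every column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has a standard deviation of 0; divide by 1 instead.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def stratified_split(labels, train_ratio=0.9, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train += indices[:cut]
        test += indices[cut:]
    return sorted(train), sorted(test)


def linear_grid(lo, hi, steps):
    """Evenly spaced values from lo to hi (assumed: steps + 1 values)."""
    return [lo + i * (hi - lo) / steps for i in range(steps + 1)]


def grid_search(score, c_values, gamma_values):
    """Return the (C, gamma) pair with the highest score.

    `score` is a hypothetical callable, e.g. cross-validated accuracy
    measured on the training partition only.
    """
    return max(product(c_values, gamma_values), key=lambda pair: score(*pair))
```

    Under the assumption above, `linear_grid(0.0, 1000, 10)` and `linear_grid(0.0, 1, 10)` give 11 values each, so the grid has 121 combinations. The point of the split is that `score` only ever sees the 90% partition, while the remaining 10% is kept back for the final check.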

  • beijinghe2008 Member Posts: 13 Contributor II
    Thank you very much. I learned a lot from your sample.

    Regards,
    Xu He
  • pop Member Posts: 21 Maven
    Hi,

    There is still something I don't understand, Haddock. I modified your code to use the "Sonar" sample data provided in the Samples repository, and I use the whole data set both to optimise the parameters (C and Gamma) and to run the classification through SVM2. Since SVM2 uses the parameters from the preceding optimisation, I would expect to get the same results; instead we get 100% accuracy.
    I understand that you split the data set precisely to avoid this phenomenon, but could you please briefly explain why this happens?
    Thank you very much for your help.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="391" width="915">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="514" y="165">
            <parameter key="gamma" value="0.01"/>
            <parameter key="C" value="1000"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="648" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="782" y="165">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true" height="510" width="991">
              <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                    <parameter key="gamma" value="100000"/>
                    <parameter key="C" value="100000"/>
                    <list key="class_weights"/>
                    <parameter key="shrinking" value="false"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="true"/>
                  </operator>
                  <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <parameter key="use_example_weights" value="false"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="648" y="75">
            <list key="name_map">
              <parameter key="SVM" value="SVM2"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="SVM2" to_port="training set"/>
          <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
          <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
          <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
          <connect from_op="Test Results" from_port="performance" to_port="result 2"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Set SVM2 parameters" to_port="parameter set"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

  • haddock Member Posts: 849 Maven
    Hi Pop,

    If you run my code you'll see the order of execution (by pressing the double-ended blue arrow with the question mark on the process tab), and be able to confirm that the parameter optimisation is done before the test is applied. I'm not sure the same applies to your code...
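
    To show why the order matters, here is a toy Python sketch, not RapidMiner code. It assumes the learner simply uses whatever parameter values it holds at the moment it runs. The hard-coded values are the ones set on SVM2 in the process above; the "optimised" values and the step names are invented for illustration.

```python
def run(order, optimised):
    """Run the named steps in order and return the parameters
    that SVM2 was actually trained with."""
    svm2 = {"C": 1000.0, "gamma": 0.01}  # values hard-coded on the SVM2 operator
    trained_with = None
    for step in order:
        if step == "set_parameters":
            svm2.update(optimised)       # copy the optimised values onto SVM2
        elif step == "train_svm2":
            trained_with = dict(svm2)    # the learner uses its current settings
    return trained_with


# Hypothetical grid-search result, for illustration only.
optimised = {"C": 10.0, "gamma": 0.1}

wrong_order = run(["train_svm2", "set_parameters"], optimised)
right_order = run(["set_parameters", "train_svm2"], optimised)
```

    In the first call SVM2 is trained with C = 1000 and gamma = 0.01, so the optimisation never reaches it; in the second it is trained with the optimised values.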

  • pop Member Posts: 21 Maven
    Hi Haddock,

    Thank you very much for your very fast reply.
    You are absolutely right, here is the new code with the correct execution order:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="391" width="915">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true" height="510" width="991">
              <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                    <parameter key="gamma" value="100000"/>
                    <parameter key="C" value="100000"/>
                    <list key="class_weights"/>
                    <parameter key="shrinking" value="false"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="true"/>
                  </operator>
                  <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <parameter key="use_example_weights" value="false"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="648" y="75">
            <list key="name_map">
              <parameter key="SVM" value="SVM2"/>
            </list>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="514" y="165">
            <parameter key="gamma" value="0.01"/>
            <parameter key="C" value="1000"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="648" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="782" y="165">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="SVM2" to_port="training set"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Set SVM2 parameters" to_port="parameter set"/>
          <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
          <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
          <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
          <connect from_op="Test Results" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    But still, the results after parameter optimisation and from SVM2 are different. I don't understand why...


  • haddock Member Posts: 849 Maven
    Hi Pop,

    It is because of the validation, which just tests on subsets, rather than on the whole example set. If you remove the validation, so that the optimiser also plays with the full example set, the two performances are as expected, namely equal. Here's the point in more detail...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="386" width="909">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="94" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true" height="385" width="909">
              <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="179" y="30">
                <parameter key="gamma" value="100000"/>
                <parameter key="C" value="100000"/>
                <list key="class_weights"/>
              </operator>
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="450" y="30">
                <list key="application_parameters"/>
                <parameter key="create_view" value="true"/>
              </operator>
              <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="585" y="30">
                <parameter key="use_example_weights" value="false"/>
              </operator>
              <connect from_port="input 1" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="SVM" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="18"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="648" y="75">
            <list key="name_map">
              <parameter key="SVM" value="SVM2"/>
            </list>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="514" y="165">
            <parameter key="gamma" value="0.0001"/>
            <parameter key="C" value="100000"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="648" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Optimised Parameters" width="90" x="782" y="165">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="SVM2" to_port="training set"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Set SVM2 parameters" to_port="parameter set"/>
          <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
          <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
          <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Optimised Parameters" to_port="labelled data"/>
          <connect from_op="Optimised Parameters" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • pop Member Posts: 21 Maven
    Hi Haddock,

    First I would like to thank you very much for your help and your explanations.
    But I am afraid I don't quite get it yet... My understanding is that by removing the X-Validation we have overfitted the learner to the dataset, so with optimised parameters SVM1 can classify the dataset with 100% accuracy. It is not really surprising that SVM1 can achieve this result without cross validation, but it has probably lost its ability to generalise. Then SVM2, which is defined with the SVM1 parameters and applied to the same dataset, will achieve the same result. This makes sense to me.
    What does not make sense to me is the result with X-validation. In that case my understanding of the process is that the learner will be trained and tested on different subsets. The learning process will stop when an extremum is reached on the error of classification of the data on the chosen test subset. By doing so we might not be able to classify with 100% accuracy but we keep the generalization ability. So when SVM1 is trained with X-validation and the result is, let's say, 80%, it seems impossible to me that SVM2 (which is ultimately a copy of SVM1) can classify the whole dataset with 100% accuracy. The reason is that the subset on which SVM1 was tested with a result of 80% is necessarily included in the whole dataset, and if SVM1 cannot achieve better than 80% on this specific subset, why would SVM2 do better on it? So there must be some misclassifications, and then 100% accuracy is not possible.
    Is this wrong?
    Also, are we sure that the accuracy we can read for SVM1 is coming from the optimised parameters?
    Again, thank you very much for your help!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    "Is this wrong?"
    I am just jumping in, but frankly: yes, it is wrong. If I get you right, I think your misunderstanding lies in the sentence:

    "The learning process will stop when an extremum is reached on the error of classification of the data on the chosen test subset."
    The learner does not care about any performance on the test subset. In fact, it does not see it at all, and that is exactly the point of cross validation. If the learner optimized its learning phase according to the test set, the test set would no longer be independent and the concept of cross validation would be reduced to absurdity.
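
    A minimal sketch of that separation, not RapidMiner's implementation: the learner is fitted on the training folds only, and the held-out fold is used purely for scoring. A 1-nearest-neighbour learner stands in for the SVM here, and the noisy one-dimensional data is hypothetical; both are assumptions made for illustration.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nn_fit(train):
    """'Training' a 1-NN learner is just remembering the labelled examples."""
    return list(train)

def nn_predict(model, x):
    """Return the label of the stored example closest to x."""
    return min(model, key=lambda example: abs(example[0] - x))[1]

def accuracy(model, examples):
    return sum(nn_predict(model, x) == y for x, y in examples) / len(examples)

def cross_validate(data, k):
    """Average held-out accuracy; the learner never sees the test fold."""
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train = [ex for i, ex in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_idx]
        model = nn_fit(train)          # the test fold is not passed to the learner
        scores.append(accuracy(model, test))
    return sum(scores) / k

# Hypothetical data: label is 1 when x > 0.5, with about 20% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

cv_accuracy = cross_validate(data, k=10)

# Train on everything, then score on the very same examples.
train_accuracy = accuracy(nn_fit(data), data)
print(cv_accuracy, train_accuracy)
```

    In this sketch the model scored on its own training examples reaches a perfect score, because every example is its own nearest neighbour, while the cross-validated estimate stays lower. The same pair of parameters can therefore give a modest cross-validated figure and still classify the full training set perfectly.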

    Cheers,
    Ingo
  • haddock Member Posts: 849 Maven
    Hi Pop,

    And greetings to Ingo, Magus of the pointy head and Dataminer Poursuivant to the Late Ludwig of Bavaria...

    At first glance this seems odd; if the optimiser can only get N%, how can anything do better? But if you alter the viewpoint, the oddness disappears. The more of the domain you train on, the better the performance should be.
    "In that case my understanding of the process is that the learner will be trained and tested on different subsets."
    Indeed, the SVM will be repeatedly built and tested on different splits of the data.
    "The learning process will stop when an extremum is reached on the error of classification of the data on the chosen test subset. By doing so we might not be able to classify with 100% accuracy but we keep the generalization ability."
    I think the grid just does all the combos and retrieves the best.
    "So when SVM1 is trained with X-validation and the result is, let's say, 80%, it seems impossible to me that SVM2 (which is ultimately a copy of SVM1) can classify the whole dataset with 100% accuracy."
    Identical learners can produce different models from different data; SVM2 has more data to work from, and there are no hidden exceptions.
    "The reason is that the subset on which SVM1 was tested with a result of 80% is necessarily included in the whole dataset, and if SVM1 cannot achieve better than 80% on this specific subset, why would SVM2 do better on it?"
    Because it has more data?
    "So there must be some misclassifications, and then 100% accuracy is not possible. Is this wrong?"
    I fear so.
    "Also, are we sure that the accuracy we can read for SVM1 is coming from the optimised parameters?"
    I've inserted a log (v. useful, to be recommended) which casts light on the process.

    Good weekend to all!
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <parameter key="logverbosity" value="error"/>
       <parameter key="encoding" value="GB2312"/>
       <process expanded="true" height="385" width="909">
         <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Sonar"/>
         </operator>
         <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
         <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
         <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
           <list key="parameters">
             <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
             <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
           </list>
           <process expanded="true" height="385" width="909">
             <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
               <process expanded="true">
                 <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                   <parameter key="gamma" value="100000"/>
                   <parameter key="C" value="100000"/>
                   <list key="class_weights"/>
                   <parameter key="shrinking" value="false"/>
                 </operator>
                 <connect from_port="training" to_op="SVM" to_port="training set"/>
                 <connect from_op="SVM" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true">
                 <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                   <parameter key="create_view" value="true"/>
                 </operator>
                 <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                   <parameter key="use_example_weights" value="false"/>
                 </operator>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <operator activated="true" class="log" expanded="true" height="76" name="Optimiser log" width="90" x="548" y="65">
               <list key="log">
                 <parameter key="Validation" value="operator.Validation.value.performance"/>
                 <parameter key="C" value="operator.SVM.parameter.C"/>
                 <parameter key="G" value="operator.SVM.parameter.gamma"/>
               </list>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="model" to_port="result 1"/>
             <connect from_op="Validation" from_port="averagable 1" to_op="Optimiser log" to_port="through 1"/>
             <connect from_op="Optimiser log" from_port="through 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="648" y="75">
           <list key="name_map">
             <parameter key="SVM" value="SVM2"/>
           </list>
         </operator>
         <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="514" y="165">
           <parameter key="gamma" value="0.01"/>
           <parameter key="C" value="10.0"/>
           <list key="class_weights"/>
         </operator>
         <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="648" y="165">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="782" y="165">
           <list key="class_weights"/>
         </operator>
         <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
         <connect from_op="Multiply" from_port="output 2" to_op="SVM2" to_port="training set"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Set SVM2 parameters" to_port="parameter set"/>
         <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
         <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
         <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
         <connect from_op="Test Results" from_port="performance" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
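
    The remark that the grid "just does all the combos and retrieves the best" can be sketched roughly as follows. This is an illustration of exhaustive grid search in general, not the code of the grid optimisation operator; `grid_search` and `toy_score` are hypothetical names, and `toy_score` is an arbitrary stand-in for the performance the inner Validation operator would deliver.

```python
from itertools import product
from math import log10

# The same eleven values the process lists for SVM.C and SVM.gamma.
GRID = [10.0 ** e for e in range(-5, 6)]

def grid_search(evaluate, c_values=GRID, gamma_values=GRID):
    """Try every (C, gamma) pair and return (best_score, best_c, best_gamma)."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = evaluate(c, gamma)   # e.g. a cross-validated performance
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best

def toy_score(c, gamma):
    """Hypothetical scoring function with an arbitrary peak at C=10, gamma=0.01."""
    return -abs(log10(c) - 1) - abs(log10(gamma) + 2)

n_combinations = len(GRID) * len(GRID)
best_score, best_c, best_gamma = grid_search(toy_score)
print(n_combinations, best_c, best_gamma)
```

    With eleven values per parameter the search evaluates 121 combinations, and each evaluation is itself a full cross validation in the process above. Logging every combination, as the Optimiser log operator does, shows which pair produced the reported performance.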
  • pop Member Posts: 21 Maven
    Hi Ingo,

    Thank you very much for jumping in and correcting me.
    I think my mistake comes from a confusion between neural net training and SVM. Don't we use the error on the test set to stop training the neural net?
    In the case of SVM the test set is only used to make a performance measurement, with no feedback to the learning process. Is this correct?
    In the case of NN the X-validation has an impact on the model performance; for SVM it has no impact on the model performance but on the quality of the model performance measurement. Is that right?
    But still, the measurement performed on the test set indicates some misclassifications. How can we have 100% accuracy on the whole dataset with SVM2?
    Is this normal? Or is it the process design that is wrong? Or, more probably, my interpretation of it?
    Again, Ingo, thank you very much for your help and for taking time to educate people.

    Hi Haddock,
    Thank you for your help, the log is great.
    I might be wrong, but I don't think it is due to the fact that SVM2 has more data, because if you look at the model as in the code below you see that the SVM1 model and the SVM2 model are exactly the same. It is not as if SVM2 was re-trained with new data unknown to SVM1.

    Thank you very much again, Ingo and Haddock for your help.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="431" width="909">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true" height="385" width="909">
              <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                    <parameter key="gamma" value="100000"/>
                    <parameter key="C" value="100000"/>
                    <list key="class_weights"/>
                    <parameter key="shrinking" value="false"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="true"/>
                  </operator>
                  <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <parameter key="use_example_weights" value="false"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" expanded="true" height="76" name="Optimiser log" width="90" x="548" y="65">
                <parameter key="filename" value="C:\Repository\toto.log"/>
                <list key="log">
                  <parameter key="Validation" value="operator.Validation.value.performance"/>
                  <parameter key="C" value="operator.SVM.parameter.C"/>
                  <parameter key="G" value="operator.SVM.parameter.gamma"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Optimiser log" to_port="through 1"/>
              <connect from_op="Optimiser log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_parameters" expanded="true" height="60" name="Write Parameters" width="90" x="648" y="75">
            <parameter key="parameter_file" value="C:\Repository\SVM1Parameters.par"/>
          </operator>
          <operator activated="true" class="set_parameters" expanded="true" height="60" name="Set SVM2 parameters" width="90" x="782" y="75">
            <list key="name_map">
              <parameter key="SVM" value="SVM2"/>
            </list>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM2" width="90" x="514" y="210">
            <parameter key="gamma" value="0.01"/>
            <parameter key="C" value="1000"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply SVM2 on test set" width="90" x="648" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="782" y="165">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="SVM2" to_port="training set"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_op="Write Parameters" to_port="input"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="result 1" to_port="result 2"/>
          <connect from_op="Write Parameters" from_port="through" to_op="Set SVM2 parameters" to_port="parameter set"/>
          <connect from_op="SVM2" from_port="model" to_op="Apply SVM2 on test set" to_port="model"/>
          <connect from_op="SVM2" from_port="exampleSet" to_op="Apply SVM2 on test set" to_port="unlabelled data"/>
          <connect from_op="Apply SVM2 on test set" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
          <connect from_op="Apply SVM2 on test set" from_port="model" to_port="result 4"/>
          <connect from_op="Test Results" from_port="performance" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="18"/>
          <portSpacing port="sink_result 3" spacing="72"/>
          <portSpacing port="sink_result 4" spacing="54"/>
          <portSpacing port="sink_result 5" spacing="108"/>
        </process>
      </operator>
    </process>

  • haddock Member Posts: 849 Maven
    Hi Pop,
    "I might be wrong, but I don't think it is due to the fact that SVM2 has more data, because if you look at the model as in the code below you see that the SVM1 model and the SVM2 model are exactly the same."
    Not true. If we run your code we can mouse over the input port of SVM and see that it has 187 input examples, meaning that on each run 21 are left out for testing, whereas if we do the same on SVM2 there are 208 examples, meaning none are left out. So SVM2 gets to see the whole dataset at once, while the other never does; it just gets repeated glimpses of 90%.
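
    The 187/21 split follows from simple arithmetic, sketched below. The 208 examples come from the figures above; the fold count of 10 is an assumption (it matches the "glimpses of 90%"), and the rule for distributing the remainder is also an assumption, not a statement about how RapidMiner assigns examples to folds.

```python
def fold_sizes(n_examples, k):
    """Sizes of k near-equal test folds; the first n % k folds get one extra example."""
    base, extra = divmod(n_examples, k)
    return [base + 1 if i < extra else base for i in range(k)]

n_examples, k = 208, 10            # k=10 is assumed, see the note above
test_sizes = fold_sizes(n_examples, k)
train_sizes = [n_examples - t for t in test_sizes]

print(test_sizes)    # held-out examples per fold
print(train_sizes)   # examples the inner SVM is trained on per fold
```

    Since 208 is not divisible by 10, most folds hold out 21 examples and train on 187, and the rest hold out 20 and train on 188. SVM2, connected directly to the Multiply operator, always receives all 208.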
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    "I think my mistake comes from a confusion between neural net training and SVM. Don't we use the error on the test set to stop training the neural net?
    In the case of SVM the test set is only used to make a performance measurement, with no feedback to the learning process. Is this correct?
    In the case of NN the X-validation has an impact on the model performance; for SVM it has no impact on the model performance but on the quality of the model performance measurement. Is that right?"
    Three times right. Just let me add that the cross validation of RapidMiner never feeds the performance information back to any learner, independent of the type (NN vs. SVM vs. ...). The backpropagation of the error is done internally in the neural net; the "outer" cross validation is only used for performance estimation in any case.

    I have just realized (should have read the thread title better :P ) that you have the cross validation inside of a grid optimization. As far as I understand it, I would agree with Captain Haddock's explanation. I am currently on my way back home and do not have access to RapidMiner, but maybe I will try the process tomorrow.

    Cheers,
    Ingo
    pop Member Posts: 21 Maven
    Hi Ingo,

    Thank you very much for your further explanations. It makes things clear.

    Hi Haddock,

    I now realise my mistake. I said something wrong: "SVM2 is a copy of SVM1". SVM2 only uses the same parameters C and Gamma, but is re-trained on the full dataset. So I now understand your point. What still confuses me is that the two models are exactly the same when we visualize them... I don't understand this.

    If we now remove SVM2 and directly apply the model coming out of the Optimiser to the whole dataset, we should match the first result... Shouldn't we??
    There must be something very obvious that I am missing; I am sorry to bother you with this.
    Again Haddock and Ingo, thank you very much for your very kind help.

    Here is the process with one SVM:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="431" width="909">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="514" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true" height="385" width="909">
              <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                    <parameter key="gamma" value="100000"/>
                    <parameter key="C" value="100000"/>
                    <list key="class_weights"/>
                    <parameter key="shrinking" value="false"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="510" width="470">
                  <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="true"/>
                  </operator>
                  <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="648" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="782" y="210">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
          <connect from_op="Test Results" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="162"/>
          <portSpacing port="sink_result 3" spacing="144"/>
        </process>
      </operator>
    </process>

    haddock Member Posts: 849 Maven
    Hi Pop,

    When in doubt, stick in a log! If I stick one in your code I can quickly see that the model you are applying is simply the one made in the last pass of the optimisation, which in this case had the largest values for C and Gamma. Big Gammas matter...
    "If you are checking your results on the training data, it's natural that the results are better when you give a big value of gamma because you allow the training procedure to adapt almost perfectly to the data you give it. This will probably mean that, for unseen data samples, your model/classifier won't have good results since it's too adapted to the training samples."
    Which is from here...
    http://agbs.kyb.tuebingen.mpg.de/km/bb/showthread.php?tid=1240
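
    Why a big gamma lets the model adapt almost perfectly can be seen from the RBF kernel itself, K(x, y) = exp(-gamma * ||x - y||^2). The Python sketch below evaluates it for two hypothetical points; the points and the chosen gamma values are only for illustration.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2), the RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (0.0, 0.0), (0.3, 0.4)  # two hypothetical points, squared distance about 0.25
for gamma in (0.01, 1.0, 100000.0):
    # Similarity of a point to itself is always 1; similarity to the other point shrinks with gamma.
    print(gamma, rbf_kernel(x, x, gamma), rbf_kernel(x, y, gamma))
```

    With gamma = 100,000 every point is similar only to itself, so the model can effectively memorise each training example, which is the overfitting described in the quote.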

    Here's the code, which shows that the optimiser produced the parameters SVM.C = 1000 and SVM.gamma = 0.01, whereas the model you were applying had 100,000 for both...

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="error"/>
        <parameter key="encoding" value="GB2312"/>
        <process expanded="true" height="367" width="902">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="45" y="165"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="246" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="447" y="30">
            <list key="parameters">
              <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
              <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="130" y="30">
                    <parameter key="gamma" value="100000"/>
                    <parameter key="C" value="100000"/>
                    <list key="class_weights"/>
                    <parameter key="shrinking" value="false"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="true"/>
                  </operator>
                  <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Test Results" width="90" x="648" y="210">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="782" y="255">
            <list key="log">
              <parameter key="C" value="operator.SVM.parameter.C"/>
              <parameter key="G" value="operator.SVM.parameter.gamma"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimise C and Gamma on treaining set" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_port="result 3"/>
          <connect from_op="Optimise C and Gamma on treaining set" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Test Results" to_port="labelled data"/>
          <connect from_op="Test Results" from_port="performance" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
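
    The difference between "the best parameters" and "the model from the last pass" can be sketched in Python as follows. This is not the optimiser's real code: `grid_search` and `toy_score` are hypothetical, and `toy_score` is a made-up stand-in for cross-validated accuracy that happens to peak at C = 1000, gamma = 0.01.

```python
import math

def grid_search(c_values, gamma_values, evaluate):
    """Try every (C, gamma) pair; return the best pair and the last pair tried."""
    best_params, best_score, last_params = None, float("-inf"), None
    for c in c_values:
        for gamma in gamma_values:
            score = evaluate(c, gamma)  # e.g. cross-validated accuracy
            last_params = (c, gamma)    # whatever ran last, good or bad
            if score > best_score:
                best_params, best_score = (c, gamma), score
    return best_params, last_params

grid = [10.0 ** k for k in range(-5, 6)]  # 0.00001 ... 100000, as in the process above

def toy_score(c, gamma):
    # Hypothetical score surface, not real Sonar results.
    return -abs(math.log10(c) - 3) - abs(math.log10(gamma) + 2)

best, last = grid_search(grid, grid, toy_score)
print("best parameters:", best)
print("last parameters tried:", last)
```

    A model object left over from the final iteration carries the last pair, so to use the best pair the learner has to be re-trained with those parameters.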






    pop Member Posts: 21 Maven
    Hi Haddock,

    First, thank you very much for not giving up!!!
    You are absolutely right about C and Gamma; it was my mistake. But even with fixed C and Gamma, my problem is not solved.

    I think my problem lies in a wrong interpretation of the X-Validation operator. To illustrate this, let me give you two small processes.
    In the first we still use the Sonar data. I split it and use 70% to train the SVM, then test the model on the remaining 30%. You can see that I get an accuracy close to 84%. Then I apply this model to the whole dataset and I get something around 95%. This makes perfect sense to me.
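
    The 95% figure can be read as a weighted mix of training and test accuracy. The Python sketch below assumes an exact 70/30 split and uses the rounded figures quoted above, so the result is only approximate; `implied_train_accuracy` is a hypothetical helper.

```python
def implied_train_accuracy(overall, test_acc, train_frac):
    """Solve overall = train_frac * train_acc + (1 - train_frac) * test_acc for train_acc."""
    return (overall - (1.0 - train_frac) * test_acc) / train_frac

# Rounded figures from the post: about 95% on the whole set, about 84% on the 30% test part.
train_acc = implied_train_accuracy(overall=0.95, test_acc=0.84, train_frac=0.7)
print(round(train_acc, 3))
```

    Under these assumptions the accuracy on the 70% training part comes out just below 100%, so the whole-set figure is dominated by examples the model has already seen.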

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="510" width="997">
          <operator activated="true" class="subprocess" expanded="true" height="94" name="Subprocess" width="90" x="45" y="30">
            <process expanded="true" height="510" width="1015">
              <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
                <parameter key="repository_entry" value="//Samples/data/Sonar"/>
              </operator>
              <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
              <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
              <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_port="out 1"/>
              <connect from_op="Multiply" from_port="output 2" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="split_data" expanded="true" height="94" name="Split Data" width="90" x="246" y="30">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="447" y="30">
            <parameter key="gamma" value="0.01"/>
            <parameter key="C" value="1000.0"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="648" y="30">
            <parameter key="model_file" value="C:\Repository\SVMwithSplit.mod"/>
          </operator>
          <operator activated="true" class="read_model" expanded="true" height="60" name="Read Model" width="90" x="246" y="165">
            <parameter key="model_file" value="C:\Repository\SVMwithSplit.mod"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="648" y="165">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="read_model" expanded="true" height="60" name="Read Model (2)" width="90" x="246" y="300">
            <parameter key="model_file" value="C:\Repository\SVMwithSplit.mod"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance (2)" width="90" x="648" y="300">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Split Data" to_port="example set"/>
          <connect from_op="Subprocess" from_port="out 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="SVM" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="SVM" from_port="model" to_op="Write Model" to_port="input"/>
          <connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <connect from_op="Read Model (2)" from_port="output" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="126"/>
          <portSpacing port="sink_result 3" spacing="36"/>
        </process>
      </operator>
    </process>
    Now I replace the split with an X-Validation. The SVM uses the same parameters as in the previous example. The performance is around 86%. Then I again apply the model to the whole dataset and I get 100% accuracy. Not a single misclassification. This makes no sense to me; I would have expected something similar to or a bit better than in our first experiment, but not 100%...

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="510" width="997">
          <operator activated="true" class="subprocess" expanded="true" height="94" name="Subprocess" width="90" x="45" y="30">
            <process expanded="true" height="510" width="1015">
              <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
                <parameter key="repository_entry" value="//Samples/data/Sonar"/>
              </operator>
              <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
              <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
              <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_port="out 1"/>
              <connect from_op="Multiply" from_port="output 2" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="447" y="30">
            <process expanded="true" height="510" width="482">
              <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="179" y="30">
                <parameter key="gamma" value="0.01"/>
                <parameter key="C" value="1000.0"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="510" width="482">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="60" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="313" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="648" y="30">
            <parameter key="model_file" value="C:\Repository\SVMwithXValidation.mod"/>
          </operator>
          <operator activated="true" class="read_model" expanded="true" height="60" name="Read Model (2)" width="90" x="179" y="300">
            <parameter key="model_file" value="C:\Repository\SVMwithXValidation.mod"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance (2)" width="90" x="648" y="300">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Validation" to_port="training"/>
          <connect from_op="Subprocess" from_port="out 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Write Model" to_port="input"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Read Model (2)" from_port="output" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="36"/>
          <portSpacing port="sink_result 2" spacing="216"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Thank you very much for your help.
    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    In your second process, you use the model learned from the complete set and also apply it to the complete set. Learning with an RBF kernel with an appropriate gamma leads to perfect overfitting in such a setting. Just replace the SVM with a KNN with k = 1 and you will see exactly the same phenomenon. For exactly that reason you never, never, never calculate and use the training error for anything (but wondering...)  ;)

    Here is the KNN process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="510" width="997">
          <operator activated="true" class="subprocess" expanded="true" height="94" name="Subprocess" width="90" x="112" y="30">
            <process expanded="true" height="510" width="1015">
              <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
                <parameter key="repository_entry" value="//Samples/data/Sonar"/>
              </operator>
              <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
              <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
              <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_port="out 1"/>
              <connect from_op="Multiply" from_port="output 2" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="447" y="30">
            <process expanded="true" height="510" width="482">
              <operator activated="true" class="k_nn" expanded="true" height="76" name="k-NN" width="90" x="179" y="30"/>
              <connect from_port="training" to_op="k-NN" to_port="training set"/>
              <connect from_op="k-NN" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="510" width="482">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="60" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="313" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="648" y="30">
            <parameter key="model_file" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\SVMwithXValidation.mod"/>
          </operator>
          <operator activated="true" class="read_model" expanded="true" height="60" name="Read Model (2)" width="90" x="179" y="300">
            <parameter key="model_file" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\SVMwithXValidation.mod"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance (2)" width="90" x="648" y="300">
            <list key="class_weights"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Validation" to_port="training"/>
          <connect from_op="Subprocess" from_port="out 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Write Model" to_port="input"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Read Model (2)" from_port="output" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="36"/>
          <portSpacing port="sink_result 2" spacing="216"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Change k to another value (e.g. 10) and you will see how the performance changes (and the model still remains "overfitted"). The same would be true if you changed C and gamma in the SVM (although it seems to be harder to understand there...).

    Cheers,
    Ingo
  • haddock Member Posts: 849 Maven
    Hi Pop!

    We'll get there in the end, wherever there might be! I think you get the idea of validation: shuffle and test, shuffle and test, until you reach some arbitrary satisfaction point. What you are hoping to get is some idea of how the trained model will perform on new examples.

    Because the maths is so slick at reshaping the data, and the permutation possibilities are almost limitless, Support Vector Machines can mimic nearly any binary pattern almost perfectly. So the problem turns around into how to prevent overtraining, to which of course the answer is... shuffle and test, shuffle and test, and so on.

    I've bashed out this process, which trains and tests the SVM on all the data for every combination of C and gamma in the grid, and logs the accuracy. What pops out is that getting 100% is not unusual, and that performance is surprisingly predictable from C and gamma.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
         <location/>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <parameter key="logverbosity" value="error"/>
       <parameter key="encoding" value="GB2312"/>
       <process expanded="true" height="618" width="884">
         <operator activated="true" class="optimize_parameters_grid" expanded="true" height="112" name="Optimise C and Gamma on treaining set" width="90" x="447" y="30">
           <list key="parameters">
             <parameter key="SVM.C" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
             <parameter key="SVM.gamma" value="0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000,100000"/>
           </list>
           <process expanded="true" height="404" width="835">
             <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
               <parameter key="repository_entry" value="//Samples/data/Sonar"/>
             </operator>
             <operator activated="true" class="normalize" expanded="true" height="94" name="Normalize" width="90" x="45" y="165"/>
             <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="179" y="165"/>
             <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="313" y="67">
               <parameter key="gamma" value="100000"/>
               <parameter key="C" value="100000"/>
               <list key="class_weights"/>
               <parameter key="shrinking" value="false"/>
             </operator>
             <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (3)" width="90" x="447" y="255">
               <list key="application_parameters"/>
               <parameter key="create_view" value="true"/>
             </operator>
             <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance (2)" width="90" x="581" y="210">
               <list key="class_weights"/>
             </operator>
             <operator activated="true" class="log" expanded="true" height="94" name="Log" width="90" x="715" y="210">
               <list key="log">
                 <parameter key="Perf" value="operator.Performance (2).value.accuracy"/>
                 <parameter key="C" value="operator.SVM.parameter.C"/>
                 <parameter key="G" value="operator.SVM.parameter.gamma"/>
               </list>
             </operator>
             <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
             <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
             <connect from_op="Multiply" from_port="output 1" to_op="SVM" to_port="training set"/>
             <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (3)" to_port="unlabelled data"/>
             <connect from_op="SVM" from_port="model" to_op="Apply Model (3)" to_port="model"/>
             <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
             <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
             <connect from_op="Log" from_port="through 1" to_port="performance"/>
             <connect from_op="Log" from_port="through 2" to_port="result 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="log_to_data" expanded="true" height="76" name="Log to Data" width="90" x="179" y="255"/>
         <operator activated="true" class="discretize_by_bins" expanded="true" height="94" name="Discretize" width="90" x="313" y="255">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="Perf"/>
           <parameter key="number_of_bins" value="5"/>
           <parameter key="min_value" value="0.97"/>
           <parameter key="max_value" value="1.0"/>
           <parameter key="range_name_type" value="interval"/>
         </operator>
         <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="447" y="255">
           <parameter key="name" value="Perf"/>
           <parameter key="target_role" value="label"/>
         </operator>
         <operator activated="true" class="decision_tree" expanded="true" height="76" name="Decision Tree" width="90" x="648" y="255"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="performance" to_port="result 1"/>
         <connect from_op="Optimise C and Gamma on treaining set" from_port="parameter" to_port="result 2"/>
         <connect from_op="Log to Data" from_port="exampleSet" to_op="Discretize" to_port="example set input"/>
         <connect from_op="Discretize" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
         <connect from_op="Decision Tree" from_port="model" to_port="result 3"/>
         <connect from_op="Decision Tree" from_port="exampleSet" to_port="result 4"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
         <portSpacing port="sink_result 5" spacing="0"/>
       </process>
     </operator>
    </process>
    Ahoy there Ingo  ;D
  • pop Member Posts: 21 Maven
    Dear Ingo, dear Haddock,
    Thank you so much for sharing your knowledge and for taking the time to help and educate people. I know that I am repeating myself, but I truly think it can never be said enough.
    I think that, thanks to your explanations, I have finally realized where I misunderstood the X-Validation process. I want to give a brief explanation for people who might get trapped in the same wrong interpretation. The X-Validation operator with k validations divides the dataset into k subsets. It then trains the model on k-1 subsets and tests it on the one left out. It repeats this k times, holding out a different subset for testing each time, so a model is never trained and tested on the same data. In each of the k iterations the operator trains a model and measures its performance; the performance it outputs is the average of those k measurements. The model it outputs is none of the k models trained on the subsets but another model trained on the whole dataset (that is the main point I did not get). So applying this model to the whole dataset afterwards is obviously not a meaningful test.
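    The procedure described above can be sketched in a few lines of plain Python. This is only an illustration, not RapidMiner's implementation: the function names, the interleaved fold assignment, and the toy majority-class "learner" are all assumptions made for the example, and the labels are made up.

```python
from statistics import mean


def k_fold_indices(n_examples, k):
    """Split indices 0..n_examples-1 into k disjoint folds (interleaved; an assumption)."""
    return [list(range(i, n_examples, k)) for i in range(k)]


def majority_class(labels):
    """Toy stand-in for a learner: always predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, k):
    """Return (average accuracy over the k folds, model trained on all the data)."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        # Train on the k-1 folds that are not held out.
        train = [y for i, y in enumerate(labels) if i not in held_out]
        model = majority_class(train)
        # Test only on the held-out fold, which the model has never seen.
        test = [labels[i] for i in test_idx]
        accuracies.append(sum(y == model for y in test) / len(test))
    # The returned model is none of the k fold models: it is retrained on everything.
    final_model = majority_class(labels)
    return mean(accuracies), final_model


# Made-up labels, purely for illustration.
labels = ["a"] * 7 + ["b"] * 3
avg_accuracy, final_model = cross_validate(labels, k=5)
print(avg_accuracy, final_model)
```

    The point to notice is the last step of `cross_validate`: the averaged accuracy comes from the k fold models, while the model handed back was fitted on the complete dataset.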
    A question remains. As Ingo showed, the parameters chosen for C and gamma lead to over-fitting. Intuitively, I would have expected the X-Validation operator to give me 100% accuracy in that case, and I would have taken 100% accuracy as the warning sign of over-fitting. I now realize that this is wrong, but then what is a good indicator of over-fitting? And is over-fitting an issue in our case? Even with this over-fitting, the X-Validation performance is good, which suggests the model has kept its ability to generalize. Is this wrong?
    Thank you very much.
    Pop
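    One commonly used indicator for the question above is the gap between accuracy on the training data and accuracy on held-out folds. The sketch below is a hedged illustration in plain Python, not a RapidMiner process: it uses a 1-nearest-neighbour classifier as a stand-in for an over-flexible model (an assumption chosen for simplicity, not the SVM from the thread), and the data and function names are made up.

```python
def predict_1nn(train, x):
    """Label of the training example closest to x (1-nearest neighbour)."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]


def accuracy(train, test):
    """Fraction of test examples whose label the 1-NN model reproduces."""
    return sum(predict_1nn(train, x) == y for x, y in test) / len(test)


def cv_accuracy(data, k):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for i in range(k):
        test = data[i::k]  # every k-th example, starting at i
        train = [ex for j, ex in enumerate(data) if j % k != i]
        scores.append(accuracy(train, test))
    return sum(scores) / k


# Made-up 1-D data: the label mostly follows x < 5, with two noisy examples
# (x = 2.0 and x = 7.0) that a very flexible model will memorise.
data = [(0.0, "low"), (1.1, "low"), (2.0, "high"), (3.2, "low"), (4.0, "low"),
        (5.0, "high"), (6.3, "high"), (7.0, "low"), (8.2, "high"), (9.1, "high")]

train_acc = accuracy(data, data)  # tested on the same data it was trained on
cv_acc = cv_accuracy(data, 5)     # tested only on held-out folds
gap = train_acc - cv_acc
print(train_acc, cv_acc, gap)
```

    Because every example is its own nearest neighbour, the training accuracy here is perfect by construction, so on its own it says nothing. A large gap to the cross-validated figure is the hint that the model has memorised the data; a small gap suggests it still generalizes.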