[Solved] How to read out multiple performance parameters within optimization

qwertz Member Posts: 130 Contributor II
edited November 2018 in Help

Hi,

In the attached process I have two different Performance operators that run inside an "Optimize Parameters (Grid)" operator.
Now I am looking for a way to get the result of the whole process (both performance values of the optimized/best model) into an example set. This seems a little tricky: a kind of collection is returned, but I cannot use the Append operator because the port type is "per" and not "exa". (Logging inside the optimization operator does not help either, as this would only return the last run of the SVM, not the optimized/best one.)

Would appreciate any ideas...


Best regards
Sachs

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
   <process expanded="true" height="314" width="413">
     <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
     <operator activated="true" class="optimize_parameters_grid" compatibility="5.3.000" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="179" y="30">
       <list key="parameters">
         <parameter key="SVM (Linear).C" value="1,100"/>
       </list>
       <process expanded="true" height="388" width="711">
         <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
           <parameter key="training_window_width" value="10"/>
           <parameter key="test_window_width" value="10"/>
           <process expanded="true" height="388" width="330">
             <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.000" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30">
               <parameter key="C" value="100"/>
             </operator>
             <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
             <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="388" width="480">
             <operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
               <parameter key="horizon" value="1"/>
             </operator>
             <operator activated="true" class="performance_regression" compatibility="5.3.000" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
               <parameter key="root_mean_squared_error" value="false"/>
               <parameter key="absolute_error" value="true"/>
             </operator>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
             <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
             <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <connect from_port="input 1" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="source_input 2" spacing="0"/>
         <portSpacing port="sink_performance" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
     <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Sachs,

    Just log performance1 and performance2 of the Validation operator. Please have a look at the attached process.

    The results of both performance vectors (the output of a Performance operator) are concatenated. In the final results view you can see the performance: the first entry is the prediction_trend_accuracy, the second one the absolute_error. If you had a third measure, that would be performance3 of the validation.

    By logging the performanceX values of the validation, you actually get the performance of the complete validation, not just of the last fold.

    If anything is unclear, please let us know.
    Best regards,

    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.006">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.006" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="generate_data" compatibility="5.3.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
         <operator activated="true" class="optimize_parameters_grid" compatibility="5.3.006" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="179" y="30">
           <list key="parameters">
             <parameter key="SVM (Linear).C" value="1,100"/>
           </list>
           <process expanded="true">
             <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.001" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
               <parameter key="training_window_width" value="10"/>
               <parameter key="test_window_width" value="10"/>
               <process expanded="true">
                 <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.006" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30">
                   <parameter key="C" value="100"/>
                 </operator>
                 <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
                 <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
                 <portSpacing port="source_training" spacing="0"/>
                 <portSpacing port="sink_model" spacing="0"/>
                 <portSpacing port="sink_through 1" spacing="0"/>
               </process>
               <process expanded="true">
                 <operator activated="true" class="apply_model" compatibility="5.3.006" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                   <list key="application_parameters"/>
                 </operator>
                 <operator activated="true" class="series:forecasting_performance" compatibility="5.2.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                   <parameter key="horizon" value="1"/>
                 </operator>
                 <operator activated="true" class="performance_regression" compatibility="5.3.006" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
                   <parameter key="root_mean_squared_error" value="false"/>
                   <parameter key="absolute_error" value="true"/>
                 </operator>
                 <connect from_port="model" to_op="Apply Model" to_port="model"/>
                 <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                 <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                 <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
                 <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
                 <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                 <portSpacing port="source_model" spacing="0"/>
                 <portSpacing port="source_test set" spacing="0"/>
                 <portSpacing port="source_through 1" spacing="0"/>
                 <portSpacing port="sink_averagable 1" spacing="0"/>
                 <portSpacing port="sink_averagable 2" spacing="0"/>
               </process>
             </operator>
             <operator activated="true" class="log" compatibility="5.3.006" expanded="true" height="76" name="Log" width="90" x="246" y="30">
               <list key="log">
                 <parameter key="prediction_trend_acc" value="operator.Validation.value.performance1"/>
                 <parameter key="absolute_error" value="operator.Validation.value.performance2"/>
               </list>
             </operator>
             <connect from_port="input 1" to_op="Validation" to_port="training"/>
             <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
             <connect from_op="Log" from_port="through 1" to_port="performance"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Generate Data" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
         <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
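The performance1/performance2 naming described in the reply above can be illustrated outside RapidMiner with a minimal Python sketch. It assumes the concatenated performance vector is simply an ordered list of (criterion, value) pairs. The function name `log_columns` and the numbers are hypothetical and are neither RapidMiner API nor results from this process.

```python
def log_columns(performance_vector):
    """Map an ordered list of (criterion, value) pairs to performance1..N keys."""
    return {
        f"performance{i}": (name, value)
        for i, (name, value) in enumerate(performance_vector, start=1)
    }

# Hypothetical values. The order follows the thread: the forecasting
# criterion comes first, the regression criterion second.
vector = [("prediction_trend_accuracy", 0.55), ("absolute_error", 12.3)]
columns = log_columns(vector)

print(columns["performance1"])
print(columns["performance2"])
```

A third criterion appended to the vector would show up under the key performance3 in the same way.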
  • qwertz Member Posts: 130 Contributor II

    Hi Marius,

    Probably my description was not ideal. When I run your process, I get two rows in the log table, one for each of the two optimization values set (in this example C=1 and C=100). The model resulting from the "optimization" operator will be built with the "C" value that gives the better performance. But there are two different performance criteria, and "prediction trend" is better when C=1 while "absolute_error" is better when C=100.

    So which C value was taken for the final model and what is the performance of this final model?


    Thank you very much & kind regards
    Sachs


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="5.3.005" expanded="true" height="60" name="Generate Data" width="90" x="45" y="120"/>
          <operator activated="true" class="multiply" compatibility="5.3.005" expanded="true" height="94" name="Multiply" width="90" x="179" y="120"/>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.3.005" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="313" y="30">
            <list key="parameters">
              <parameter key="SVM (Linear).C" value="1,100"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
                <parameter key="training_window_width" value="10"/>
                <parameter key="test_window_width" value="10"/>
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.005" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30">
                    <parameter key="C" value="100"/>
                  </operator>
                  <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
                  <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <parameter key="horizon" value="1"/>
                  </operator>
                  <operator activated="true" class="performance_regression" compatibility="5.3.005" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
                    <parameter key="root_mean_squared_error" value="false"/>
                    <parameter key="absolute_error" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
                  <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.3.005" expanded="true" height="76" name="Log" width="90" x="246" y="75">
                <list key="log">
                  <parameter key="prediction_trend_acc" value="operator.Validation.value.performance1"/>
                  <parameter key="absolute_error" value="operator.Validation.value.performance2"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="108"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The performance used by the optimization operator is the one returned as the "performance" value by the Validation operator. That is usually the first one in the performance vector, i.e. the top-most performance in the PerformanceVector Result tab. If you use only a single Performance operator, you can select which of the generated performance values should be used by defining it as the main criterion in the corresponding parameter of that operator.

    If you simply want to know the parameter combination which resulted in the best performance, you should connect the "par" output of the optimization operator to the process output. You can even store that output in the repository with Store and apply it to another learning operator with the Set Parameters operator. That prevents you from having to type all those potentially long numbers when optimizing more than one parameter.

    Best regards,
    Marius
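The selection rule from the reply above (only the main criterion decides, the other values are just reported) can be sketched in a few lines of Python. This is an illustration under assumptions, not RapidMiner code: it assumes the C value is logged next to both performance values, and the helper `pick_best` and all numbers are hypothetical. The numbers are only chosen to mirror the situation in the question, where the trend accuracy is better for C=1 and the absolute error is better for C=100.

```python
def pick_best(rows, main_criterion, higher_is_better=True):
    """Return the logged row that is best on the main criterion only."""
    chooser = max if higher_is_better else min
    return chooser(rows, key=lambda row: row[main_criterion])

# Hypothetical log table: one row per C value tried by the grid search.
log_rows = [
    {"C": 1, "prediction_trend_accuracy": 0.60, "absolute_error": 15.0},
    {"C": 100, "prediction_trend_accuracy": 0.50, "absolute_error": 10.0},
]

# The first criterion in the vector acts as the main criterion.
best = pick_best(log_rows, "prediction_trend_accuracy")
print(best)
```

The returned row carries both performance values of the chosen setting, which is the information the original question asks for. Swapping the main criterion to `absolute_error` (with `higher_is_better=False`) would select the other row.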
  • qwertz Member Posts: 130 Contributor II

    Forgive my stupidity: it is probably easy to solve, but I don't see how to do it... I tried to use the Store operator, but it throws this error:

    Cannot store data in repository at entry "//LocalRepository/store". Reason: Cannot store data at entry "C:\users\USERNAME\Documents\Rapidminer\store.ioo": java.io.NotSerializableException: com.rapidminer.operator.performance.ForecastingPerformanceEvaluator.


    By the way, would it also be possible to log the "par" output? That would be quite handy for me, even better than using the Store operator.


    Thanks again for your support!
    Sachs


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="5.3.005" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.3.005" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="179" y="30">
            <list key="parameters">
              <parameter key="SVM (Linear).C" value="1,100"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
                <parameter key="training_window_width" value="10"/>
                <parameter key="test_window_width" value="10"/>
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.005" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30">
                    <parameter key="C" value="100"/>
                  </operator>
                  <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
                  <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                    <parameter key="horizon" value="1"/>
                  </operator>
                  <operator activated="true" class="performance_regression" compatibility="5.3.005" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
                    <parameter key="root_mean_squared_error" value="false"/>
                    <parameter key="absolute_error" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
                  <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="model" to_port="result 1"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="store" compatibility="5.3.005" expanded="true" height="60" name="Store" width="90" x="313" y="75">
            <parameter key="repository_entry" value="//LocalRepository/store"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hm, you just discovered another bug. The parameter object contains the performance vector, and it seems that the values created by the Forecasting Performance operator cannot be stored yet. I created an internal ticket requesting that the corresponding methods be implemented.

    Logging the parameter object is not possible out of the box, but maybe you can convert it to an example set with the help of the Execute Script operator. For help with using that operator, please download the document "How to Extend RapidMiner" from our website.

    Best regards,
    Marius
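As a rough picture of what such a conversion would produce, the sketch below flattens a parameter set and its performance values into a one-row table. It is plain Python, not an Execute Script script, and it does not use any RapidMiner API. The function `parameter_set_to_csv`, the dictionary layout and the numbers are assumptions made for illustration only.

```python
import csv
import io

def parameter_set_to_csv(parameters, performances):
    """Write one header line and one data line combining parameters and performances."""
    row = {**parameters, **performances}
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()

# Hypothetical best parameter set and its two performance values.
table = parameter_set_to_csv(
    {"SVM (Linear).C": 100},
    {"prediction_trend_accuracy": 0.5, "absolute_error": 10.0},
)
print(table)
```

Each optimized parameter becomes one column and each performance criterion another, so the result has the tabular shape of an example set with a single example.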
  • qwertz Member Posts: 130 Contributor II

    I had a brief look at the document "How to extend Rapidminer 5" which looks really promising. So I will take some time to get through it... let's see :)

    Best regards
    Sachs