"[Solved] Performance measurement with trend direction"

qwertz Member Posts: 130 Contributor II
edited June 2019 in Help
Dear all,

I am looking for a particular kind of performance measure. "Relative error", for example, indicates how closely the prediction fits the label. In my case, however, I also want to know whether the prediction over- or underestimates the label.

A workaround might be to use something like "average(prediction) - average(label)" in addition to the relative error. But of course it would be better to have this in one operator.

Please let me know your ideas...


Kind regards
Sachs

Answers

  • qwertz Member Posts: 130 Contributor II

    It seems that I can have multiple performance criteria if I use the attached setup. But in this case I have a general question of understanding:

    1) Which of the two Performance operators is used to train the model? (Or can it be several?)
    I thought that validation works like: take performance to adapt SVM > apply model > evaluate performance > back to first step

    2) Is there a way to also log the standard deviation of the performance measure, which is shown in the result view?


    Kind regards
    Sachs


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="459" width="694">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="5.2.008" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
            <process expanded="true" height="459" width="165">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="459" width="624">
              <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="120">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="179" y="120"/>
              <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="255">
                <parameter key="root_mean_squared_error" value="false"/>
                <parameter key="relative_error" value="true"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="447" y="165">
                <parameter key="filename" value="log"/>
                <list key="log">
                  <parameter key="log" value="operator.Performance (2).value.relative_error"/>
                </list>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="averagable 2"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="36"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Sachs,

    The training of the model is completely independent of the chosen performance measure: the algorithm (in your case, the SVM) always builds the model the same way, and the Performance operators are only used to evaluate the resulting model. A detailed description of cross-validation can be found here: http://en.wikipedia.org/wiki/Cross-validation_(statistics)#K-fold_cross-validation

    Furthermore, you are currently logging the performance of each iteration of the X-Validation. Usually you do not want that and are only interested in the performance of the entire X-Validation. To get it, place the Log operator outside of the X-Validation and log the "performance" value of the Validation operator. The standard deviation is available as the "deviation" value of the same operator.
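To illustrate what the "performance" and "deviation" values summarize, here is a minimal Python sketch that averages per-iteration performance values and reports their spread. The function name `summarize_folds` and the sample numbers are hypothetical, and using the population standard deviation is an assumption; RapidMiner may compute its deviation differently.

```python
import statistics


def summarize_folds(fold_values):
    """Return (mean, standard deviation) of per-iteration performance values."""
    if not fold_values:
        raise ValueError("at least one performance value is required")
    mean = statistics.fmean(fold_values)
    # Assumption: population standard deviation. The validation operator
    # may use a different estimator for its "deviation" value.
    deviation = statistics.pstdev(fold_values)
    return mean, deviation


# Hypothetical relative errors from three validation iterations.
relative_errors = [0.10, 0.20, 0.30]
mean, deviation = summarize_folds(relative_errors)
print(f"performance: {mean:.4f} +/- {deviation:.4f}")
```

Logging once outside the validation corresponds to reading this single summary instead of one row per iteration.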


    You can easily create custom performance measures: perform arbitrary operations on the output of Apply Model, e.g. with Aggregate and Generate Attributes, and then use the Extract Performance operator to turn a value of the resulting example set into a performance value.
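The arithmetic behind such a custom measure can be sketched in a few lines of Python. This is only an illustration of the "average(prediction) - average(label)" idea from the question, not RapidMiner code; the function names and sample values are hypothetical, and the relative error is assumed to be the mean of |prediction - label| / |label|.

```python
def signed_bias(labels, predictions):
    """average(prediction) - average(label): positive means overestimation."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equally long")
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)


def mean_relative_error(labels, predictions):
    """Mean of |prediction - label| / |label| (assumed definition; labels must be non-zero)."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equally long")
    return sum(abs(p - l) / abs(l) for l, p in zip(labels, predictions)) / len(labels)


# Hypothetical labels and predictions.
labels = [10.0, 20.0, 30.0]
predictions = [12.0, 18.0, 33.0]

print("relative error:", mean_relative_error(labels, predictions))
print("signed bias:   ", signed_bias(labels, predictions))
```

The relative error only says how far off the predictions are on average, while the sign of the bias says in which direction: a positive value means the model overestimates the label on average, a negative value means it underestimates.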

    Best regards,
    Marius
  • qwertz Member Posts: 130 Contributor II

    Hi Marius,

    Thank you for everything! This helps a lot!


    Kind regards
    Sachs


    PS: Here is my humble contribution to this topic. I set up a sample process as described above that computes a custom performance measure, for anyone who might need it.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="403" width="701">
          <operator activated="true" class="subprocess" compatibility="5.2.008" expanded="true" height="76" name="Generate Data (2)" width="90" x="45" y="30">
            <process expanded="true" height="403" width="694">
              <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
                <parameter key="number_examples" value="10"/>
                <parameter key="number_of_attributes" value="3"/>
                <parameter key="attributes_lower_bound" value="0.0"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
                <parameter key="old_name" value="att3"/>
                <parameter key="new_name" value="prediction"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
                <parameter key="name" value="prediction"/>
                <parameter key="target_role" value="prediction"/>
                <list key="set_additional_roles"/>
              </operator>
              <connect from_op="Generate Data" from_port="output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.2.008" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
            <list key="aggregation_attributes">
              <parameter key="label" value="average"/>
              <parameter key="prediction" value="average"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="30">
            <parameter key="old_name" value="average(label)"/>
            <parameter key="new_name" value="avg_label"/>
            <list key="rename_additional_attributes">
              <parameter key="average(prediction)" value="avg_prediction"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="performance_att" value="avg_prediction-avg_label"/>
            </list>
          </operator>
          <operator activated="true" class="extract_performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="581" y="30">
            <parameter key="performance_type" value="data_value"/>
            <parameter key="attribute_name" value="performance_att"/>
            <parameter key="example_index" value="1"/>
          </operator>
          <connect from_op="Generate Data (2)" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • qwertz Member Posts: 130 Contributor II

    I was just fooling around when I came across this:

    To my understanding it should be possible to extract the results of both performance operators. However, I always get just the same value twice...


    Best regards
    Sachs

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="459" width="694">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.001" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
            <parameter key="training_window_width" value="10"/>
            <parameter key="test_window_width" value="10"/>
            <process expanded="true" height="432" width="335">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="432" width="500">
              <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="series:forecasting_performance" compatibility="5.2.001" expanded="true" height="76" name="Performance" width="90" x="179" y="75">
                <parameter key="horizon" value="1"/>
                <parameter key="main_criterion" value="prediction_trend_accuracy"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="120">
                <parameter key="root_mean_squared_error" value="false"/>
                <parameter key="relative_error" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 2"/>
              <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="313" y="75">
            <list key="log">
              <parameter key="performance_a" value="operator.Validation.value.performance"/>
              <parameter key="performance_b" value="operator.Validation.value.performance1"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="36"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Actually, you can only log the first ave output of the Validation operator. Furthermore, for some reason the performance and performance1 values are the same. You have to change your process in the following way:
    - Connect the per output of the first Performance operator to the per input of the second Performance operator.
    - Connect the per output of the second Performance operator to the first ave output of the Validation.
    - Log performance and performance2 instead of performance and performance1.

    Best regards,
    Marius
  • qwertz Member Posts: 130 Contributor II

    Though I don't understand the underlying logic, it works pretty well the way you described it :)

    Thank you!


    Cheers
    Sachs

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="459" width="694">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.001" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
            <parameter key="training_window_width" value="10"/>
            <parameter key="test_window_width" value="10"/>
            <process expanded="true" height="432" width="335">
              <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="45" y="30">
                <parameter key="svm_type" value="epsilon-SVR"/>
                <parameter key="kernel_type" value="linear"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="432" width="480">
              <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="series:forecasting_performance" compatibility="5.2.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                <parameter key="horizon" value="1"/>
                <parameter key="main_criterion" value="prediction_trend_accuracy"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
                <parameter key="root_mean_squared_error" value="false"/>
                <parameter key="relative_error" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
              <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="313" y="30">
            <list key="log">
              <parameter key="per" value="operator.Validation.value.performance"/>
              <parameter key="per2" value="operator.Validation.value.performance2"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The logic is simple: passing one performance vector into another Performance operator adds the new measures to the input performance object :)
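As a rough mental model only (this is not RapidMiner's implementation), the chaining can be pictured in Python as merging dictionaries of criteria. The function `add_criteria` and the numbers are hypothetical, and letting a new measure overwrite an existing one with the same name is an assumption.

```python
def add_criteria(incoming, new_measures):
    """Return a new performance 'vector' holding the incoming criteria plus the new ones."""
    merged = dict(incoming)      # keep the criteria that were passed in, in their order
    merged.update(new_measures)  # assumption: a same-named criterion would be overwritten
    return merged


# Hypothetical output of the first Performance operator.
first = {"prediction_trend_accuracy": 0.62}

# The second operator receives it on its performance input and adds its own measure.
combined = add_criteria(first, {"relative_error": 0.13})

print(list(combined))  # both criteria now travel in one object
```

In this picture a single output carries both measures, which is why one ave output of the Validation is enough to reach both of them.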
  • qwertz Member Posts: 130 Contributor II


    I understand the part about passing a performance vector into another operator. What puzzles me is that the performance and performance1 values are the same, and that a performance value delivered to the second ave output cannot be logged via performance1 or performance2.

    Cheers
    Sachs