[Solved] Learner with numerical input and binominal output?

qwertz Member Posts: 130 Contributor II
edited July 2019 in Help
Dear all,

I want to build a series prediction process but I am not quite sure which learner is the right one to use.

The input is a multivariate numerical data set (including a numerical label).
The output is supposed to be a simple binominal "up" or "down".

Is there a special learner or setting for this kind of task? If not, I could imagine using an SVM with numerical output, followed by a "Generate Attributes" operator with a formula like
if (prediction[for t+1] >= label[of t]) then "up" else "down"
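
The rule above can be sketched outside RapidMiner as follows. This is only a minimal Python illustration of the comparison, not a RapidMiner operator; the function name `direction_labels` and the sample numbers are made up for the example.

```python
def direction_labels(forecasts, current_values):
    """Label each time step "up" or "down".

    forecasts[t] is assumed to be the model's prediction for time t+1,
    made at time t; current_values[t] is the known label at time t.
    """
    return [
        "up" if forecast >= current else "down"
        for forecast, current in zip(forecasts, current_values)
    ]


current = [10.0, 10.5, 10.2]        # hypothetical labels at time t
forecast_next = [10.4, 10.5, 9.9]   # hypothetical predictions for t+1
print(direction_labels(forecast_next, current))  # ['up', 'up', 'down']
```

Note that with `>=` an unchanged value counts as "up", exactly as in the formula.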


Best regards
Sachs

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Sachs,

    you should install the Series Extension. Then you can window your data to get it into the right format (I suppose you have already done that?).
    To get up/down labels, use the Differentiate operator with the correct change_mode.
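
As a rough illustration of what a direction-style differentiation produces, here is a small Python sketch. It is not the Differentiate operator itself: the label strings and the choice to leave the first row missing are assumptions made for the example.

```python
def differentiate_direction(values):
    """Return one direction label per value.

    Assumption: the first value has no predecessor, so its label is
    missing (None here) and would have to be filtered out afterwards.
    """
    if not values:
        return []
    labels = [None]
    for previous, current in zip(values, values[1:]):
        if current > previous:
            labels.append("up")
        elif current < previous:
            labels.append("down")
        else:
            labels.append("no change")
    return labels


series = [3.0, 3.5, 3.5, 2.0]  # hypothetical label values
print(differentiate_direction(series))  # [None, 'up', 'no change', 'down']
```

The third possible value, "no change", is what turns a two-class problem into a three-class one whenever two consecutive values are equal.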

    Best regards,
    Marius
  • qwertz Member Posts: 130 Contributor II

    Dear Marius,

    thank you very much for your fast response. I tried to set up the process according to your suggestion, but I got stuck at two points:
    - I don't know how to convert the label to binominal type so that the SVM will accept it. So far I have tried the "Nominal to Binominal" operator, but it does not work yet.
    - Some of the operators seem to "skip" the role. However, there is a workaround using the Set Role operator.
     

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="series:differentiate_example_set" compatibility="5.3.000" expanded="true" height="76" name="Differentiate" width="90" x="179" y="30">
            <parameter key="attribute_name" value="label"/>
            <parameter key="change_mode" value="direction"/>
            <parameter key="keep_original_attribute" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="30">
            <parameter key="condition_class" value="no_missing_attributes"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
            <parameter key="attribute_name" value="change(label)"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Binominal" width="90" x="581" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="change(label)"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="transform_binominal" value="true"/>
          </operator>
          <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="715" y="30">
            <parameter key="training_window_width" value="60"/>
            <parameter key="training_window_step_size" value="1"/>
            <parameter key="test_window_width" value="1"/>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                <parameter key="horizon" value="1"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Differentiate" to_port="example set input"/>
          <connect from_op="Differentiate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Kind regards
    Sachs
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Sachs,

    this specific process runs fine if you actually hit the Run button, even though the meta data processing creates some warnings. The warnings appear because Differentiate can in theory produce the value "no change", which would make the label polynominal. This specific data set never has two equal consecutive values, so that value does not occur.

    In general, when the "no change" class does occur, you have several options. Applying Nominal to Binominal to the label is not one of them, however: it would split the label into several binominal attributes, and the learners can only handle a single label attribute.
    So what you can do is the following.

    A)
    Since "no change" is probably rather rare, discard the examples with no change (use Filter Examples).

    B)
    Train three SVMs that classify up/not-up, down/not-down and change/no-change, apply all three models, and predict the outcome with the highest confidence.
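
Both options can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a RapidMiner process: the rows are plain dictionaries, the confidences are invented numbers standing in for the output of three already trained binary models, and both function names are hypothetical.

```python
def drop_no_change(rows, label_key="label"):
    """Option A: keep only the examples whose label is "up" or "down"."""
    return [row for row in rows if row[label_key] != "no change"]


def pick_most_confident(confidences):
    """Option B: given one confidence per class (each taken from its own
    binary model), predict the class with the highest confidence."""
    return max(confidences, key=confidences.get)


rows = [
    {"att1": 0.2, "label": "up"},
    {"att1": 0.2, "label": "no change"},
    {"att1": -0.4, "label": "down"},
]
print(len(drop_no_change(rows)))  # 2

# Hypothetical confidences for a single example.
print(pick_most_confident({"up": 0.62, "down": 0.30, "no change": 0.15}))  # up
```

Option A keeps the problem binominal at the cost of a few examples; option B keeps all examples but needs three models and a combination step.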



    I have created an internal ticket about the Differentiate operator dropping the label role.

    Best regards,
    Marius
  • qwertz Member Posts: 130 Contributor II
     
    Dear Marius,
     
    now it seems to work fine. Besides the trouble with the polynominal attributes, I got confused by the performance operator. I have used prediction trend accuracy in the past, but it shows "unknown" in this setup. After switching to the common Performance operator, everything is OK now.

    Thank you for your advice
    Sachs
     
     
     
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.008" expanded="true" height="76" name="Sub Generate Data" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
              <operator activated="true" class="series:differentiate_example_set" compatibility="5.3.000" expanded="true" height="76" name="Differentiate" width="90" x="179" y="30">
                <parameter key="attribute_name" value="label"/>
                <parameter key="change_mode" value="direction"/>
                <parameter key="keep_original_attribute" value="false"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples" width="90" x="315" y="30">
                <parameter key="condition_class" value="no_missing_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
                <parameter key="attribute_name" value="change(label)"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="581" y="30">
                <parameter key="old_name" value="change(label)"/>
                <parameter key="new_name" value="label"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <connect from_op="Generate Data" from_port="output" to_op="Differentiate" to_port="example set input"/>
              <connect from_op="Differentiate" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.3.008" expanded="true" height="94" name="Multiply" width="90" x="179" y="120"/>
          <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
            <parameter key="training_window_width" value="60"/>
            <parameter key="training_window_step_size" value="1"/>
            <parameter key="test_window_width" value="1"/>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine_linear" compatibility="5.3.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
              <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
              <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="30"/>
              <operator activated="false" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="179" y="165">
                <parameter key="horizon" value="1"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Sub Generate Data" from_port="out 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="36"/>
          <portSpacing port="sink_result 2" spacing="90"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Oh yeah, I also noticed that. You have to increase the test window width. It is currently set to 1, meaning that each test window contains only a single example. Of course, that does not allow for a "trend" accuracy.
    In any case, since you are now dealing with a binominal classification problem and no longer a regression, the trend accuracy criterion does not work anyway.
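
To see why a single test example cannot yield a trend accuracy, consider this Python sketch. The formula is an assumption about how such a criterion is typically computed (compare the actual move with the predicted move between consecutive examples); it is not taken from the RapidMiner source.

```python
def trend_accuracy(labels, predictions):
    """Fraction of consecutive pairs whose direction was predicted correctly.

    Assumed definition: for each step i, the actual move is
    labels[i+1] - labels[i] and the predicted move is
    predictions[i+1] - labels[i]; the step counts as correct when both
    moves have the same sign. Returns None when no pair exists.
    """
    pairs = len(labels) - 1
    if pairs < 1:
        return None  # a window of one example has no trend to evaluate
    correct = 0
    for i in range(pairs):
        actual_move = labels[i + 1] - labels[i]
        predicted_move = predictions[i + 1] - labels[i]
        if actual_move * predicted_move >= 0:
            correct += 1
    return correct / pairs


print(trend_accuracy([1.0], [1.2]))                      # None
print(trend_accuracy([1.0, 2.0, 1.5], [1.1, 1.8, 2.4]))  # 0.5
```

The subtraction also shows why the measure only makes sense for numerical labels: nominal values such as "up" and "down" cannot be differenced.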

    Best regards,
    Marius
  • qwertz Member Posts: 130 Contributor II
    Dear Marius,

    thank you! How could I have overlooked that the test window needs to be at least two?!
    Now everything works fine!

    Cheers
    Sachs
  • qwertz Member Posts: 130 Contributor II
    PS: Wouldn't it be good to have a "Set Type" operator? It would be similar to "Set Role", but for types like "numerical" or "binominal", and similar to the type settings offered when importing data from an Excel sheet, for example.

    Such an operator would prevent error messages in the operators that follow and could check for integrity (e.g. that a binominal attribute has only two different values).
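
The kind of integrity check meant here could look like the following Python sketch. It is purely hypothetical: no such operator is confirmed in this thread, and the name `check_binominal` is invented for the example.

```python
def check_binominal(column):
    """Return the distinct values if the column qualifies as binominal.

    Missing values (None) are ignored. Raises ValueError when more than
    two distinct values are present.
    """
    distinct = {value for value in column if value is not None}
    if len(distinct) > 2:
        raise ValueError(
            f"not binominal: found {len(distinct)} distinct values"
        )
    return sorted(distinct, key=str)


print(check_binominal(["up", "down", "up", None]))  # ['down', 'up']

try:
    check_binominal(["up", "down", "no change"])
except ValueError as error:
    print(error)  # not binominal: found 3 distinct values
```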


    Best
    Sachs
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Such operators exist. They are named after the transformation they perform, e.g. Nominal to Numerical, Numerical to Polynominal, etc.

    However, we are thinking about performing (some of) these transformations automatically in the future.

    Best regards,
    Marius