[Solved] Use the same attribute for creating a label and the prediction?

qwertz Member Posts: 130 Contributor II
edited November 2018 in Help
Good evening,

This is hopefully a question that experienced users can answer easily, and it may be a naive one, but I am rather new to this topic and have found arguments both for and against:

Let's assume that there are 5 different attributes representing stock prices. The "Windowing" operator creates a label from att1. Afterwards, a learner is supposed to build a model from the result (not shown in the attached sample process). Depending on whether the role of att1 is "regular" or "label", it will or will not appear in the data set provided to the learner.
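To make the windowing step concrete, here is a minimal Python sketch of the idea. It is not RapidMiner's implementation; the function name `window_series`, the alignment of window and label, and the price values are assumptions made purely for illustration. With a window size of 1 and a horizon of 1, each row pairs att1 at time t (a regular attribute) with att1 at time t+1 (the label).

```python
def window_series(values, window_size=1, horizon=1):
    """Turn one series into (window, label) rows.

    The window holds the last `window_size` values up to time t,
    and the label is the value `horizon` steps after t.
    Illustrative only; not RapidMiner's actual implementation.
    """
    rows = []
    last_start = len(values) - window_size - horizon
    for start in range(last_start + 1):
        window = values[start:start + window_size]
        label = values[start + window_size - 1 + horizon]
        rows.append((window, label))
    return rows


att1 = [10.0, 11.0, 12.5, 12.0, 13.0]  # made-up prices
rows = window_series(att1, window_size=1, horizon=1)
for window, label in rows:
    print(window, "->", label)
```

Every label comes from a later time step than the values in its window, which is the property the question hinges on.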

I was wondering whether it is useful to keep att1 for the learner when predicting att1(t+1). On the one hand, I can imagine that att1 contributes to the model. On the other hand, I see the risk that att1 gets too much weight in the model, as its correlation with the label (which is derived from att1) is of course quite strong.

The attached code is a sample process for this question. By changing the "target role" parameter of the "Set Role" operator you can either include or exclude att1.

Please let me know what your experience is...

Best regards
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="383" width="681">
     <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
       <parameter key="number_examples" value="20"/>
       <parameter key="attributes_lower_bound" value="0.0"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="label"/>
       <parameter key="invert_selection" value="true"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="313" y="75">
       <!-- The "target role" parameter is not present in this export; set it to "regular" or "label" to include or exclude att1. -->
       <parameter key="name" value="att1"/>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="447" y="30">
       <parameter key="horizon" value="1"/>
       <parameter key="window_size" value="1"/>
       <parameter key="create_label" value="true"/>
       <parameter key="label_attribute" value="att1"/>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
     <connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>


  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, as long as you use data of att1 from the past to predict values of the future, you can safely keep att1 in the training data. This is often even necessary, namely if you only *have* one attribute. Imagine e.g. sales prediction based on past sales, or stock market analysis based on the development of stocks in the past.
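As a rough illustration of the single-attribute case, the sketch below fits a one-lag linear model by ordinary least squares in plain Python. The helper `fit_one_lag` and the sales figures are hypothetical and only meant to show that the features come strictly from earlier time steps than the label; a real process would use whatever learner is plugged in after the windowing step.

```python
def fit_one_lag(series):
    """Least-squares fit of series[t+1] ~ a * series[t] + b."""
    x = series[:-1]  # past values, used as the only feature
    y = series[1:]   # next values, used as the label
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


sales = [100.0, 104.0, 108.0, 112.0, 116.0]  # made-up series, grows by 4 per step
a, b = fit_one_lag(sales)
next_value = a * sales[-1] + b  # the forecast uses only the last known value
print(a, b, next_value)
```

Because each training pair combines a value with its successor, the target attribute never sees its own future, which is why keeping it as an input is legitimate here.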

    Best regards,
  • qwertz Member Posts: 130 Contributor II

    Thanks a lot. Now I feel much better with my prediction values :)

    Kind regards
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Nevertheless, have a close look at your predictions and your model - sometimes the model gives a *very* high weight to the last value of the target attribute, leading to a prediction curve identical to the original one, just shifted by a few values...
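One possible way to check for this effect is to compare the model against a naive "persistence" forecast that simply repeats the last known value. The sketch below is an assumption about how such a check could look, not a RapidMiner feature; the function names, the prices, and the forecasts in `echo_model` are all made up for illustration.

```python
def mean_abs_error(actual, predicted):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare_with_persistence(series, predictions):
    """predictions[i] is the model's forecast for series[i + 1]."""
    actual = series[1:]
    naive = series[:-1]  # persistence: next value = current value
    return {
        "model_mae": mean_abs_error(actual, predictions),
        "naive_mae": mean_abs_error(actual, naive),
        # How close the model is to just echoing the last value.
        "distance_to_naive": mean_abs_error(naive, predictions),
    }


prices = [10.0, 12.0, 11.0, 13.0, 12.0]  # made-up prices
echo_model = [10.1, 11.9, 11.0, 12.9]    # hypothetical forecasts for prices[1:]
report = compare_with_persistence(prices, echo_model)
print(report)
```

If `distance_to_naive` is close to zero and `model_mae` is not clearly lower than `naive_mae`, the model is probably doing little more than copying the previous value, which produces exactly the shifted curve described above.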
