How to forecast and improve model simultaneously

gp3354 Member Posts: 2 Contributor I
edited December 2018 in Help

Hello!

I’m new to Data Science and RM.  

I am asking for some help with the following task. I am building a model that forecasts daily energy consumption. I have a lot of training data, and I have already prepared the input parameters for one month of test data. Because the test data are from the past, I also have the exact energy consumption figures for the whole month, so I would like to validate my model against this test data.

 

Is there any function in RapidMiner that would predict energy consumption for the first day of the month, then take the exact consumption figure from an additional file, use it as training data, and after that predict energy consumption for the second day of the month? Then take the exact consumption for the second day, use it as training data, predict consumption for day three, and so on for the whole month.

What I actually need is an algorithm that predicts, then learns from some extra information (not previously known), trains again, and repeats the whole cycle.

 

I would appreciate some good advice, thank you in advance!

 

Answers

  • gp3354 Member Posts: 2 Contributor I

    Hello rfuentealba

    Thank you for the elaborate answer, you're amazing.

    I was hoping there might be a built-in function that would solve that problem recursively, but your answer was helpful anyway. 

     

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hi @gp3354!

     

    Glad it helped.

     

    A little while after I replied, I thought of something else that you should take into consideration. As I don't know what your data looks like, I'll make something up to explain my point.

     

    Let's say this is your data:

     

    Monday, 101 kWh

    Tuesday, 97 kWh

    Wednesday, 98 kWh

    Thursday, 94 kWh

    Friday, 104 kWh

    Saturday, 119 kWh

    Sunday, 93 kWh

    Let's say you apply a decision tree (I don't care about the algorithm, so I chose this to make it easy), and that since it's Monday, the decision tree is confident that your consumption will be 101 kWh...

    If you feed this prediction back in as new data, that's fine, but... what if on that Monday your brother showed up with some beers to watch a soccer game, your neighbour asked to use your washing machine, and you used the coffee machine more than expected because you couldn't sleep? Your real consumption would be higher than the 101 kWh you predicted, yet you would still be reinforcing your algorithm with its own prediction instead of with the new data, which may be different. Decide whether you want to feed back the prediction or the actual outcome, and adjust the process accordingly.
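    To make the difference concrete outside RapidMiner, here is a minimal Python sketch of the predict-then-retrain loop. It is only an illustration: the per-weekday average stands in for the decision tree, and the `walk_forward` function, the `feed_back` switch, and the two extra Mondays are made up for this example.

```python
from statistics import mean


def walk_forward(history, test_days, feed_back="actual"):
    """Predict one day at a time, then extend the training data.

    history   -- list of (weekday, kwh) pairs already known
    test_days -- list of (weekday, actual_kwh) pairs, in date order
    feed_back -- "actual" appends the measured value after each prediction,
                 "prediction" appends the model's own output instead
    """
    training = list(history)
    predictions = []
    for weekday, actual in test_days:
        # Stand-in "model": the average of all known values for this weekday.
        same_day = [kwh for day, kwh in training if day == weekday]
        if same_day:
            predicted = mean(same_day)
        else:
            # Weekday not seen yet: fall back to the overall average.
            predicted = mean(kwh for _, kwh in training)
        predictions.append(predicted)
        # "Retrain" by adding either the real outcome or the prediction.
        new_value = actual if feed_back == "actual" else predicted
        training.append((weekday, new_value))
    return predictions


history = [("Mon", 101), ("Tue", 97), ("Wed", 98), ("Thu", 94),
           ("Fri", 104), ("Sat", 119), ("Sun", 93)]
# Two hypothetical Mondays; the first one is unusually high.
test_days = [("Mon", 140), ("Mon", 105)]

with_actuals = walk_forward(history, test_days, feed_back="actual")
with_predictions = walk_forward(history, test_days, feed_back="prediction")
print(with_actuals)      # second forecast reacts to the 140 kWh Monday
print(with_predictions)  # second forecast just repeats the first one
```

    With `feed_back="prediction"` the model never sees the unusual Monday, so it can only confirm what it already believed.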

     

    Never forget this rule (I forget it more often than not): Machine Learning isn't about forecasting the future but about using data to drive your decision making, by creating a mathematical idea of what will happen if the behaviour you are studying continues. I guess you already know how to use the operators I sent you; they are enough to solve this minor inconvenience.

     

    All the best,

     

    Rodrigo.

  • puserc Member Posts: 6 Contributor I

    Thank you so much, Mr. rfuentealba.

    It would be really helpful to the whole community if you shared the XML version of that algorithm. I would be grateful for your support.

     

    Best Regards.

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hello @puserc,

    Have you solved your problem? I haven't been in front of a computer for the past few days. If you want, I can send it tomorrow.

    Best regards,
  • puserc Member Posts: 6 Contributor I

    Hello everyone,

    No, Mr. rfuentealba, I couldn't create it.

    I would be grateful if you could send it to me as you said.

    Thank you so much. 

    Best regards.

     

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hello @puserc

     

     

     

    Please find attached. There are three important processes:

     

    02 Predict contains just the executable prediction and works as follows:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="productivity:execute_process" compatibility="8.2.001" expanded="true" height="68" name="Generate Unlabeled" width="90" x="45" y="136">
    <parameter key="process_location" value="02-2 Generate Unlabeled Data"/>
    <list key="macros"/>
    </operator>
    <operator activated="true" class="productivity:execute_process" compatibility="8.2.001" expanded="true" height="82" name="Generate Prediction" width="90" x="45" y="34">
    <parameter key="process_location" value="02-1 Generate Prediction"/>
    <list key="macros"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="179" y="85">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="union" compatibility="8.2.001" expanded="true" height="82" name="Union" width="90" x="313" y="34"/>
    <operator activated="true" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store" width="90" x="447" y="34">
    <parameter key="repository_entry" value="Consumption Training"/>
    </operator>
    <connect from_op="Generate Unlabeled" from_port="result 1" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Generate Prediction" from_port="result 1" to_op="Union" to_port="example set 1"/>
    <connect from_op="Generate Prediction" from_port="result 2" to_op="Apply Model" to_port="model"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Union" to_port="example set 2"/>
    <connect from_op="Union" from_port="union" to_op="Store" to_port="input"/>
    <connect from_op="Store" from_port="through" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    02-1 Generate Prediction updates the historical information with the recently scored information (a very rudimentary approach).

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Consumption Training" width="90" x="112" y="391">
    <parameter key="repository_entry" value="Consumption Training"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Only Labeled" width="90" x="246" y="391">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Level.is_not_missing."/>
    </list>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Consumption" width="90" x="45" y="85">
    <parameter key="repository_entry" value="Consumption"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Only with KwH" width="90" x="179" y="85">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="KwH.is_not_missing."/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">I want to learn if my last prediction was good or not</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
    <list key="function_descriptions">
    <parameter key="Level" value="if([KwH]&lt;=35,&quot;Base&quot;,if([KwH]&lt;=55,&quot;Low&quot;,if([KwH]&lt;=75,&quot;Normal&quot;,if([KwH]&lt;=95,&quot;High&quot;,&quot;Too High&quot;))))"/>
    <parameter key="Day" value="date_str_custom(Date, &quot;E&quot;)"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Seasonality by day of the week and properly labeling</description>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="85">
    <parameter key="attribute_name" value="Level"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles">
    <parameter key="Date" value="id"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Properly labeling</description>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set ID on Training" width="90" x="380" y="391">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="391"/>
    <operator activated="true" breakpoints="after" class="set_minus" compatibility="8.2.001" expanded="true" height="82" name="Set Minus" width="90" x="648" y="85"/>
    <operator activated="true" class="union" compatibility="8.2.001" expanded="true" height="82" name="Union" width="90" x="916" y="340"/>
    <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="1050" y="340">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Level|Day|Date"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply (2)" width="90" x="1184" y="340"/>
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.001" expanded="true" height="103" name="Decision Tree" width="90" x="1452" y="85">
    <parameter key="maximal_depth" value="5"/>
    <parameter key="apply_pruning" value="false"/>
    <parameter key="apply_prepruning" value="false"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="1653" y="238">
    <list key="application_parameters"/>
    <description align="center" color="transparent" colored="false" width="126">I want prediction vs reality, don't I?</description>
    </operator>
    <operator activated="true" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store" width="90" x="1787" y="85">
    <parameter key="repository_entry" value="Consumption Training"/>
    </operator>
    <connect from_op="Retrieve Consumption Training" from_port="output" to_op="Only Labeled" to_port="example set input"/>
    <connect from_op="Only Labeled" from_port="example set output" to_op="Set ID on Training" to_port="example set input"/>
    <connect from_op="Retrieve Consumption" from_port="output" to_op="Only with KwH" to_port="example set input"/>
    <connect from_op="Only with KwH" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="example set input"/>
    <connect from_op="Set ID on Training" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Set Minus" to_port="subtrahend"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Union" to_port="example set 2"/>
    <connect from_op="Set Minus" from_port="example set output" to_op="Union" to_port="example set 1"/>
    <connect from_op="Union" from_port="union" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Store" to_port="input"/>
    <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
    <connect from_op="Store" from_port="through" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="84"/>
    <portSpacing port="sink_result 2" spacing="42"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="317" resized="true" width="821" x="10" y="10">Integrating my data from &amp;quot;yesterday&amp;quot; to the scoring algorithm.</description>
    <description align="center" color="orange" colored="true" height="384" resized="true" width="820" x="11" y="333">This is my historical data. I use it creatively to filter the data from &amp;quot;yesterday&amp;quot; that I already have predicted and scored with the current values.</description>
    <description align="center" color="red" colored="true" height="495" resized="true" width="540" x="834" y="10">Mixing data from &amp;quot;yesterday&amp;quot; and from &amp;quot;history&amp;quot;.&lt;br/&gt;&lt;br/&gt;At this point, both data objects have the same structure, except for the prediction.</description>
    <description align="center" color="purple" colored="true" height="493" resized="true" width="577" x="1376" y="10">Predictive Algorithm and Storage: notice that I store data with the same structure.</description>
    </process>
    </operator>
    </process>
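    For readers who find the nested `if` in the Generate Attributes operator hard to read, here is a rough Python equivalent of the two generated attributes. The thresholds and labels are copied from the process; the function names `level` and `day_of_week` are only for this illustration, and I am assuming `date_str_custom(Date, "E")` yields an abbreviated English day name.

```python
from datetime import date

# Abbreviated day names, indexed by date.weekday() (Monday is 0).
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def level(kwh):
    """Bin a daily KwH value into the label used by the process."""
    if kwh <= 35:
        return "Base"
    if kwh <= 55:
        return "Low"
    if kwh <= 75:
        return "Normal"
    if kwh <= 95:
        return "High"
    return "Too High"


def day_of_week(day):
    """Abbreviated weekday name, used as the seasonality attribute."""
    return DAY_NAMES[day.weekday()]


# Hypothetical row: 3 December 2018 with 62 KwH measured.
row = {"Date": date(2018, 12, 3), "KwH": 62}
row["Level"] = level(row["KwH"])
row["Day"] = day_of_week(row["Date"])
print(row["Day"], row["Level"])
```

    Note that the label is a category, not a number, so the decision tree here classifies the consumption level rather than predicting the exact KwH.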

    02-2 Generate Unlabeled Data is just filters and set-difference queries (Set Minus on the Date id). Every time you execute your algorithm, your predictions for the future "improve".

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Consumption Training" width="90" x="45" y="187">
    <parameter key="repository_entry" value="Consumption Training"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="179" y="187">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Level.is_in.Base;High;Low;Normal;Too High"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="187">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Consumption" width="90" x="45" y="85">
    <parameter key="repository_entry" value="Consumption"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="set_minus" compatibility="8.2.001" expanded="true" height="82" name="Set Minus" width="90" x="447" y="136"/>
    <operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="136">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="KwH.is_missing."/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="136">
    <list key="function_descriptions">
    <parameter key="Day" value="date_str_custom(Date, &quot;E&quot;)"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="136">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Day|Date"/>
    </operator>
    <connect from_op="Retrieve Consumption Training" from_port="output" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/>
    <connect from_op="Retrieve Consumption" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="example set input"/>
    <connect from_op="Set Minus" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="84"/>
    <portSpacing port="sink_result 2" spacing="21"/>
    <description align="center" color="yellow" colored="false" height="297" resized="true" width="662" x="24" y="36">Get only the data that hasn't already been scored by feature generation.</description>
    <description align="center" color="orange" colored="true" height="290" resized="true" width="265" x="691" y="37">We just need a few parameters to perform scoring.</description>
    </process>
    </operator>
    </process>
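    In case the Set Minus logic is hard to follow from the XML, this is roughly what 02-2 does, written as a small Python sketch. The attribute names (Date, KwH, Level, Day) come from the process; the function name `unlabeled_rows` and the sample rows are made up for the illustration.

```python
from datetime import date

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
KNOWN_LEVELS = {"Base", "Low", "Normal", "High", "Too High"}


def unlabeled_rows(consumption, training):
    """Return the rows that still need a prediction.

    consumption -- raw rows with "Date" and "KwH" (None when not yet measured)
    training    -- stored rows with "Date" and "Level"
    """
    # Dates that already carry a level in the stored training data.
    scored_dates = {r["Date"] for r in training if r.get("Level") in KNOWN_LEVELS}
    result = []
    for r in consumption:
        # Set Minus on the Date id, then keep only rows without a KwH value.
        if r["Date"] not in scored_dates and r.get("KwH") is None:
            result.append({"Date": r["Date"],
                           "Day": DAY_NAMES[r["Date"].weekday()]})
    return result


consumption = [
    {"Date": date(2018, 12, 1), "KwH": 80},    # measured
    {"Date": date(2018, 12, 2), "KwH": None},  # predicted earlier
    {"Date": date(2018, 12, 3), "KwH": None},  # not scored yet
]
training = [
    {"Date": date(2018, 12, 1), "Level": "High"},
    {"Date": date(2018, 12, 2), "Level": "Normal"},
]
to_score = unlabeled_rows(consumption, training)
print(to_score)
```

    Only the rows that are neither measured nor already scored reach Apply Model in 02 Predict.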

    This process was way more complex than what I described. I am pretty sure it can be improved an awful lot, but at least you will have something to work with.

     

    All the best,

     

    Rodrigo.

  • puserc Member Posts: 6 Contributor I

    Thank you so much, Mr. rfuentealba, for your tremendous help.

     

    Best Regards.
