RapidMiner

Learner III maurits_freriks

Need help for building a process based on historical data

Hi,

 

I'm fairly new here and would like to learn how to use RapidMiner.

I have sketched my own process in Paint and would like to rebuild it in RapidMiner with operators. You can find my process below:

sketch model.png

Let me explain the basic idea:

The dataset contains one-hour intervals for each day, so every row has a start time and an end time. The third column holds the gasoil flow. The goal is to predict tomorrow's gasoil flow from the days before it (in my case 4 days, but it might be 3, 7, or 10, depending on the weights). With this as input I have to build a model that outputs tomorrow's flow. This could of course be tested on all my historical data. For example: the flows of 1 Oct + 2 Oct + 3 Oct + 4 Oct --> 5 Oct.

 

In my opinion you would get a general weight for each of D-1, D-2, D-3 ("today minus 1 day", "today minus 2 days", etc.). You would put these into a model, which could be linear or a neural network, and it would produce an output.

 

Is this realistic to build in RapidMiner? Please give me advice, because I'm new and I really don't know how to start. Of course you can send me a private message if you want more insight into my data.

 

One last question about my dataset: as you can see, it consists of 3 columns, and the rows are the hours of the day. Do I have to preprocess my Excel file so that I can work with days as in my example, or does RM have an operator that can split the data automatically? I've drawn an image below.

preprocessing.png

On the left is my input and on the right are the different "blocks" I would like to create, so that each block covers a certain time window. The blocks might also be 6, 8, or 12 hours long, depending on the outcomes. Does RM have an operator to split the data like this, and can it be combined with the final process shown in my first image?

 

Please let me know if you have a solution for this case. Excuse my English!

 

With kind regards,

 

Maurits Freriks

3 REPLIES
RM Certified Expert

Re: Need help for building a process based on historical data

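You are describing the function of the Windowing operator. It slides a fixed-size window over the series and turns each window into one example: the values inside the window (1 day back, 2 days back, 3 days back, etc.) become the input attributes and, if you enable label creation, the value at the chosen horizon becomes the label.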
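
To make the idea concrete, here is a minimal Python sketch of what windowing does to a series. It is not the operator's actual implementation: the function name `make_windows`, the flow numbers, and the reading of `horizon` as "steps after the last window value" are assumptions for illustration only.

```python
def make_windows(series, window_size=4, horizon=1):
    """Turn a series into (inputs, label) rows, as a windowing step would.

    Each row holds `window_size` consecutive values as inputs and the value
    `horizon` steps after the end of the window as the label.
    (Hypothetical helper, not a RapidMiner API.)
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((window, label))
    return rows


# Made-up daily gasoil flows for 1 to 7 October (illustrative numbers only).
daily_flow = [10.0, 12.0, 11.0, 13.0, 14.0, 12.5, 15.0]

windows = make_windows(daily_flow, window_size=4, horizon=1)
for inputs, label in windows:
    # Four previous days on the left, the day to predict on the right.
    print(inputs, "->", label)
```

With a window of 4 and a horizon of 1, the first row uses 1 to 4 October as inputs and 5 October as the label, which matches the example in the question.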

 

Here's a nice article explaining how the operator functions.  Once you have your dataset windowed you can apply the NN algorithms as you desire. 

 

http://www.simafore.com/blog/bid/106430/Using-RapidMiner-for-time-series-forecasting-in-cost-modelin...

 

I'd also recommend using the Sliding Window Validation operator, because it gives you a more realistic performance estimate (it trains your model on past data and tests it on future data).
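
As a rough illustration of the principle (not the operator's exact behaviour), the Python sketch below generates train/test index ranges in which the test rows always come after the training rows. The function name and parameter names are hypothetical; they only loosely mirror the operator's window width and step size settings.

```python
def sliding_window_splits(n_rows, train_width, test_width, step, horizon=1):
    """Yield (train_indices, test_indices) pairs for time-ordered rows.

    The test window always starts after the training window ends, so the
    model is never evaluated on data older than what it was trained on.
    (Hypothetical helper, not a RapidMiner API.)
    """
    start = 0
    while True:
        train_end = start + train_width
        test_start = train_end + horizon - 1
        test_end = test_start + test_width
        if test_end > n_rows:
            break
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += step


# 12 time-ordered rows, train on 5, test on the next 2, move forward by 2.
splits = list(sliding_window_splits(n_rows=12, train_width=5, test_width=2, step=2))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

Each fold would train a fresh model on its training rows and score it on its test rows; the per-fold errors are then averaged.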

Learner III maurits_freriks

Re: Need help for building a process based on historical data

Thanks for the link! This was really helpful.

The article says: "As usual, the second window of the nesting is used for "Apply Model" and "Performance (Forecasting)". An initial run with a Neural Net gives us about 80% prediction trend accuracy." I thought the Performance operator would give me an accuracy as well, but it only gives me a root_mean_square_error. How do I get the accuracy? It is probably the most important result for checking whether the model is right.

 

To help with more complex questions, is there an easy way to share my design somewhere? Here is the process XML:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve test data only flow oktober days train set" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/data/test data only flow oktober days train set"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="179" y="34">
        <parameter key="window_size" value="5"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="A"/>
        <parameter key="horizon" value="2"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="313" y="34">
        <parameter key="training_window_width" value="20"/>
        <parameter key="training_window_step_size" value="5"/>
        <parameter key="test_window_width" value="20"/>
        <parameter key="horizon" value="2"/>
        <process expanded="true">
          <operator activated="true" class="neural_net" compatibility="7.6.001" expanded="true" height="82" name="Neural Net" width="90" x="112" y="34">
            <list key="hidden_layers"/>
          </operator>
          <connect from_port="training" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve test data only flow oktober days test set" width="90" x="45" y="187">
        <parameter key="repository_entry" value="//Local Repository/data/test data only flow oktober days test set"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="179" y="187">
        <parameter key="window_size" value="5"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="A"/>
        <parameter key="horizon" value="2"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="313" y="187">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve test data only flow oktober days train set" from_port="output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Retrieve test data only flow oktober days test set" from_port="output" to_op="Windowing (2)" to_port="example set input"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 3"/>
      <connect from_op="Apply Model (2)" from_port="model" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>
RM Staff

Re: Need help for building a process based on historical data

Hi,

 

Thanks for sharing the XML process!

 

From the previous posts in this thread, I understand that this is a forecasting/regression problem. Since you are building a regression model (i.e. forecasting a continuous attribute), the criteria to evaluate model performance would be root mean squared error and absolute error, rather than accuracy measures, which apply to classification models.
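
To show what these regression criteria measure, here is a small Python sketch with made-up numbers. The `trend_accuracy` function is only an assumed approximation of the "prediction trend accuracy" idea mentioned in the article (how often the predicted direction of change matches the actual one); it is not RapidMiner's exact formula, and all names and values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute_error(actual, predicted):
    """Average size of the error, in the same unit as the flow."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def trend_accuracy(previous, actual, predicted):
    """Share of rows where predicted and actual move in the same direction
    relative to the previous value (assumed definition, for illustration)."""
    hits = sum(
        1
        for prev, a, p in zip(previous, actual, predicted)
        if (a - prev) * (p - prev) > 0
    )
    return hits / len(actual)


# Made-up values: last known flow, true next flow, model forecast.
previous = [13.0, 14.0, 12.5]
actual = [14.0, 12.5, 15.0]
predicted = [13.5, 13.0, 12.0]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mean_absolute_error(actual, predicted))
print("Trend accuracy:", trend_accuracy(previous, actual, predicted))
```

A lower RMSE or absolute error means the forecasts are closer to the real flows, while a trend-style measure only says whether the direction was right, so the two can disagree.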

 

Also, could you share the sample data sets here, so that I can run the process on my end and see whether it is possible to generate an accuracy metric?

 

Here is an article explaining model evaluation criteria in depth. I hope this helps; let me know if you have any further questions.

https://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/

 

Cheers,

Pavithra Rao