RapidMiner

Expert opinion requested on Time Series based Prediction

SOLVED
Contributor II luc_bartkowski

So I'm studying machine learning using RapidMiner and I'm now focusing on Time Series Prediction.

 

My son earns some pocket money by trading stocks, forex and futures. He does that using technical analysis of prices.

He looks for an asset that shows a clear trend, in line with "Selecting Forecasting Methods in Data Science".

Then my son zooms in on the M-curves of the latest period. Using support lines and trendlines, he "predicts" the future price of the asset.

My thought was to give him a Machine Learning perspective on his analyses.

 

So I looked at oil futures and built a process model for them, based on the daily "Last" values. The model looks like this:

[Image: oilpredmod.jpg, the process model]

In the upper left I have implemented three RapidMiner macros:

  1. %{AnalysesDateFrom}: the date from which to pick up the "wave to surf" trend, as my son does.
  2. %{PredictionDateFrom}: my "hold-out" parameter. I train the model on data up to this date and let the model predict from this date onwards.
  3. %{PredictionHorizon}: sets the horizon parameter of the Windowing operator, of the Sliding Window Validation operator and of the Forecasting Performance operator inside the Sliding Window Validation subprocess, so that all operators use the same horizon.
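The two filter chains driven by these macros can be sketched in plain Python as below. This is only an illustration, not RapidMiner code: the constants are hypothetical stand-ins for the macros (values copied from the process XML further down), rows are assumed to be (date, last) tuples, and `date_after`/`date_before` are assumed to be strict comparisons, so a row dated exactly on %{PredictionDateFrom} would end up in neither set.

```python
from datetime import date

# Hypothetical stand-ins for the macros; values copied from the process XML.
ANALYSES_DATE_FROM = date(2016, 2, 11)
PREDICTION_DATE_FROM = date(2017, 8, 28)
PREDICTION_HORIZON = 10  # used by windowing/validation, not by the date filters


def split_by_macros(rows):
    """Split (date, last) rows like the training and prediction filter chains.

    Strict comparisons are assumed, mirroring date_after/date_before.
    """
    training = [r for r in rows if ANALYSES_DATE_FROM < r[0] < PREDICTION_DATE_FROM]
    scoring = [r for r in rows if r[0] > PREDICTION_DATE_FROM]
    return training, scoring


rows = [
    (date(2016, 2, 11), 1.0),  # toy values, not real prices
    (date(2016, 2, 12), 2.0),
    (date(2017, 8, 28), 3.0),
    (date(2017, 8, 29), 4.0),
]
training, scoring = split_by_macros(rows)
print(training)  # [(datetime.date(2016, 2, 12), 2.0)]
print(scoring)   # [(datetime.date(2017, 8, 29), 4.0)]
```

Note that the prediction chain in the XML applies only the %{PredictionDateFrom} filter, which is why `scoring` ignores the analyses start date here too.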

When I run the model with %{AnalysesDateFrom} = "Feb 10, 2016", %{PredictionHorizon} = 10 and %{PredictionDateFrom} = "Aug 28, 2017" (last month), the model returns prediction_trend_accuracy: 0.625 +/- 0.099 (mikro: 0.625). Whatever this accuracy figure is worth, I know that predicting values is "slippery ice", so I am more interested in trends.
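A trend (directional) accuracy only asks whether the forecast got the direction of the move right, not its size. The sketch below is a simplified version under an assumed alignment (`predicted[i]` is the forecast for `actual[i + horizon]`, made at time `i`); the function name is hypothetical and this is not RapidMiner's implementation of prediction_trend_accuracy, whose exact formula may differ.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def trend_accuracy(actual, predicted, horizon):
    """Share of time steps where the predicted direction matches the actual one.

    Assumes predicted[i] is the forecast for actual[i + horizon], made at time i.
    """
    n = len(actual) - horizon
    hits = 0
    for i in range(n):
        predicted_move = sign(predicted[i] - actual[i])
        actual_move = sign(actual[i + horizon] - actual[i])
        hits += predicted_move == actual_move
    return hits / n


actual = [1.0, 2.0, 3.0, 2.0, 1.0]
predicted = [1.5, 2.5, 3.5, 1.5, 0.5]  # last entry has no actual value to check
print(trend_accuracy(actual, predicted, horizon=1))  # 0.75
```

Because only signs are compared, a model can score well on this criterion while its predicted values are still off.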

 

My question is related to the next graph in which I have plotted the prediction together with the real "Last" values.

[Image: oilpredgraph.jpeg, the prediction plotted together with the real "Last" values]

This plot clearly shows that the trend of the prediction matches the trend of the real "Last" values.

What I don't understand is that the prediction and the real "Last" values are "in phase" with each other. I would expect a phase shift between both lines equal to the prediction horizon, but no phase shift is visible. What am I doing wrong here?

 

The only explanation I can think of for the missing phase shift is that the value of an asset at a given moment is the best indication of its future value. In other words, the current value of an asset already incorporates its future values. That would explain why the lines of the real values and the predicted values are in sync with each other. But I am not sure, so I would like to receive an expert opinion on this.
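This explanation is close to the idea behind the naive persistence forecast, a common baseline: the best guess for the value h steps ahead is today's value. The sketch below (toy numbers, hypothetical function name) shows why such a forecast looks "in phase" when each prediction is plotted on the date it is made, and lags by h when it is plotted on the date it is meant for.

```python
def persistence_forecast(series, horizon):
    """Naive forecast: the value at t + horizon is predicted to equal the value at t.

    Returns (made_at, target, forecast) triples.
    """
    return [(t, t + horizon, series[t]) for t in range(len(series) - horizon)]


prices = [50.0, 51.0, 53.0, 52.0, 55.0, 56.0]  # toy values
forecasts = persistence_forecast(prices, horizon=2)

# Plotted at the date it is made, each forecast sits exactly on the actual curve.
print(all(f == prices[made] for made, _, f in forecasts))  # True

# Plotted at its target date, the forecast curve is the actual curve shifted right by 2.
print([(target, f) for _, target, f in forecasts])
# [(2, 50.0), (3, 51.0), (4, 53.0), (5, 52.0)]
```

So a model that mostly learns "the price in h days is close to today's price" will trace the current "Last" value on a plot indexed by the date the prediction is made.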

RM Staff

Hi Luc,

 

I think the answer is simple. Your prediction (label) is the oil price tomorrow (or in x days), while your OilLast-0 is the oil price today (-0 indicates a lookback of 0 days).

 

You most likely also want to generate a label in the lower Windowing operator and compare it with the prediction.
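A minimal sketch of this suggestion in plain Python (not the RapidMiner API), assuming the Windowing operator takes the label from the row `horizon` positions after the last row of each window; the function name is hypothetical. Building the same label for the scoring data gives each prediction a value to be compared with, instead of the "-0" attribute of the same row.

```python
def window_with_label(series, window_size, horizon):
    """Build (window, label) examples from a list, strictly in list order.

    The label is the value `horizon` rows after the window's last row.
    The last `horizon` rows have no label yet and are skipped.
    """
    examples = []
    for end in range(window_size - 1, len(series) - horizon):
        window = series[end - window_size + 1 : end + 1]
        examples.append((window, series[end + horizon]))
    return examples


print(window_with_label([1, 2, 3, 4, 5], window_size=1, horizon=2))
# [([1], 3), ([2], 4), ([3], 5)]
```

By construction, the "-0" value of an example and its label are `horizon` rows apart, which is the gap a comparison of prediction and label should show.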

 

Cheers,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor II luc_bartkowski

Thank you @mschmitz for your fast reply,

 

"Your prediction(label) is the oil price tomorrow (or in x days)".

"While your OilLast-0 is the OilPrice today (-0 indicates 0 days lookback)."

 

I understand both, but I don't see it in the graph and the example set:

[Images: oilpredgraph.jpeg, oilpredvalues.jpg]

 

I also checked the example sets of the upper and lower Windowing operators using "Breakpoint After".

My source data is stored in MySQL. I compared the example sets with it to make sure that my process works as expected.

The value of the label on August 25 is based on the "Last" value of August 11 in the source data.

August 11 is 10 trading days (rows) before August 25, so that is correct.
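The horizon counts rows (trading days), not calendar days: August 11 and August 25, 2017 are 14 calendar days but 10 weekdays apart. A minimal stdlib check, using a hypothetical helper that counts weekdays only and ignores exchange holidays:

```python
from datetime import date, timedelta


def trading_days_between(start, end):
    """Count weekdays in [start, end); exchange holidays are ignored."""
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
        current += timedelta(days=1)
    return days


print(trading_days_between(date(2017, 8, 11), date(2017, 8, 25)))  # 10
print((date(2017, 8, 25) - date(2017, 8, 11)).days)               # 14
```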

 

The values of the "-0" attributes on August 25 are equal to the attributes of the source data on August 25.

That is also correct.

[Image: windowing.jpeg]

 

The results of the lower Windowing operator are also correct.

The values of all "-0" attributes on September 28 are equal to the source data on September 28.

[Image: windowing2.jpeg]

 

So I don't understand the graph. It looks like the prediction is following the real values of "Last" instead of the other way around.

 

 

This is my process model:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Training Period From Date" width="90" x="45" y="34">
        <parameter key="macro" value="AnalysesDateFrom"/>
        <parameter key="value" value="2016/02/11"/>
      </operator>
      <operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Prediction From Date" width="90" x="179" y="34">
        <parameter key="macro" value="PredictionDateFrom"/>
        <parameter key="value" value="2017/08/28"/>
      </operator>
      <operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="68" name="Prediction Horizon" width="90" x="313" y="34">
        <parameter key="macro" value="PredictionHorizon"/>
        <parameter key="value" value="10"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Get/Join Data" width="90" x="447" y="34">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Oil Futures" width="90" x="313" y="34">
            <process expanded="true">
              <operator activated="false" class="jdbc_connectors:read_database" compatibility="7.6.001" expanded="true" height="68" name="Read Database (2)" width="90" x="45" y="34">
                <parameter key="define_connection" value="predefined"/>
                <parameter key="connection" value="MySQL"/>
                <parameter key="database_system" value="MySQL"/>
                <parameter key="define_query" value="query"/>
                <parameter key="query" value="SELECT *&#10;FROM `oil`&#10;ORDER BY Date desc&#10;limit 9999"/>
                <parameter key="use_default_schema" value="true"/>
                <parameter key="prepare_statement" value="false"/>
                <enumeration key="parameters"/>
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="false" class="store" compatibility="7.6.001" expanded="true" height="68" name="Store (11)" width="90" x="179" y="34">
                <parameter key="repository_entry" value="//Cloud Repository/Samples/data/oilfuturesvw"/>
              </operator>
              <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve (2)" width="90" x="45" y="136">
                <parameter key="repository_entry" value="../data/oilfuturesvw"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="34">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value="Volume|Settle|Previous Day Open Interest|Open|Low|Last|High|Date"/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
              </operator>
              <operator activated="true" class="nominal_to_date" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Date (8)" width="90" x="648" y="34">
                <parameter key="attribute_name" value="Date"/>
                <parameter key="date_type" value="date"/>
                <parameter key="date_format" value="yyyy-MM-dd"/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="locale" value="English (United States)"/>
                <parameter key="keep_old_attribute" value="false"/>
              </operator>
              <operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename (8)" width="90" x="782" y="34">
                <parameter key="old_name" value="Date"/>
                <parameter key="new_name" value="oilDate"/>
                <list key="rename_additional_attributes">
                  <parameter key="High" value="oilHigh"/>
                  <parameter key="Low" value="oilLow"/>
                  <parameter key="Open" value="oilOpen"/>
                  <parameter key="Previous Day Open Interest" value="oilPrevDayOpenInt"/>
                  <parameter key="Settle" value="oilSettle"/>
                  <parameter key="Volume" value="oilVolume"/>
                  <parameter key="Last" value="oilLast"/>
                </list>
              </operator>
              <connect from_op="Retrieve (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Date (8)" to_port="example set input"/>
              <connect from_op="Nominal to Date (8)" from_port="example set output" to_op="Rename (8)" to_port="example set input"/>
              <connect from_op="Rename (8)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Oil Futures" from_port="out 1" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="store" compatibility="7.6.001" expanded="true" height="68" name="Store" width="90" x="581" y="34">
        <parameter key="repository_entry" value="../data/oilData"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="45" y="289"/>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="112" y="187">
        <parameter key="attribute_name" value="oilLast"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="oilDate" value="id"/>
          <parameter key="oilLast" value="regular"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="oilLast|oilHigh|oilLow|oilOpen|oilSettle|oilPrevDayOpenInt|oilVolume"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Start of Trend" width="90" x="380" y="187">
        <parameter key="parameter_expression" value="date_after(oilDate, date_parse_custom(%{AnalysesDateFrom}, &quot;yyyy/MM/dd&quot;))"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Hold off / Training" width="90" x="514" y="187">
        <parameter key="parameter_expression" value="date_before(oilDate, date_parse_custom(%{PredictionDateFrom}, &quot;yyyy/MM/dd&quot;))"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="648" y="187">
        <parameter key="series_representation" value="encode_series_by_examples"/>
        <parameter key="window_size" value="1"/>
        <parameter key="step_size" value="1"/>
        <parameter key="create_single_attributes" value="true"/>
        <parameter key="create_label" value="true"/>
        <parameter key="select_label_by_dimension" value="false"/>
        <parameter key="label_attribute" value="oilLast"/>
        <parameter key="horizon" value="%{PredictionHorizon}"/>
        <parameter key="add_incomplete_windows" value="false"/>
        <parameter key="stop_on_too_small_dataset" value="true"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="782" y="187">
        <parameter key="create_complete_model" value="false"/>
        <parameter key="training_window_width" value="100"/>
        <parameter key="training_window_step_size" value="1"/>
        <parameter key="test_window_width" value="100"/>
        <parameter key="horizon" value="%{PredictionHorizon}"/>
        <parameter key="cumulative_training" value="true"/>
        <parameter key="average_performances_only" value="true"/>
        <process expanded="true">
          <operator activated="true" class="support_vector_machine" compatibility="7.6.001" expanded="true" height="124" name="SVM" width="90" x="185" y="34">
            <parameter key="kernel_type" value="dot"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_degree" value="2.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="kernel_cache" value="200"/>
            <parameter key="C" value="0.0"/>
            <parameter key="convergence_epsilon" value="0.001"/>
            <parameter key="max_iterations" value="100000"/>
            <parameter key="scale" value="true"/>
            <parameter key="calculate_weights" value="true"/>
            <parameter key="return_optimization_performance" value="true"/>
            <parameter key="L_pos" value="1.0"/>
            <parameter key="L_neg" value="1.0"/>
            <parameter key="epsilon" value="0.0"/>
            <parameter key="epsilon_plus" value="0.0"/>
            <parameter key="epsilon_minus" value="0.0"/>
            <parameter key="balance_cost" value="false"/>
            <parameter key="quadratic_loss_pos" value="false"/>
            <parameter key="quadratic_loss_neg" value="false"/>
            <parameter key="estimate_performance" value="false"/>
          </operator>
          <connect from_port="training" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="horizon" value="%{PredictionHorizon}"/>
            <parameter key="main_criterion" value="prediction_trend_accuracy"/>
            <parameter key="prediction_trend_accuracy" value="true"/>
            <parameter key="skip_undefined_labels" value="true"/>
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="442">
        <parameter key="attribute_name" value="oilLast"/>
        <parameter key="target_role" value="prediction"/>
        <list key="set_additional_roles">
          <parameter key="oilDate" value="id"/>
          <parameter key="oilLast" value="regular"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (4)" width="90" x="313" y="442">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="oilLast|oilHigh|oilLow|oilOpen|oilSettle|oilVolume|oilPrevDayOpenInt"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Hold off/Prediction" width="90" x="447" y="442">
        <parameter key="parameter_expression" value="date_after(oilDate, date_parse_custom(%{PredictionDateFrom}, &quot;yyyy/MM/dd&quot;))"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="581" y="442">
        <parameter key="series_representation" value="encode_series_by_examples"/>
        <parameter key="window_size" value="1"/>
        <parameter key="step_size" value="1"/>
        <parameter key="create_single_attributes" value="true"/>
        <parameter key="create_label" value="false"/>
        <parameter key="select_label_by_dimension" value="false"/>
        <parameter key="label_attribute" value="oilLast"/>
        <parameter key="horizon" value="0"/>
        <parameter key="add_incomplete_windows" value="false"/>
        <parameter key="stop_on_too_small_dataset" value="false"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="715" y="442">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <connect from_op="Get/Join Data" from_port="out 1" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Start of Trend" to_port="example set input"/>
      <connect from_op="Filter Start of Trend" from_port="example set output" to_op="Filter Hold off / Training" to_port="example set input"/>
      <connect from_op="Filter Hold off / Training" from_port="example set output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
      <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Filter Hold off/Prediction" to_port="example set input"/>
      <connect from_op="Filter Hold off/Prediction" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <description align="center" color="yellow" colored="true" height="50" resized="true" width="381" x="30" y="111">Configuration</description>
    </process>
  </operator>
</process>


Thanks for your support.

 

Cheers,

Luc

 

RM Certified Expert

I'll be posting my Historical Volatility process when I have a chance to write it up. In that process you take a time series at t=0 and predict the value at t+1. From there you can see how it works.

Contributor II luc_bartkowski

I think I have found the answer to my question,
but I don't know how to implement it.

 

Looking at the problem again, I conclude the following:

[Image: windowing as it should be]

On August 11 the label should take the "Last" value of August 25 for learning/validation (see the blue markup).

Instead, as I indicated before, the upper Windowing operator looks backwards: it puts the "Last" value of August 11 as the label on August 25.

 

I tried to make the upper Windowing operator look forwards instead of backwards by setting its Horizon parameter to -10 or (%{PredictionHorizon})*-1. However, the Horizon parameter of the Windowing operator accepts only positive integers, not negative ones. So I don't know how to implement a forward-looking label instead of a backward-looking one.
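A row-offset label only ever looks "forward" in row order. The sketch below (toy values, hypothetical function) mimics that, assuming the label is copied from the row `horizon` positions further down the list: on date-descending rows the label comes from an earlier date, matching the observation above, while on ascending rows the same code looks forward in time.

```python
from datetime import date


def label_by_row_offset(rows, horizon):
    """For each (date, last) row, copy 'last' from the row `horizon` positions later.

    Only list order is used; dates are carried along but never compared.
    Returns (row_date, label, label_source_date) triples.
    """
    return [
        (rows[i][0], rows[i + horizon][1], rows[i + horizon][0])
        for i in range(len(rows) - horizon)
    ]


rows_desc = [  # newest first, like ORDER BY Date DESC (toy values)
    (date(2017, 8, 25), 3.0),
    (date(2017, 8, 24), 2.0),
    (date(2017, 8, 23), 1.0),
]

# Descending input: the label of Aug 25 comes from Aug 24 (looks back).
print(label_by_row_offset(rows_desc, 1)[0])
# (datetime.date(2017, 8, 25), 2.0, datetime.date(2017, 8, 24))

# Ascending input: the label of Aug 23 comes from Aug 24 (looks forward).
print(label_by_row_offset(sorted(rows_desc), 1)[0])
# (datetime.date(2017, 8, 23), 2.0, datetime.date(2017, 8, 24))
```

In this sketch, reversing the sort order plays the role of the negative horizon that the parameter does not accept.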

 

I'm using version 7.6.001.

 

Greetings,

Luc

Contributor II luc_bartkowski
Solution


I have found the answer to my question.

 

My source data is sorted by date because I use a SQL query to avoid loading more data than my RM license allows.

I use the following SQL: "SELECT * FROM oil ORDER BY Date DESC LIMIT 9999".

As a result, the example sets fed into the Windowing operators are sorted by Date in descending order.

When I sort the example set by Date in ascending order, the model works as expected.
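Besides adding a Sort operator in RapidMiner, the ordering could also be fixed at the source. The query below is a sketch that has not been run against MySQL here: it keeps the most recent rows (the `LIMIT` still guards the license limit) and returns them oldest first. MySQL requires an alias on a derived table, hence `AS recent`. The Python wrapper uses the stdlib `sqlite3` module on a toy in-memory table only to demonstrate the ordering; in the Read Database operator you would paste the SQL with the literal 9999 instead of `?`.

```python
import sqlite3

# Most recent N rows, handed over in ascending date order.
QUERY = """
SELECT * FROM (
    SELECT * FROM oil ORDER BY Date DESC LIMIT ?
) AS recent
ORDER BY Date ASC
"""


def load_recent_ascending(conn, limit):
    """Return the `limit` most recent rows, oldest first."""
    return conn.execute(QUERY, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oil (Date TEXT, Last REAL)")  # toy schema, ISO date strings
conn.executemany(
    "INSERT INTO oil VALUES (?, ?)",
    [("2017-08-23", 1.0), ("2017-08-25", 3.0), ("2017-08-22", 0.5), ("2017-08-24", 2.0)],
)
print(load_recent_ascending(conn, 3))
# [('2017-08-23', 1.0), ('2017-08-24', 2.0), ('2017-08-25', 3.0)]
```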

See the following pictures:

 

[Image: Added Sort operator]

[Image: New resulting example set]

[Image: Prediction is almost equal to oilLast-0]

No phase shift. Of course not. Question answered.
Watch out for the sort order of dates.
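One way to act on that warning is a cheap guard before any windowing step. The helper below is hypothetical plain Python (in RapidMiner the equivalent is a Sort operator on the date attribute, ascending): it raises if the dates are not strictly increasing, which catches both descending input and duplicate dates.

```python
from datetime import date


def assert_ascending(dates):
    """Raise ValueError unless the dates are strictly increasing."""
    for earlier, later in zip(dates, dates[1:]):
        if not earlier < later:
            raise ValueError(f"rows out of order: {earlier} is followed by {later}")


assert_ascending([date(2017, 8, 23), date(2017, 8, 24), date(2017, 8, 25)])  # passes

try:
    assert_ascending([date(2017, 8, 25), date(2017, 8, 24)])
except ValueError as err:
    print(err)  # rows out of order: 2017-08-25 is followed by 2017-08-24
```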

Apparently RM does not use the values of a Date attribute when it is given the ID role with Set Role; instead, it works with the examples in their input sort order.

 

Greetings,

Luc