How can I use deep learning with the Windowing operator when the horizon is larger than one?

hsanchez (Member, Contributor II)
edited June 2020 in Help
Hello Guys,
I am trying to use the process example "s&p 500 regression using windowing and convolution". It works well for predicting the price for the next day when the Windowing operator uses horizon = 1; however, if the horizon is larger than 1 (a forecast a few days ahead), the Deep Learning operator fails.
Question: Do you have an example where I can use deep learning, windowing and horizon > 1? I would be happy if the example "s&p 500 regression using windowing and convolution" could be modified to handle horizon > 1. I am aiming to forecast the price for the next few minutes, so I need horizon > 1.

I have also tried the same Deep Learning operator used in the example above, this time with a multi-horizon forecast, and the same problem occurs: the Deep Learning operator can't handle horizon > 1. I am not an expert on the Deep Learning operator, but could the apparent limitation be associated with multi-label handling?
Other operators, such as Gradient Boosted Trees, work well with horizon > 1.
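The difference between horizon = 1 and horizon > 1 is easiest to see in the windowed data itself: with horizon = 1 each example carries a single label, while with horizon > 1 it carries one label column per future step, so the learner has to predict a vector rather than a single value. The NumPy sketch below only illustrates that reshaping; `make_windows` is a hypothetical helper, and its `window_size`, `horizon_size` and `horizon_offset` arguments are meant to resemble the Windowing operator's parameters, not to reproduce its exact indexing.

```python
import numpy as np


def make_windows(series, window_size, horizon_size, horizon_offset=0):
    """Turn a 1-D series into (X, Y) pairs for supervised forecasting.

    X[i] holds `window_size` consecutive values; Y[i] holds the next
    `horizon_size` values, starting `horizon_offset` steps after the window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window_size - horizon_offset - horizon_size + 1
    if n <= 0:
        raise ValueError("series too short for this window/horizon")
    X = np.stack([series[i:i + window_size] for i in range(n)])
    start = window_size + horizon_offset  # index of the first label after window 0
    Y = np.stack([series[i + start:i + start + horizon_size] for i in range(n)])
    return X, Y


X, Y = make_windows(np.arange(10), window_size=3, horizon_size=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
print(X[0], Y[0])        # [0. 1. 2.] [3. 4.]
```

With `horizon_size=2` each row of `Y` holds two future values, which is the multi-label situation discussed in this thread.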

Below I have attached the process. 

<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve s&amp;p-500-data" width="90" x="45" y="238">
    <parameter key="repository_entry" value="//Keras Samples/sp_500_regression/s&amp;p-500-data"/>
  </operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <operator activated="true" class="subprocess" compatibility="9.6.000" expanded="true" height="103" name="Subprocess" origin="GENERATED_SAMPLE" width="90" x="179" y="238">
    <process expanded="true">
      <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" origin="GENERATED_SAMPLE" width="90" x="45" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Close"/>
        <parameter key="attributes" value="Date|Open|Close"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <description align="center" color="transparent" colored="false" width="126">Reducing the data to the attribute we want to predict: 'Close', which is the closing price of the respective stock.</description>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.6.000" expanded="true" height="103" name="Normalize" origin="GENERATED_SAMPLE" width="90" x="179" y="136">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Close"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
        <description align="center" color="transparent" colored="false" width="126">Often normalizing data helps a neural network to perform better.</description>
      </operator>
      <operator activated="true" class="time_series:windowing" compatibility="9.6.000" expanded="true" height="82" name="Windowing (2)" origin="GENERATED_SAMPLE" width="90" x="313" y="136">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="has_indices" value="false"/>
        <parameter key="indices_attribute" value=""/>
        <parameter key="window_size" value="30"/>
        <parameter key="no_overlapping_windows" value="false"/>
        <parameter key="step_size" value="1"/>
        <parameter key="create_horizon_(labels)" value="true"/>
        <parameter key="horizon_attribute" value="Close"/>
        <parameter key="horizon_size" value="1"/>
        <parameter key="horizon_offset" value="0"/>
        <description align="center" color="transparent" colored="false" width="126">Using windowing to convert the data into a form that displays one entry as an attribute, with the preceding 30&lt;br/&gt; entries as additional attributes.</description>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.6.000" expanded="true" height="103" name="Split Data" origin="GENERATED_SAMPLE" width="90" x="447" y="136">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.9"/>
          <parameter key="ratio" value="0.1"/>
        </enumeration>
        <parameter key="sampling_type" value="linear sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <description align="center" color="transparent" colored="false" width="126">Split data into training and test.</description>
      </operator>
      <connect from_port="in 1" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Windowing (2)" to_port="example set"/>
      <connect from_op="Windowing (2)" from_port="windowed example set" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_port="out 1"/>
      <connect from_op="Split Data" from_port="partition 2" to_port="out 2"/>
      <portSpacing port="source_in 1" spacing="0"/>
      <portSpacing port="source_in 2" spacing="0"/>
      <portSpacing port="sink_out 1" spacing="0"/>
      <portSpacing port="sink_out 2" spacing="0"/>
      <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Data Preparation: Normalization, Windowing, Label Setting</description>
  </operator>
</process>
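For readers who do not open the process in RapidMiner: the subprocess above selects `Close`, applies a Z-transformation, windows the series (window size 30, horizon 1) and splits the windowed examples 90/10 with linear sampling, i.e. in time order without shuffling. A rough NumPy equivalent of the normalization and the split follows, assuming the population standard deviation for the Z-transformation and rounding for the split point (RapidMiner's exact conventions may differ); the helper names are hypothetical.

```python
import numpy as np


def z_transform(x):
    """Z-transformation: subtract the mean and divide by the standard deviation.
    Uses the population standard deviation (np.std default, ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def linear_split(X, Y, ratio=0.9):
    """Split in time order without shuffling: the first `ratio` of the rows
    (oldest) become the training set, the rest (most recent) the test set."""
    cut = int(round(len(X) * ratio))
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])


# Usage with stand-in data: 10 windowed examples, 3 attributes, 1 label each.
close = z_transform(np.linspace(100.0, 120.0, 13))
X = np.stack([close[i:i + 3] for i in range(10)])
Y = close[3:13].reshape(-1, 1)
(X_train, Y_train), (X_test, Y_test) = linear_split(X, Y)
print(len(X_train), len(X_test))  # 9 1
```

Keeping the time order matters here: a shuffled split would let the model train on examples that lie after the test examples.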
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <operator activated="true" class="deeplearning:dl4j_sequential_neural_network" compatibility="0.9.004" expanded="true" height="145" name="Deep Learning" origin="GENERATED_SAMPLE" width="90" x="313" y="187">
    <parameter key="loss_function" value="Mean Squared Error (Linear Regression)"/>
    <parameter key="epochs" value="50"/>
    <parameter key="use_early_stopping" value="false"/>
    <parameter key="condition_strategy" value="score improvement"/>
    <parameter key="patience" value="5"/>
    <parameter key="minimal_score_improvement" value="0.0"/>
    <parameter key="best_epoch_score" value="0.01"/>
    <parameter key="max_iteration_score" value="3.0"/>
    <parameter key="max_iteration_time" value="10"/>
    <parameter key="use_miniBatch" value="false"/>
    <parameter key="batch_size" value="32"/>
    <parameter key="updater" value="RMSProp"/>
    <parameter key="learning_rate" value="0.099"/>
    <parameter key="momentum" value="0.9"/>
    <parameter key="rho" value="0.95"/>
    <parameter key="epsilon" value="1.0E-6"/>
    <parameter key="beta1" value="0.9"/>
    <parameter key="beta2" value="0.999"/>
    <parameter key="RMSdecay" value="0.95"/>
    <parameter key="weight_initialization" value="Xavier Uniform"/>
    <parameter key="bias_initialization" value="0.0"/>
    <parameter key="use_regularization" value="false"/>
    <parameter key="l1_strength" value="0.1"/>
    <parameter key="l2_strength" value="0.1"/>
    <parameter key="optimization_method" value="Conjugate Gradient Line Search"/>
    <parameter key="cudnn_algo_mode" value="Prefer fastest"/>
    <parameter key="backpropagation" value="Standard"/>
    <parameter key="backpropagation_length" value="50"/>
    <parameter key="infer_input_shape" value="true"/>
    <parameter key="network_type" value="Simple Neural Network"/>
    <parameter key="log_each_epoch" value="true"/>
    <parameter key="epochs_per_log" value="10"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <process expanded="true">
      <operator activated="true" class="deeplearning:dl4j_convolutional_layer" compatibility="0.9.004" expanded="true" height="68" name="Add Convolutional Layer" origin="GENERATED_SAMPLE" width="90" x="112" y="136">
        <parameter key="number_of_activation_maps" value="64"/>
        <parameter key="kernel_size" value="2.2"/>
        <parameter key="stride_size" value="1.1"/>
        <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
        <parameter key="use_dropout" value="false"/>
        <parameter key="dropout_rate" value="0.25"/>
        <parameter key="overwrite_networks_weight_initialization" value="false"/>
        <parameter key="weight_initialization" value="Normal"/>
        <parameter key="overwrite_networks_bias_initialization" value="false"/>
        <parameter key="bias_initialization" value="0.0"/>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_pooling_layer" compatibility="0.9.004" expanded="true" height="68" name="Add Pooling Layer" origin="GENERATED_SAMPLE" width="90" x="313" y="136">
        <parameter key="Pooling Method" value="max"/>
        <parameter key="PNorm Value" value="1.0"/>
        <parameter key="Kernel Size" value="2.2"/>
        <parameter key="Stride Size" value="1.1"/>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.004" expanded="true" height="68" name="Add Fully-Connected Layer" origin="GENERATED_SAMPLE" width="90" x="514" y="136">
        <parameter key="number_of_neurons" value="100"/>
        <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
        <parameter key="use_dropout" value="false"/>
        <parameter key="dropout_rate" value="0.25"/>
        <parameter key="overwrite_networks_weight_initialization" value="false"/>
        <parameter key="weight_initialization" value="Normal"/>
        <parameter key="overwrite_networks_bias_initialization" value="false"/>
        <parameter key="bias_initialization" value="0.0"/>
        <description align="center" color="transparent" colored="false" width="126">Often architectures using convolutional layers end with a fully-connected layer before the last layer.</description>
      </operator>
      <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.004" expanded="true" height="68" name="Add Fully-Connected Layer (2)" origin="GENERATED_SAMPLE" width="90" x="648" y="136">
        <parameter key="number_of_neurons" value="1"/>
        <parameter key="activation_function" value="None (identity)"/>
        <parameter key="use_dropout" value="false"/>
        <parameter key="dropout_rate" value="0.25"/>
        <parameter key="overwrite_networks_weight_initialization" value="false"/>
        <parameter key="weight_initialization" value="Normal"/>
        <parameter key="overwrite_networks_bias_initialization" value="false"/>
        <parameter key="bias_initialization" value="0.0"/>
        <description align="center" color="transparent" colored="false" width="126">Since regression is performed, one neuron and the 'None (identity)' activation function have to be used.</description>
      </operator>
      <connect from_port="layerArchitecture" to_op="Add Convolutional Layer" to_port="layerArchitecture"/>
      <connect from_op="Add Convolutional Layer" from_port="layerArchitecture" to_op="Add Pooling Layer" to_port="layerArchitecture"/>
      <connect from_op="Add Pooling Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/>
      <connect from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer (2)" to_port="layerArchitecture"/>
      <connect from_op="Add Fully-Connected Layer (2)" from_port="layerArchitecture" to_port="layerArchitecture"/>
      <portSpacing port="source_layerArchitecture" spacing="0"/>
      <portSpacing port="sink_layerArchitecture" spacing="0"/>
      <description align="center" color="gray" colored="true" height="63" resized="false" width="712" x="75" y="448">This network architecture uses convolutional and pooling layers in combination with standard fully-connected layers.</description>
      <description align="center" color="yellow" colored="false" height="407" resized="false" width="167" x="75" y="32">A convolutional layer uses a sliding window to take only a subset of the provided information into account.&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;This is done multiple times (= activation map count), while automatically changing the so-called kernel that is used as a mask for windowing.&lt;br/&gt;&lt;br/&gt;This method has the advantage of being able to focus on local patterns.</description>
      <description align="center" color="yellow" colored="false" height="313" resized="false" width="183" x="269" y="34">A pooling layer eases the training process by reducing the information.&lt;br&gt;&lt;br&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;br&gt;&lt;br&gt;Here only the maximum value of each 2x2 kernel window (created in the previous Convolutional Layer) is kept.</description>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Open the Deep Learning operator by double-clicking on it to discover the layer setup.</description>
  </operator>
</process>
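In the attached network, the last layer (Add Fully-Connected Layer (2)) has `number_of_neurons` = 1 with the 'None (identity)' activation: one linear output for one label. A network that forecasts H steps at once needs H linear outputs and a loss averaged over all of them. We have not verified whether the Deep Learning operator in this extension version accepts several label attributes, so the NumPy sketch below only illustrates the architecture change; it is not a fix for the operator. To stay short it replaces the convolution and pooling layers with a single dense ReLU hidden layer, and all names are hypothetical.

```python
import numpy as np


def train_multi_output_mlp(X, Y, hidden=16, lr=0.05, epochs=500, seed=0):
    """Train a network with one ReLU hidden layer and Y.shape[1] linear outputs.

    Full-batch gradient descent on the mean squared error averaged over all
    examples and all horizon steps. Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_out = Y.shape[1]  # one output neuron per horizon step (instead of 1)
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        Z1 = X @ W1 + b1
        A1 = np.maximum(Z1, 0.0)          # ReLU hidden layer
        P = A1 @ W2 + b2                  # identity activation for regression
        err = P - Y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of mean(err**2) over all n * n_out entries
        dP = 2.0 * err / err.size
        dW2 = A1.T @ dP
        db2 = dP.sum(axis=0)
        dZ1 = (dP @ W2.T) * (Z1 > 0)
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2


# Usage on a synthetic sine series: 30 lagged values in, 3 future values out.
series = np.sin(np.arange(300) / 10.0)
window, horizon = 30, 3
n = len(series) - window - horizon + 1
X = np.stack([series[i:i + window] for i in range(n)])
Y = np.stack([series[i + window:i + window + horizon] for i in range(n)])
params, losses = train_multi_output_mlp(X, Y)
print(predict(params, X[-1:]).shape)  # (1, 3)
print(losses[0], losses[-1])
```

The only structural change compared with a horizon = 1 network is the width of the output layer; the hidden layers stay the same.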
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model" origin="GENERATED_SAMPLE" width="90" x="447" y="238">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
  </operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
  <operator activated="true" class="performance_regression" compatibility="9.6.000" expanded="true" height="82" name="Performance" origin="GENERATED_SAMPLE" width="90" x="581" y="238">
    <parameter key="main_criterion" value="first"/>
    <parameter key="root_mean_squared_error" value="false"/>
    <parameter key="absolute_error" value="false"/>
    <parameter key="relative_error" value="true"/>
    <parameter key="relative_error_lenient" value="false"/>
    <parameter key="relative_error_strict" value="false"/>
    <parameter key="normalized_absolute_error" value="false"/>
    <parameter key="root_relative_squared_error" value="false"/>
    <parameter key="squared_error" value="false"/>
    <parameter key="correlation" value="false"/>
    <parameter key="squared_correlation" value="false"/>
    <parameter key="prediction_average" value="false"/>
    <parameter key="spearman_rho" value="false"/>
    <parameter key="kendall_tau" value="false"/>
    <parameter key="skip_undefined_labels" value="true"/>
    <parameter key="use_example_weights" value="true"/>
  </operator>
</process>
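The Performance operator above uses relative error as its main criterion, which yields one number for one label. With several horizon steps it can be more informative to report the error per step. A minimal sketch, assuming relative error means the mean of |prediction - label| / |label| (our reading of the criterion, not taken from RapidMiner's documentation); labels equal to 0 are simply skipped here, which may not match the operator. Since `Close` is Z-transformed before windowing in this process, many labels lie near 0, where relative error becomes very large, so it may be more meaningful to compute it on de-normalized prices.

```python
import numpy as np


def relative_error_per_horizon(y_true, y_pred):
    """Mean |prediction - label| / |label|, computed separately for each
    horizon step (column). Entries whose label is exactly 0 are ignored."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = []
    for k in range(y_true.shape[1]):
        mask = y_true[:, k] != 0
        rel = np.abs(y_pred[mask, k] - y_true[mask, k]) / np.abs(y_true[mask, k])
        errors.append(float(rel.mean()) if mask.any() else float("nan"))
    return errors


# Two examples, two horizon steps (t+1, t+2).
y_true = np.array([[100.0, 101.0], [102.0, 104.0]])
y_pred = np.array([[101.0, 99.0], [102.0, 100.0]])
print(relative_error_per_horizon(y_true, y_pred))
```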


Answers

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member)
    If forecasting the S&P Index were easy, there would be a lot of rich data scientists :-)
    You might want to look at the new Forecasting extension, which has some automated operators for both univariate and multivariate forecasting.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member)
    Hi @hsanchez

    I agree with Brian, and I can't resist quoting Pierre Dac:

    "...Forecasting is difficult, especially when it comes to the future..."  ;)

    Regards,

    Lionel
  • jacobcybulski (Member, University Professor)
    Another possibility is to use a Python extension: prepare your data in RM, do all your deep learning magic in Python (with TensorFlow, for example), and then return the multi-horizon output as a vector back to RM.
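Along the lines of the Python suggestion above, the H-step vector does not have to come from a network with H outputs: a one-step model (the horizon = 1 setup that already works) can be applied recursively, feeding each prediction back into the window. This recursive strategy is a different technique from direct multi-output training, and its errors can accumulate over the steps. A minimal NumPy sketch, with a least-squares linear one-step model standing in for the deep learning model; `fit_one_step` and `forecast_recursive` are hypothetical names.

```python
import numpy as np


def fit_one_step(series, window):
    """Least-squares linear model: predict x[t] from the previous `window` values."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    A = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def forecast_recursive(coef, last_window, horizon):
    """Roll the one-step model forward `horizon` times, appending each
    prediction to the window, and return the forecasts as one vector."""
    window = list(np.asarray(last_window, dtype=float))
    out = []
    for _ in range(horizon):
        pred = float(np.dot(coef[:-1], window) + coef[-1])
        out.append(pred)
        window = window[1:] + [pred]  # slide the window by one step
    return np.array(out)


series = np.arange(50, dtype=float)  # a perfectly linear trend
coef = fit_one_step(series, window=5)
print(forecast_recursive(coef, series[-5:], horizon=3))  # close to [50. 51. 52.]
```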
  • hsanchez (Member, Contributor II)
    Hello guys, thank you very much for taking the time to look at my post and answer my question: thanks to @Telcontar120 for the good humor, to @lionelderkrikor for the comments, and to @jacobcybulski for his answer. I would say that @jacobcybulski answered my question. Sure, stock market prediction is a difficult thing, but that does not prevent us from trying it and challenging ourselves with something very close to "can we forecast the emotions of a group of people?". Each stock groups people by a common interest and behavior, so I would say each stock represents, let's say, the average emotion of the people behind that stock. Yes, sure, it is chaotic/random, but there should be a way to get it right "some times". I am not a market specialist, I am just curious, and I have never read a book about the stock market. With curiosity as my driver and RapidMiner as my tool of hope, I decided to do something that yielded, let's say by luck :wink:, 400 dollars. Yes, that will not make you rich, @Telcontar120 :smiley:, but I was able to buy tons of satisfaction. By the way, I used GBT, and I would say that algorithm is a rock!!! It is wonderful.
    My process: pull the intraday data using Python -> apply STL to remove some noise -> (I am using multiple variables) normalize the series -> weight by PCA to focus on the variables that really matter -> windowing (to train, validate, and apply my model to the last row), plus a parallel windowing to enrich the data (feature generation) with statistics such as min, max, standard deviation, etc. -> split the data into three parts: training, model evaluation, and the last row as unseen data -> multi-horizon forecasting using GBT (GBT, wow, when you tune it!) -> multi-horizon performance -> apply the model to the unseen data -> multi-horizon performance -> tune ARIMA and apply it to the same sequence to compare its forecasts and performance with those of the GBT model.
    Then you can get some satisfaction when, "some times", you get it right.
    What did I learn?
    1. RapidMiner did a great job with this time series extension.
    2. Each stock has its own emotion and behavior.
    3. There is no such thing as a "free lunch": you have to develop one model per stock. Each stock has its own personality and emotions.
    4. A well-tuned GBT can surprise you.
    I want to stress that I am not an expert in time series or the stock market. I am just curious, and the process described above may be missing some steps.
    Thank you guys for your time.
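The windowing, feature-generation and multi-horizon steps from the post above can be sketched roughly as follows. This is not hsanchez's actual process: GBT is replaced by a least-squares linear model to keep the example dependency-free, one model is trained per horizon step (a "direct" multi-horizon setup, which we assume rather than know matches the RapidMiner operators he used), STL, PCA weighting, the evaluation split and ARIMA are left out, and all function names are hypothetical.

```python
import numpy as np


def window_features(series, window):
    """Lagged values of each window plus its min, max and standard deviation."""
    series = np.asarray(series, dtype=float)
    wins = np.stack([series[i:i + window] for i in range(len(series) - window + 1)])
    stats = np.column_stack([wins.min(axis=1), wins.max(axis=1), wins.std(axis=1)])
    return np.hstack([wins, stats])


def fit_direct(F, series, window, horizon):
    """Direct strategy: one least-squares model per horizon step k = 1..horizon.

    Row i of F describes series[i:i + window]; its k-step target is
    series[i + window - 1 + k]. Rows without a known target are not used.
    """
    series = np.asarray(series, dtype=float)
    models = []
    for k in range(1, horizon + 1):
        n = len(series) - window - k + 1              # rows whose target exists
        A = np.hstack([F[:n], np.ones((n, 1))])       # intercept column
        y = series[window - 1 + k:window - 1 + k + n]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        models.append(coef)
    return models


def forecast_direct(models, f_last):
    """Apply each per-step model to the last (unseen) feature row."""
    a = np.append(f_last, 1.0)
    return np.array([float(a @ coef) for coef in models])


series = np.sin(np.arange(200) / 8.0)
window, horizon = 20, 3
F = window_features(series, window)
models = fit_direct(F, series, window, horizon)
print(forecast_direct(models, F[-1]))  # 3 forecasts beyond the last observed value
```

The last feature row has no known targets for any step, so it plays the role of the "unseen" row in the post.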
