Tomorrow and the day after tomorrow..

walden21 Member Posts: 2 Contributor I
edited November 2018 in Help
Hi there!

I set up validation and test processes for stock price forecasting, as shown below.
Could you tell me whether there is anything wrong with my understanding?

(1) Data: I have a 780 x 5 data set (sample below)
-------------------------------------------------------------------------------
Date ND_C DJ_C KSP_O PL
2006-03-30 0.13 -0.58 0.09 2.80
2006-03-31 -0.04 -0.37 1.85 1.55
2006-04-03 -0.13 0.32 1.04 1.05
2006-04-04 0.37 0.53 0.67 1.05
...
2009-06-01 0.06 3.02 2.57 -3.35
--------------------------------------------------------------------------------
(2) Validation: I trained my PolynomialRegression model with SlidingWindowValidation
    and wrote the model to a file.
Here's the XML for validation:
<operator name="Root" class="Process" expanded="yes">
   <operator name="ExcelExampleSource" class="ExcelExampleSource">
       <parameter key="excel_file" value="C:\NDDJ_3cls.xls"/>
       <parameter key="sheet_number" value="2"/>
       <parameter key="first_row_as_names" value="true"/>
       <parameter key="create_label" value="true"/>
       <parameter key="label_column" value="5"/>
       <parameter key="create_id" value="true"/>
   </operator>
   <operator name="ExampleVisualizer" class="ExampleVisualizer">
   </operator>
   <operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
       <parameter key="training_window_width" value="75"/>
       <parameter key="training_window_step_size" value="1"/>
       <parameter key="test_window_width" value="1"/>
       <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
           <operator name="PolynomialRegression" class="PolynomialRegression">
           </operator>
           <operator name="ModelWriter" class="ModelWriter">
               <parameter key="model_file" value="C:\DJ_NN_SW.mod"/>
           </operator>
       </operator>
       <operator name="OperatorChain" class="OperatorChain" expanded="yes">
           <operator name="ModelApplier" class="ModelApplier">
               <list key="application_parameters">
               </list>
           </operator>
           <operator name="Performance" class="Performance">
           </operator>
       </operator>
   </operator>
</operator>
(3) Test: I loaded that model and applied it to the "SAME" data set that was used in step (2).
Here's the XML for the test:
<operator name="Root" class="Process" expanded="yes">
   <operator name="ExcelExampleSource" class="ExcelExampleSource">
       <parameter key="excel_file" value="C:\NDDJ_3cls.xls"/>
       <parameter key="sheet_number" value="3"/>
       <parameter key="first_row_as_names" value="true"/>
       <parameter key="label_column" value="4"/>
       <parameter key="create_id" value="true"/>
   </operator>
   <operator name="ModelLoader" class="ModelLoader">
       <parameter key="model_file" value="C:\DJ_NN_SW.mod"/>
   </operator>
   <operator name="ModelApplier" class="ModelApplier">
       <list key="application_parameters">
       </list>
   </operator>
</operator>
(4) SlidingWindowValidation parameters
1.Training_Window_Width : 75
2.Training_Window_Step_size : 1
3.Test_window_width : 1
4.Horizon : 1

Did I use "the day after tomorrow's data" to predict "tomorrow's price" in the test process, even after the 75th example?
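
To make the parameters in (4) concrete, here is a minimal Python sketch of which rows a sliding-window validation would train and test on. It is not RapidMiner code. The function name `sliding_windows`, the 1-based row numbering, and the reading of "horizon 1" as "the test row directly follows the training window" are assumptions made for illustration.

```python
def sliding_windows(n_rows, train_width, step, test_width, horizon):
    """Yield (train_range, test_range) pairs as 1-based inclusive (start, end) tuples.

    Assumption: the test window starts `horizon` rows after the last training row.
    """
    start = 1
    while True:
        train_end = start + train_width - 1
        test_start = train_end + horizon
        test_end = test_start + test_width - 1
        if test_end > n_rows:
            break  # the test window would run past the end of the data
        yield (start, train_end), (test_start, test_end)
        start += step


# Parameters from the post: 780 rows, width 75, step 1, test width 1, horizon 1.
windows = list(sliding_windows(780, 75, 1, 1, 1))
print(len(windows))   # 705
print(windows[0])     # ((1, 75), (76, 76))
print(windows[-1])    # ((705, 779), (780, 780))
```

Under these assumptions, every iteration inside the validation predicts a row that lies after its own training window, so the validation itself does not look into the future.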

Answers

  • haddock Member Posts: 849 Maven
    Hi there Walden21,
    Did I use "the day after tomorrow's data" to predict "tomorrow's price" in the test process, even after the 75th example?
    No, but you did train the model on data that comes after the test dates, for every test example except the last. The model used in your step (3) was trained on the 75 examples that end at the last-but-one example, so if you had 1000 examples it would have been trained on examples 925-999 and validated on example 1000. If you give the examples IDs and put breakpoints before the learner and ModelApplier operators, you can see this for yourself.

    Hope that doesn't make things more confusing...
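
The arithmetic behind that 1000-example case can be checked with a short Python sketch. It is only an illustration, not RapidMiner code: the helper names `final_model_split` and `classify_rows` are hypothetical, rows are numbered from 1, and a horizon of 1 is assumed to mean that the test row directly follows the training window.

```python
def final_model_split(n_rows, train_width, horizon=1):
    """Return (train_start, train_end, test_row) for the last validation iteration."""
    test_row = n_rows
    train_end = test_row - horizon
    train_start = train_end - train_width + 1
    return train_start, train_end, test_row


def classify_rows(n_rows, train_start, train_end):
    """Count rows by their position relative to the final training window."""
    older_than_training = train_start - 1          # model was trained on later data
    inside_training = train_end - train_start + 1  # model has already seen these rows
    newer_than_training = n_rows - train_end       # genuinely unseen, later rows
    return older_than_training, inside_training, newer_than_training


train_start, train_end, test_row = final_model_split(1000, 75)
print(train_start, train_end, test_row)              # 925 999 1000
print(classify_rows(1000, train_start, train_end))   # (924, 75, 1)
```

So when the saved model is applied to the whole data set, only the last row is predicted from strictly earlier data. The other rows are either part of the training window or older than it.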

  • fischer Member Posts: 439 Maven
    I am not sure what you are actually trying to achieve, but it looks like your processes are not doing what you expect them to do.

    First, the ModelWriter will be executed for each iteration of the SlidingWindowValidation and since you are using a constant file name, your model will be overwritten again and again. Your second process will read only the result of the last iteration. To avoid this behaviour, you can use %{a} in the filename to append the iteration number to the filename. In that case, you will end up with several models, so you have to modify your second process.

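    Apart from that, you are not really training on a time series: your data contains one entry for each point in time, so each example carries only that day's values and the learner never sees the preceding days. To transform this series into windowed examples, you can use, e.g., the MultivariateSeries2WindowExamples operator.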

    Best,
    Simon
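
The windowing idea mentioned above can be sketched in plain Python. This is not the implementation of MultivariateSeries2WindowExamples and may differ from what that operator does in detail. The function `window_examples`, its parameters, and the choice to flatten all columns of each window into one feature list are assumptions for illustration. The rows are the first four data rows from the question, without the date column.

```python
def window_examples(rows, window_size, label_index, horizon=1):
    """Turn a multivariate series into (features, label) examples.

    features: all values of `window_size` consecutive rows, flattened.
    label: column `label_index` of the row `horizon` steps after the window.
    """
    examples = []
    last_start = len(rows) - window_size - horizon
    for start in range(last_start + 1):
        window = rows[start:start + window_size]
        features = [value for row in window for value in row]
        label = rows[start + window_size + horizon - 1][label_index]
        examples.append((features, label))
    return examples


# Columns: ND_C, DJ_C, KSP_O, PL (first four rows from the question).
rows = [
    (0.13, -0.58, 0.09, 2.80),
    (-0.04, -0.37, 1.85, 1.55),
    (-0.13, 0.32, 1.04, 1.05),
    (0.37, 0.53, 0.67, 1.05),
]

examples = window_examples(rows, window_size=2, label_index=3)
for features, label in examples:
    print(features, "->", label)
```

Each resulting example holds two days of history and is labelled with the PL value of the following day, so a learner trained on these examples can use past values when predicting the next one.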