
"Prediction accuracy problem"

c1borg Member Posts: 17 Maven
edited June 2019 in Help
Hi everyone, this is my first post. My problem is predicting stock movements. I created a spreadsheet in Excel with daily closing prices for 3 stocks, plus a label column to learn against, which takes the values OUT, LONG and SHORT.
I run the prediction and get 85% accuracy:
              true out   true long   true short
pred. out         1626          85           73
pred. long          77         660           93
pred. short         62          73          433
class recall    92.12%      80.68%       72.29%
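For anyone checking the figures, here is a minimal sketch of how the accuracy and the class recall row follow from the matrix above. It assumes rows are predictions and columns are true classes, as the headers suggest; the variable names are purely illustrative.

```python
# Confusion matrix from the post: rows are predictions, columns are true classes.
labels = ["out", "long", "short"]
matrix = [
    [1626, 85, 73],   # pred. out
    [77, 660, 93],    # pred. long
    [62, 73, 433],    # pred. short
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

# Recall per class = correctly predicted members / all true members (column sum).
recall = {}
for j, name in enumerate(labels):
    column_total = sum(row[j] for row in matrix)
    recall[name] = matrix[j][j] / column_total

print(f"accuracy: {accuracy:.2%}")
for name in labels:
    print(f"recall {name}: {recall[name]:.2%}")
```

The diagonal holds 2719 of 3182 examples, which is where the roughly 85% comes from, and the per-class values match the class recall row.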
I then run the saved model on 1 year of test data that was not used for training, and the predicted values match the true values only 53% of the time.
What have I done wrong?

I can post the XML files and the Excel sheet if required, or answer in more detail if requested.

Answers

  • haddock Member Posts: 849 Maven
    Welcome to the whacky world of RM!

    Without seeing the XML and data it is almost impossible to give a useful answer. That said, my experience is that, when it comes to financial prediction, the more realistic the setup, the lower the accuracy, dammit >:( You can check out globestreetjournal.com to see what I mean.
  • c1borg Member Posts: 17 Maven
    I was going to attach the files but can't work out how, so here are the two XML files:

    <?xml version="1.0" encoding="windows-1252"?>
    <process version="4.4">

      <operator name="Root" class="Process" expanded="yes">
          <parameter key="logverbosity" value="init"/>
          <parameter key="random_seed" value="2001"/>
          <parameter key="encoding" value="SYSTEM"/>
          <operator name="ExcelExampleSource" class="ExcelExampleSource">
              <parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
              <parameter key="sheet_number" value="1"/>
              <parameter key="row_offset" value="0"/>
              <parameter key="column_offset" value="0"/>
              <parameter key="first_row_as_names" value="true"/>
              <parameter key="create_label" value="true"/>
              <parameter key="label_column" value="5"/>
              <parameter key="create_id" value="true"/>
              <parameter key="id_column" value="1"/>
              <parameter key="decimal_point_character" value="."/>
              <parameter key="datamanagement" value="double_array"/>
          </operator>
          <operator name="ExampleVisualizer" class="ExampleVisualizer" breakpoints="after">
          </operator>
          <operator name="XValidation" class="XValidation" expanded="yes">
              <parameter key="keep_example_set" value="false"/>
              <parameter key="create_complete_model" value="false"/>
              <parameter key="average_performances_only" value="true"/>
              <parameter key="leave_one_out" value="false"/>
              <parameter key="number_of_validations" value="10"/>
              <parameter key="sampling_type" value="stratified sampling"/>
              <parameter key="local_random_seed" value="-1"/>
              <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                  <operator name="W-IBk" class="W-IBk">
                      <parameter key="keep_example_set" value="false"/>
                      <parameter key="I" value="false"/>
                      <parameter key="F" value="false"/>
                      <parameter key="K" value="1.0"/>
                      <parameter key="E" value="false"/>
                      <parameter key="W" value="0.0"/>
                      <parameter key="X" value="false"/>
                      <parameter key="A" value="weka.core.neighboursearch.LinearNNSearch -A &quot;weka.core.EuclideanDistance -R first-last&quot;"/>
                  </operator>
                  <operator name="ModelWriter" class="ModelWriter">
                      <parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
                      <parameter key="overwrite_existing_file" value="true"/>
                      <parameter key="output_type" value="XML Zipped"/>
                  </operator>
              </operator>
              <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                  <operator name="ModelApplier" class="ModelApplier">
                      <parameter key="keep_model" value="false"/>
                      <list key="application_parameters">
                      </list>
                      <parameter key="create_view" value="false"/>
                  </operator>
                  <operator name="PerformanceEvaluator" class="PerformanceEvaluator">
                      <parameter key="keep_example_set" value="false"/>
                      <parameter key="main_criterion" value="first"/>
                      <parameter key="root_mean_squared_error" value="false"/>
                      <parameter key="absolute_error" value="true"/>
                      <parameter key="relative_error" value="true"/>
                      <parameter key="relative_error_lenient" value="false"/>
                      <parameter key="relative_error_strict" value="false"/>
                      <parameter key="normalized_absolute_error" value="false"/>
                      <parameter key="root_relative_squared_error" value="false"/>
                      <parameter key="squared_error" value="false"/>
                      <parameter key="correlation" value="true"/>
                      <parameter key="squared_correlation" value="true"/>
                      <parameter key="prediction_average" value="false"/>
                      <parameter key="prediction_trend_accuracy" value="false"/>
                      <parameter key="AUC" value="false"/>
                      <parameter key="cross-entropy" value="false"/>
                      <parameter key="margin" value="false"/>
                      <parameter key="soft_margin_loss" value="false"/>
                      <parameter key="logistic_loss" value="false"/>
                      <parameter key="accuracy" value="true"/>
                      <parameter key="classification_error" value="true"/>
                      <parameter key="kappa" value="false"/>
                      <parameter key="weighted_mean_recall" value="false"/>
                      <parameter key="weighted_mean_precision" value="false"/>
                      <parameter key="spearman_rho" value="false"/>
                      <parameter key="kendall_tau" value="false"/>
                      <parameter key="skip_undefined_labels" value="true"/>
                      <parameter key="use_example_weights" value="true"/>
                      <list key="class_weights">
                      </list>
                  </operator>
              </operator>
          </operator>
      </operator>

    </process>

    <?xml version="1.0" encoding="windows-1252"?>
    <process version="4.4">

      <operator name="Root" class="Process" expanded="yes">
          <parameter key="logverbosity" value="init"/>
          <parameter key="random_seed" value="2001"/>
          <parameter key="encoding" value="SYSTEM"/>
          <operator name="ExcelExampleSource" class="ExcelExampleSource">
              <parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
              <parameter key="sheet_number" value="2"/>
              <parameter key="row_offset" value="0"/>
              <parameter key="column_offset" value="0"/>
              <parameter key="first_row_as_names" value="true"/>
              <parameter key="create_label" value="false"/>
              <parameter key="label_column" value="1"/>
              <parameter key="create_id" value="true"/>
              <parameter key="id_column" value="1"/>
              <parameter key="decimal_point_character" value="."/>
              <parameter key="datamanagement" value="double_array"/>
          </operator>
          <operator name="ModelLoader" class="ModelLoader">
              <parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
          </operator>
          <operator name="ModelApplier" class="ModelApplier">
              <parameter key="keep_model" value="false"/>
              <list key="application_parameters">
              </list>
              <parameter key="create_view" value="false"/>
          </operator>
      </operator>

    </process>

    If you can point me in the right direction so I can post attachments I will do that.

  • haddock Member Posts: 849 Maven
    Hi,

    I'll take a better look tomorrow, but one point is clear, namely that stratified sampling cannot be right for the validation, as you could end up training on the future if you think about it... try sliding window validation instead.

    Gottarush, cheers.

  • c1borg Member Posts: 17 Maven
    OK, many thanks. If you need the data file let me know, but I might have to email it to you, as I can't find the attachment option.
  • haddock Member Posts: 849 Maven
    G'Day c1borg!

    No need to send in the data. The core of your problem is that the results of validating your model are so different from the results you get when you apply it to unseen data. Applying the model is fine, so you need to concentrate on the validation end. Validation splits the data into training and test sets, making the model from the former and applying it to the latter. So the key notion is to make sure that this splitting is done sensibly.
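To see why the splitting matters, here is a rough sketch of what a shuffled 10-fold split does to row order. It is not RapidMiner's XValidation code: the row count is hypothetical, and it uses plain shuffling rather than true stratification (stratifying also balances the class proportions per fold, but it is the shuffling that breaks the time order).

```python
import random

def shuffled_folds(n_examples, n_folds, seed):
    """Split row indices 0..n_examples-1 into n_folds random test folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[k::n_folds]) for k in range(n_folds)]

n_examples = 250   # hypothetical: about one year of daily rows, oldest first
folds = shuffled_folds(n_examples, n_folds=10, seed=2001)

leaky_folds = 0
for test_idx in folds:
    test_set = set(test_idx)
    train_idx = [i for i in range(n_examples) if i not in test_set]
    # Leak: at least one training row is more recent than the earliest test row.
    if max(train_idx) > min(test_idx):
        leaky_folds += 1

print(f"{leaky_folds} of {len(folds)} folds train on rows later than a test row")
```

With a nearest-neighbour learner such as the W-IBk with K=1 in the posted process, a test day will often find its closest match in a training row from the day before or after, which would plausibly inflate the validation accuracy.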

    For your problem you need to be certain that the training is done on examples that occur before the examples to be tested. If you check out http://en.wikipedia.org/wiki/Stratified_sampling you will see that stratified sampling does not guarantee this. However, sliding a window down your examples ensures that training on the future cannot happen, so that would be a possibility.
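The sliding window idea can be sketched roughly as below. This is only an illustration of the principle, not RapidMiner's sliding window validation operator; the function name, window sizes and row count are all assumptions, and the rows are assumed to be sorted oldest first.

```python
def sliding_windows(n_examples, train_width, test_width, step):
    """Yield (train, test) index lists where every training row precedes every test row."""
    start = 0
    while start + train_width + test_width <= n_examples:
        train_end = start + train_width
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_width))
        yield train, test
        start += step

# Hypothetical sizes: 1000 rows, 500-row training window, 50-row test window.
splits = list(sliding_windows(n_examples=1000, train_width=500, test_width=50, step=50))

for train, test in splits:
    # The model for each window never sees a row from the test period or later.
    assert max(train) < min(test)

print(len(splits), "windows; first test block covers rows",
      splits[0][1][0], "to", splits[0][1][-1])
```

Each window is trained and scored separately and the scores are then averaged, so the estimate should sit much closer to what you see when the saved model is applied to a genuinely unseen year.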

    Happy mining, and good luck!
  • c1borg Member Posts: 17 Maven
    OK, many thanks, I will take your advice.