How to build a predictive model to optimize gain

fpaganel Member Posts: 4 Contributor I
edited November 2018 in Help

Hi All,

I would like to create and use a predictive model that generates buy or sell signals for a stock in order to maximize the gain.

For the inputs I created the following table:
[Screenshot of the input table]

Where:

Day is the ID

Variation is the daily price variation. I set its role to Attribute.

Indicators are attributes.

Result equals 1 if the previous day's price variation was positive, else 0.
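A minimal sketch of how the Result label could be derived from the Variation column, following the literal definition above (1 if the previous day's variation was positive, else 0). The function name `build_labels` and the choice of `None` for the first day (which has no previous day) are assumptions for illustration. If the goal is to forecast, the label would usually look at the next day's variation instead; otherwise the label may simply repeat information the model already sees in the Variation attribute.

```python
def build_labels(variations):
    """Hypothetical helper: label day i with 1 if day i-1 had a positive variation.

    The first day has no previous day, so it gets None (an assumption).
    """
    if not variations:
        return []
    labels = [None]
    for prev in variations[:-1]:
        # Strictly positive counts as "up"; zero counts as 0.
        labels.append(1 if prev > 0 else 0)
    return labels


daily_variation = [0.5, -1.2, 0.3, 0.0, 2.1]
print(build_labels(daily_variation))  # [None, 1, 0, 1, 0]
```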

I created a process like this:
[Screenshot of the process]

In it I used:

  • Retrieve to load the inputs
  • Normalize for normalization
  • Filter Examples to split the inputs into a training set and a set for prediction
  • Generate Attributes to derive another attribute with a formula
  • Set Role to set the ID and Label roles
  • Cross Validation to train the model
  • Multiply to duplicate the model, so that one copy goes to Apply Model and the other is saved to a file with Store
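The Normalize and Filter Examples steps above can be sketched as follows, assuming z-score normalization and a split by day number. The helper names, the `last_training_day` cutoff and the `rsi` indicator column are hypothetical. One design point worth checking: in this sketch the normalization statistics are learned from the training rows only and then reused on the prediction rows, whereas running Normalize before Filter Examples lets the prediction rows influence the scaling.

```python
import statistics


def split_by_day(rows, last_training_day):
    """Mimic Filter Examples: older days for training, newer days for prediction."""
    train = [r for r in rows if r["day"] <= last_training_day]
    score = [r for r in rows if r["day"] > last_training_day]
    return train, score


def zscore_fit(column):
    """Learn mean and (population) standard deviation from training values."""
    return statistics.fmean(column), statistics.pstdev(column)


def zscore_apply(column, mean, std):
    """Normalize values with statistics learned on the training rows."""
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical data: one indicator column named "rsi".
rows = [
    {"day": 1, "rsi": 30.0},
    {"day": 2, "rsi": 50.0},
    {"day": 3, "rsi": 70.0},
    {"day": 4, "rsi": 60.0},
]
train, score = split_by_day(rows, last_training_day=2)
mean, std = zscore_fit([r["rsi"] for r in train])
print(zscore_apply([r["rsi"] for r in score], mean, std))  # [3.0, 2.0]
```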

Inside the Cross Validation I put:

[Screenshot of the Cross Validation subprocess]

  • Deep Learning to learn and create the model
  • Performance (Binominal Classification) to evaluate the model. I set the main criterion to "true positive".
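Using the true-positive count as the main criterion can be misleading, because a model that always predicts 1 maximizes it. The sketch below (plain Python, with the hypothetical helper name `binomial_performance`) computes the usual binary-classification figures from a confusion matrix, so the effect of false positives becomes visible in precision and accuracy.

```python
def binomial_performance(actual, predicted):
    """Confusion-matrix counts plus accuracy, precision and recall for 0/1 labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# A model that always says "buy": every positive is caught (recall 1.0),
# but half of its signals are wrong (precision 0.5).
print(binomial_performance([1, 0, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]))
```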

The performance is not great. Furthermore, in another process, the model memorized the results for an input set.

Could you help me improve the process?

Thanks in advance

Francesco

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    have you tried another learner, e.g. a Random Forest (without any (pre)pruning)? I would try that one first.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • fpaganel Member Posts: 4 Contributor I

    I'm worried that the model memorizes the training set and will not adapt to new examples...

    How can I avoid it? With an X-Validation operator?

    At the moment I have a lot of indicators. Would you advise me to reduce their number?

    Thanks

    Francesco

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hey Francesco,

    kind of, yes.

    Overtraining

    Memorizing the training set is called overtraining. Most algorithms have options to fight it. It is a trade-off between putting more information into the model and putting too much of the training data into it (this is the bias-variance tradeoff). For Random Forests and Decision Trees, the options controlling this are the (pre)pruning options, especially minimal gain. I've posted some more details about these options here: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/My-Decision-Tree-Shows-only-one-Node/ta-p/33259 . The nice thing about Random Forests is that they are fairly robust against overtraining by themselves. That was one reason why I recommended them.
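To make the minimal gain idea concrete, here is a sketch of entropy-based information gain with a pre-pruning check: a split is accepted only if its gain reaches the threshold. The helper names are hypothetical, and RapidMiner may compute gain with a different criterion (for example gain ratio), so treat this as an illustration of the principle rather than of the operator's exact computation.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def should_split(parent, left, right, minimal_gain):
    """Pre-pruning rule: keep growing only if the split is informative enough."""
    return information_gain(parent, left, right) >= minimal_gain


parent = [1, 1, 0, 0]
print(should_split(parent, [1, 1], [0, 0], minimal_gain=0.1))  # True: perfect split
print(should_split(parent, [1, 0], [1, 0], minimal_gain=0.1))  # False: no gain
```

Raising `minimal_gain` stops the tree earlier, which trades some fit on the training data for less memorization.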

    X-Validation does not prevent overtraining, but it shows you how well you perform on unseen data. If you overtrain, you will get worse results there, so in practice it does help you fight overtraining.
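The point that cross-validation exposes memorization can be shown with a deliberately overfit toy model: a lookup table that remembers every training example and falls back to the majority class otherwise. It scores perfectly on its own training data but only about chance level when evaluated with k-fold cross-validation. All names here are hypothetical and this is not a RapidMiner learner; note also that plain k-fold ignores time order, which matters for stock data.

```python
from collections import Counter


def k_fold_indices(n_examples, k):
    """Yield (train_idx, test_idx); each example lands in exactly one test fold."""
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_examples))
        yield train, test
        start += size


def memorizer_fit(xs, ys):
    """Toy overfit model: remember every example, else predict the majority class."""
    table = dict(zip(xs, ys))
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def accuracy(model, xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def cross_validate(xs, ys, k):
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = memorizer_fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


xs = list(range(10))
ys = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(accuracy(memorizer_fit(xs, ys), xs, ys))  # 1.0 on the training data
print(cross_validate(xs, ys, k=5))              # 0.5 on held-out folds
```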

    Feature Selection

    Feature selection is an important topic and there is no one-size-fits-all solution. I've explored some options in the feature weighting article: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Feature-Weighting-Tutorial/ta-p/35281 . The nice thing about a Random Forest is that it does an internal feature selection at each node. It is also a bit more robust to bad features than other algorithms. The option to play with is subset_ratio, if you do not use guess subset ratio. I would give the plain RF a try, change this setting a bit, and then move on to more feature selection (and generation!).
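The per-node feature selection mentioned above can be sketched as drawing a random subset of the attributes, sized by a ratio, as the candidates for one split. The indicator names and the rounding rule are assumptions; how RapidMiner derives the subset size (including the guess subset ratio option) is not reproduced here.

```python
import random


def random_feature_subset(features, subset_ratio, rng):
    """Pick the candidate features one tree node may split on (no duplicates)."""
    size = max(1, round(len(features) * subset_ratio))
    return rng.sample(features, size)


# Hypothetical indicator names.
indicators = ["rsi", "macd", "sma_20", "ema_50", "volume_change"]
rng = random.Random(42)
for node in range(3):
    # Each node sees a different random subset, so weak indicators are often skipped.
    print(node, random_feature_subset(indicators, subset_ratio=0.4, rng=rng))
```

A smaller ratio makes the trees more different from each other; a larger one lets each node consider more indicators.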

    Best,

    Martin

  • fpaganel Member Posts: 4 Contributor I

    I'm coming back to this project.
    I left it because I didn't get good results.
    Now I would like to change the approach.
    I would like to create a model that learns from a training set (built from the data of different stocks); at the end of the day I would apply it to many stocks, and it would return a confidence rate for the forecast of each stock.
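A sketch of the workflow described above, under loud assumptions: training rows from several stocks are pooled into one set (tagged with a hypothetical `ticker` field), and a hypothetical logistic model with made-up weights turns each stock's latest indicators into a confidence that the label is 1. Ranking stocks by that confidence mirrors the idea of reading per-class confidence values after applying a model; the weights, tickers and feature names are all illustrative, not a trained model.

```python
import math


def pool_training_rows(rows_by_stock):
    """Stack examples from several stocks into one training set, tagged by ticker."""
    pooled = []
    for ticker, rows in rows_by_stock.items():
        for row in rows:
            pooled.append({**row, "ticker": ticker})
    return pooled


def confidence(features, weights, bias):
    """Hypothetical logistic model: estimated probability that the label is 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_stocks(todays_features, weights, bias):
    """Score every stock and sort from most to least confident."""
    scores = {t: confidence(f, weights, bias) for t, f in todays_features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


weights = {"rsi": 0.5, "macd": -1.0}  # made-up weights, not learned from data
today = {
    "AAA": {"rsi": 2.0, "macd": 0.0},
    "BBB": {"rsi": 0.0, "macd": 1.0},
    "CCC": {"rsi": 0.0, "macd": 0.0},
}
for ticker, conf in rank_stocks(today, weights, bias=0.0):
    print(ticker, round(conf, 3))
```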

    How do you suggest to proceed?

    Thanks in advance

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @fpaganel

    This seems to be quite an old thread, but your last question definitely calls for @Thomas_Ott

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @kypexin you rang? :) 

    @fpaganel if you want to work with time series in RapidMiner, you should install the Value Series extension. It has a Sliding Window Validation operator that does backtesting. From there you can measure forecast performance, RMSE, etc.
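The idea behind sliding-window backtesting can be sketched in plain Python: train on a window of past points, test on the points right after it, slide forward, and average the error. Here the "model" is a naive forecast (repeat the last training value) and the error is RMSE. The parameter names `train_size`, `test_size` and `step` are hypothetical and are not the operator's own parameter names.

```python
import math


def sliding_windows(n, train_size, test_size, step):
    """Yield (train, test) index ranges that always keep test after train in time."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def backtest_naive(series, train_size, test_size, step):
    """Average RMSE of a 'repeat the last training value' forecast over all windows."""
    errors = []
    for train, test in sliding_windows(len(series), train_size, test_size, step):
        last = series[train[-1]]
        actual = [series[i] for i in test]
        errors.append(rmse(actual, [last] * len(actual)))
    if not errors:
        raise ValueError("series too short for the chosen window sizes")
    return sum(errors) / len(errors)


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(backtest_naive(prices, train_size=3, test_size=1, step=1))  # 1.0
```

Any real learner can replace the naive forecast; the benefit of this scheme is that the model is never tested on days that come before its training data.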

    I go over quite a bit of time series process building (with explanations) in my latest live stream here: https://youtu.be/WdYpWAFxzR8
