02-16-2017 04:21 PM - edited 02-16-2017 04:27 PM
I would like to create and use a predictive model of buy or sell signals for a stock in order to optimize the gain.
For the inputs I created the following table:
Day is the ID
Variation is the daily price variation. I set its role as Attribute.
Indicators are attributes.
Result is equal to 1 if the previous day's price variation was positive, else 0.
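The Result column described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `build_result` and the sample variations are made up, and the first day is given no label because it has no previous day.

```python
def build_result(variations):
    """Return 1 where the previous day's variation was positive, else 0.

    The first day has no previous day, so its label is None.
    """
    if not variations:
        return []
    labels = [None]
    for previous in variations[:-1]:
        labels.append(1 if previous > 0 else 0)
    return labels


# Made-up daily variations, for illustration only.
variations = [0.5, -1.2, 0.3, 0.0, 2.1]
print(build_result(variations))  # [None, 1, 0, 1, 0]
```

Note that a variation of exactly 0 counts as "not positive" here, which matches the wording of the definition.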
I have created a process like this:
Where I used:
In the Cross Validation I put:
The performance result is not great. Furthermore, in another process the model just memorizes the result for an input set.
Could you help me to improve the process?
Thanks in advance
02-17-2017 02:34 AM
Have you tried another learner, e.g. a Random Forest (without any (pre)pruning)? I would give this one a try first.
02-25-2017 06:11 AM - edited 02-25-2017 06:13 AM
I'm worried that the model will memorize the training set and will not adapt to new examples...
How can I avoid it? With an X-Validation component?
At the moment I have a lot of indicators. Would you advise me to reduce their number?
02-26-2017 05:19 AM
Kind of, yes.
What you describe, the model memorizing the training set, is called overtraining (overfitting). Most algorithms have options to fight it. It is a trade-off between putting more information into the model and putting too much information from the training data into it (this is the bias-variance trade-off). For Random Forests and Decision Trees, the options that control this are the (pre)pruning options, especially minimal gain. I've posted some more details about the options here: http://community.rapidminer.com/t5/RapidMiner-Stud
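To make the minimal gain idea more concrete, here is a rough sketch of pre-pruning in plain Python: a split is only accepted if it improves the purity of the labels by at least a threshold. It assumes plain information gain as the criterion and a `>=` comparison; the function names are hypothetical, and the criterion and exact rule used by the tool may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    total = len(parent)
    weighted = ((len(left) / total) * entropy(left)
                + (len(right) / total) * entropy(right))
    return entropy(parent) - weighted


def should_split(parent, left, right, minimal_gain):
    """Pre-pruning rule (assumed): split only if the gain reaches the threshold."""
    return information_gain(parent, left, right) >= minimal_gain


parent = [0, 0, 1, 1]
print(should_split(parent, [0, 0], [1, 1], minimal_gain=0.1))  # clean split
print(should_split(parent, [0, 1], [0, 1], minimal_gain=0.1))  # useless split
```

A higher threshold rejects more splits, which gives smaller trees that have less room to memorize the training data.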
X-Validation does not prevent overtraining, but it shows you how well the model performs on unseen data. If you overtrain, the validated results get worse, so it does in fact help you fight overtraining.
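A small sketch of why this works, in plain Python. It is not the RapidMiner process, only an illustration under assumptions: the data is random noise (so there is nothing to learn), and the "model" is a 1-nearest-neighbour lookup that memorizes the training set. The function names are made up.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the label of the closest training example (pure memorization)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test examples whose predicted label is correct."""
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def cross_validate(xs, ys, folds=5):
    """Return (mean training accuracy, mean held-out accuracy)."""
    size = len(xs) // folds
    train_scores, test_scores = [], []
    for k in range(folds):
        lo, hi = k * size, (k + 1) * size
        test_x, test_y = xs[lo:hi], ys[lo:hi]
        train_x, train_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        train_scores.append(accuracy(train_x, train_y, train_x, train_y))
        test_scores.append(accuracy(train_x, train_y, test_x, test_y))
    return sum(train_scores) / folds, sum(test_scores) / folds


# Random features and random labels: any apparent pattern is memorized noise.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [rng.randint(0, 1) for _ in range(200)]

train_acc, test_acc = cross_validate(xs, ys)
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The memorizing model looks perfect on the data it was trained on, while the held-out folds show roughly coin-flip performance. That gap is the signal the validation gives you.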
Feature selection is an important topic and there is no one-size-fits-all solution. I've explored some options in the feature weight article: http://community.rapidminer.com/t5/RapidMiner-Stud
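As one simple illustration of feature weighting (not necessarily what the linked article does), indicators can be ranked by the absolute correlation between each column and the label. The indicator names and values below are made up, and correlation only captures linear relationships, so treat this as a first filter rather than a final answer.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Return (name, weight) pairs sorted from strongest to weakest weight."""
    weights = {name: abs(pearson(values, labels))
               for name, values in columns.items()}
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Hypothetical indicator columns, for illustration only.
columns = {
    "indicator_a": [1.0, 2.0, 3.0, 4.0],    # rises together with the label
    "indicator_b": [1.0, -1.0, 1.0, -1.0],  # unrelated to the label
    "indicator_c": [5.0, 5.0, 5.0, 5.0],    # constant
}
labels = [0, 0, 1, 1]

ranking = rank_features(columns, labels)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Indicators with a weight near zero are candidates for removal, but it is worth re-checking the validated performance after dropping them.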