Mining for stock entry rules
Like many before me I'm a complete newbie to RapidMiner. This is my first post. Very impressive program and I'm very happy that it's available as Open Source. I'm running it on my Gentoo 64-bit machine and so far it seems to be working well.
Now, I've never used data mining before and am likely going to ask all the wrong things so please be kind. Don't worry too much. I can take a punch if I do something stupid.
OK, the initial task I've set for myself is to see if RapidMiner can extract a set of (for now) *ENTRY* rules that would help with day trading. I've prepared a data file with OHLCV data as well as a number of technical indicators that I currently use. The file is in CSV format and I seem to be able to read it in successfully using either the ExampleSource or the CSVExampleSource operator.
Having done that, I've managed to use a couple of operators as preprocessors. For instance, I can apply the Normalization operator and the process still runs. However, as soon as I add a RuleLearner I get a message in the message window that says "[Fatal] Process failed: Input example set does not have a label attribute". My first question is: what, in general, do I need to do to get past this problem?
The larger question I have is how, in general, do I describe the sort of criteria I would find acceptable in the rules RapidMiner mines? For instance, say I'm asking it to find a rule for going long a futures contract. I'd like a rule that did something like this:
1) Sometime in the next 30 bars there is a potential for a 2% gain. (Might be measured using high or possibly close.) Using 5 minute data that's about 2 1/2 hours which is good for a day trade.
2) Within whatever number of bars the required gain is developing there is no bar with more than a 1% drawdown or the entry would be considered a failure. (Must be measured using low.)
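To make the two criteria above concrete, here is a minimal Python sketch of how each bar could be tagged as a good or bad long entry before the data ever reaches RapidMiner. Everything in it is an assumption on my part rather than anything RapidMiner provides: the function name `label_long_entries` is made up, entry is assumed to be at the bar's close, the gain is measured with the high, the drawdown is measured with the low relative to the entry price, and a bar that hits both levels is counted as a failure.

```python
from typing import List, Sequence


def label_long_entries(
    closes: Sequence[float],
    highs: Sequence[float],
    lows: Sequence[float],
    horizon: int = 30,           # look-ahead window in bars (criterion 1)
    target: float = 0.02,        # required gain, 2% (criterion 1)
    max_drawdown: float = 0.01,  # tolerated drawdown, 1% (criterion 2)
) -> List[bool]:
    """Return one True/False label per bar (hypothetical helper).

    True means: entering long at this bar's close, some later bar within
    `horizon` bars reaches the target before any bar's low falls more
    than `max_drawdown` below the entry price.
    """
    n = len(closes)
    labels = []
    for i in range(n):
        entry = closes[i]
        target_price = entry * (1 + target)
        stop_price = entry * (1 - max_drawdown)
        success = False
        for j in range(i + 1, min(i + 1 + horizon, n)):
            # Check the low first, so a bar that touches both levels
            # is treated pessimistically as a failed entry.
            if lows[j] < stop_price:
                break
            if highs[j] >= target_price:
                success = True
                break
        labels.append(success)
    return labels


closes = [100.0, 100.5, 101.0, 102.5]
highs = [100.2, 101.0, 101.5, 102.5]
lows = [99.8, 99.5, 100.5, 101.0]
labels = label_long_entries(closes, highs, lows)
```

The resulting True/False column could be written out as one more column in the CSV file. Bars near the end of the file have fewer than 30 bars of future data, so their labels are less trustworthy and could reasonably be dropped.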
If I can mine out a rule like that then I'd like to understand what percentage of the time the rule works.
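As a rough illustration of what I mean by "percentage of the time the rule works", here is a small Python sketch. It assumes two hypothetical lists of equal length: `fired`, saying whether the rule signalled an entry on each bar, and `labels`, saying whether an entry on that bar would actually have met the criteria. The function name `rule_hit_rate` is invented for this example.

```python
from typing import Optional, Sequence


def rule_hit_rate(fired: Sequence[bool], labels: Sequence[bool]) -> Optional[float]:
    """Fraction of the bars where the rule fired that were good entries.

    Returns None when the rule never fired, since the rate is undefined.
    """
    outcomes = [label for did_fire, label in zip(fired, labels) if did_fire]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


fired = [True, True, False, True]
labels = [True, False, True, True]
rate = rule_hit_rate(fired, labels)  # 2 good entries out of 3 signals
```

A number like this is presumably only meaningful when it is measured on bars that were not used to find the rule in the first place.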
Assuming I can find a rule like this, I'll address exit rules later. In other words, this is (for now) only about using indicators to start the trade.
For now I have a recent data set of 76,000 examples. I tried to load 400,000 samples but RapidMiner said I ran out of memory. (4GB - I'm surprised but I guess it's possible since it happened!) ;-)
Thanks in advance and I look forward to becoming a contributing member of the group.