Which model should I use (training, validation, or testing)?
The data: My data set has 50 attributes and 3400 rows (90% for training, 10% held out as unseen test data), with the very last row reserved as the live prediction example.
The training: I run 10-fold cross-validation on the 90% training data to find the best learning algorithm and attribute mix for my data. I then confirm the chosen setup by applying the resulting model to the 10% of unseen data.
My question is: once I am happy with the above results, which model do I use (or create) for the live prediction of the last row?
1) Do I use the best model created by the 10-fold cross-validation on the 90% training data?
2) Do I create a model on the 90% training data (without cross-validation), using the best settings found during the cross-validation?
3) Do I create a model on 100% of the data (the 90% training data plus the 10% unseen data), using the best settings found during training?
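To make the setup concrete, here is a minimal Python sketch of the workflow described above. It is not my RapidMiner process: the data is synthetic, a toy one-attribute ridge regression stands in for the real 50-attribute learner, and all names (`fit`, `cv_error`, `candidates`, `model_90`, `model_100`) are made up for illustration. I also assume the reserved last row is set aside before the 90/10 split. In this sketch the ten fold models are thrown away after scoring, so only options 2 and 3 appear as separate models.

```python
import random

def fit(rows, lam):
    """Toy ridge regression through the origin on one attribute; returns the slope."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, y in rows)
    return sxy / (sxx + lam)

def mse(slope, rows):
    """Mean squared error of the slope-only model on the given rows."""
    return sum((y - slope * x) ** 2 for x, y in rows) / len(rows)

def cv_error(rows, lam, k=10):
    """Average validation error over k folds; the fold models are not kept."""
    errors = []
    for i in range(k):
        val = rows[i::k]                                   # every k-th row is fold i
        train = [r for j, r in enumerate(rows) if j % k != i]
        errors.append(mse(fit(train, lam), val))
    return sum(errors) / k

# Synthetic stand-in for the 3400-row data set (assumption: y = 2x + noise).
random.seed(0)
data = []
for _ in range(3400):
    x = random.uniform(-1, 1)
    data.append((x, 2.0 * x + random.gauss(0, 0.1)))

live_row = data[-1]        # last row, reserved for the live prediction
labelled = data[:-1]       # everything else
cut = int(0.9 * len(labelled))
train90, holdout10 = labelled[:cut], labelled[cut:]

# Pick the best setting with 10-fold cross-validation on the 90% only.
candidates = [0.0, 1.0, 10.0, 100.0]   # hypothetical settings to compare
best_lam = min(candidates, key=lambda lam: cv_error(train90, lam))

# Option 2: refit on the full 90% with the best setting, check on the 10%.
model_90 = fit(train90, best_lam)
holdout_error = mse(model_90, holdout10)

# Option 3: refit on 100% of the labelled data with the same setting.
model_100 = fit(labelled, best_lam)

prediction_90 = model_90 * live_row[0]
prediction_100 = model_100 * live_row[0]
print(best_lam, holdout_error, prediction_90, prediction_100)
```

The sketch only lays out where each candidate model comes from; it does not say which of the options is the right one to use.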
Thank you in advance for your time.