Auto Model feedback: Debate about model training
I would like to open, in a friendly and humble spirit, a debate about how models are trained in RapidMiner's Auto Model.
Indeed, as I understand the usual data science methodology, once the "best" model has been evaluated and selected, it has to be (re)trained on the whole initial dataset before going into production.
The Split Validation operator also applies this principle: the model it delivers is trained on the whole input dataset, regardless of the split ratio.
This is not the case in Auto Model, however: the models it makes available are trained on only 60% of the input dataset.
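To make the difference concrete, here is a minimal sketch in plain Python (not RapidMiner code) of the two steps being discussed: evaluate a model on a 60/40 hold-out split, then retrain the same kind of model on 100% of the data for production. The toy nearest-centroid classifier, the synthetic data, and all function names are assumptions made up for illustration; only the 60% ratio comes from the discussion above.

```python
import random


def train_centroids(rows):
    """Fit a toy nearest-centroid classifier: one mean value per label."""
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def accuracy(model, rows):
    """Share of rows whose label is predicted correctly."""
    hits = sum(1 for x, label in rows if predict(model, x) == label)
    return hits / len(rows)


def make_toy_data(n, seed=0):
    """Hypothetical dataset: two labels drawn around 0.0 and 3.0."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = "a" if i % 2 == 0 else "b"
        center = 0.0 if label == "a" else 3.0
        rows.append((rng.gauss(center, 1.0), label))
    rng.shuffle(rows)
    return rows


data = make_toy_data(200)
split = int(len(data) * 0.6)  # 60% for training, 40% held out
train_rows, test_rows = data[:split], data[split:]

# Step 1: train on 60% and estimate performance on the unseen 40%.
eval_model = train_centroids(train_rows)
holdout_accuracy = accuracy(eval_model, test_rows)

# Step 2: retrain the same kind of model on all rows for production.
final_model = train_centroids(data)

print(f"hold-out accuracy (model trained on 60%): {holdout_accuracy:.3f}")
print(f"rows used for the evaluation model: {len(train_rows)}")
print(f"rows used for the final model:      {len(data)}")
```

Note that the hold-out accuracy was measured on the 60% model, so for the final model it is only an estimate: once every row has been used for training, there is no unseen data left to test it on.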
My first question is: is it always appropriate to (re)train the selected model on the whole input dataset?
If so, and if it is feasible, it might be a good idea to implement this principle in Auto Model. (I am thinking of users, such as beginners and non-data-scientists, who do not want to ask themselves these questions and just want a model to put into production.)
But maybe computation time (or some other technical constraint) makes it infeasible to (re)train all the models on the whole initial dataset?
In that case, it might be a good idea for Auto Model to advise the user (in the documentation, in the results overview, and/or in the "Model" menus of the different models) to retrain the selected model manually, by generating its process, before it goes into production.
To conclude, I hope this helps move the debate forward, and I look forward to your opinions on these topics.
Have a nice day,