[SOLVED] Optimize Selection: how do I get the resulting best model?
Hello,
First of all, I would like to congratulate you on this magnificent piece of software. It's really productive and makes it easy to build out every step one could think of.
Now, my question: I have an Optimize Selection (evolutionary) operator with a Log operator to track the population fitness during the evolution. My problem is that I don't know how to get the best resulting model; the only thing I can do is re-train another model with the resulting attribute weights.
Is this the correct way to do it?
Again, congratulations to Rapid-I and the developers of this software.
Answers
Thanks for your kind words! We really appreciate positive comments about RapidMiner. (Of course we also appreciate negative comments, but we actually like the positive ones much more ;D)

Yes, you have to train the model on the complete data set, but only on the attributes that have been selected, to get the final prediction model. There is a good reason for that: there is no single "best" resulting model. I assume that you have used an inner cross validation, let's say with 10 folds. That means there are actually 10 different models for each attribute selection. Which one is the best? The one with the best performance on its test set? Well, that would be overfitting to the test set. My answer is: there is no best model coming out of a cross validation. Cross validation is for performance estimation only, not for model selection. This has to be done independently in order not to introduce a new form of test-set overfitting.
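A minimal Python sketch of that workflow, under stated assumptions: it is not RapidMiner's implementation, the nearest-centroid learner is a toy stand-in for whatever learner sits inside the validation, and an exhaustive search over small attribute subsets replaces the evolutionary search to keep it short. All function names are hypothetical. What it illustrates is that the cross validation only yields a score (the k fold models are discarded) and that the final model is fitted once, on all rows, using just the winning attributes.

```python
from itertools import combinations


def train_centroids(rows, labels):
    """Toy nearest-centroid learner: one mean vector per class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in groups.items()}


def predict(model, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))


def project(rows, attrs):
    """Keep only the attribute indices in `attrs` (the selected attributes)."""
    return [[row[i] for i in attrs] for row in rows]


def cv_accuracy(rows, labels, k):
    """k-fold CV estimate. The k fold models are discarded: only the score survives."""
    n, scores = len(rows), []
    for f in range(k):
        test = set(range(f, n, k))  # interleaved folds: rows f, f+k, f+2k, ...
        train = [i for i in range(n) if i not in test]
        model = train_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / k


def select_then_retrain(rows, labels, k=3, max_size=2):
    """Pick the attribute subset with the best CV score, then fit ONE final model
    on all rows restricted to that subset (exhaustive search stands in for the GA)."""
    best_attrs, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for attrs in combinations(range(len(rows[0])), size):
            score = cv_accuracy(project(rows, attrs), labels, k)
            if score > best_score:
                best_attrs, best_score = attrs, score
    final_model = train_centroids(project(rows, best_attrs), labels)
    return best_attrs, best_score, final_model


# Toy data: attribute 0 separates the classes, attributes 1 and 2 are noise.
rows = [[0.1, 5.0, 3.0], [0.2, 1.0, 7.0], [0.0, 9.0, 2.0], [0.3, 4.0, 8.0],
        [0.1, 2.0, 5.0], [0.2, 8.0, 1.0], [1.0, 5.0, 2.0], [0.9, 1.0, 8.0],
        [1.1, 9.0, 3.0], [1.0, 4.0, 7.0], [0.8, 2.0, 6.0], [1.2, 8.0, 1.0]]
labels = ["a"] * 6 + ["b"] * 6
attrs, score, model = select_then_retrain(rows, labels)
print(attrs, score)  # expected: (0,) 1.0
```

Because the search only compares CV scores and never keeps a fold model, there is no "best fold model" to pick; `final_model` is the only model that leaves the function.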
Getting the weights and the data outside of the cross validation also allows for more nice tricks: you can now train the right model on the complete data, apply the weights to an independent test set that has not been used for the attribute selection, calculate a performance, and put all of this into another, outer cross validation. This way you can even measure the overfitting effect of the attribute selection itself (which will definitely be there!).
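The outer cross validation described above can be sketched the same way. Again this is a hypothetical Python illustration, not RapidMiner's API: a toy nearest-centroid learner and an exhaustive subset search stand in for the real learner and for Optimize Selection. Inside each outer fold, the whole procedure (select on the training part, retrain on that part, test on untouched rows) is repeated.

```python
from itertools import combinations


def fit(rows, labels):
    """Toy nearest-centroid learner standing in for any classifier."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {lab: [sum(c) / len(c) for c in zip(*rs)] for lab, rs in groups.items()}


def accuracy(model, rows, labels):
    """Fraction of rows whose nearest centroid carries the true label."""
    def predict(row):
        return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def folds(n, k):
    """Interleaved k-fold splits as (train indices, test indices)."""
    for f in range(k):
        test = set(range(f, n, k))
        yield [i for i in range(n) if i not in test], sorted(test)


def take(seq, idx):
    return [seq[i] for i in idx]


def project(rows, attrs):
    return [[row[i] for i in attrs] for row in rows]


def select(rows, labels, k, max_size):
    """Stand-in for Optimize Selection: best subset by inner-CV accuracy."""
    best, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for attrs in combinations(range(len(rows[0])), size):
            x = project(rows, attrs)
            s = sum(accuracy(fit(take(x, tr), take(labels, tr)), take(x, te), take(labels, te))
                    for tr, te in folds(len(x), k)) / k
            if s > best_score:
                best, best_score = attrs, s
    return best, best_score


def nested_cv(rows, labels, outer_k=3, inner_k=3, max_size=2):
    """Outer CV around the whole 'select, then retrain' procedure.

    Returns (mean inner score of the chosen subsets, mean outer accuracy).
    Outer test folds never influence the selection, so a gap between the two
    numbers estimates how much the attribute selection itself overfits."""
    inner, outer = [], []
    for tr, te in folds(len(rows), outer_k):
        x_tr, y_tr = take(rows, tr), take(labels, tr)
        attrs, s = select(x_tr, y_tr, inner_k, max_size)
        model = fit(project(x_tr, attrs), y_tr)  # retrain on all outer-training rows
        inner.append(s)
        outer.append(accuracy(model, project(take(rows, te), attrs), take(labels, te)))
    return sum(inner) / outer_k, sum(outer) / outer_k


rows = [[0.1, 5.0, 3.0], [0.2, 1.0, 7.0], [0.0, 9.0, 2.0], [0.3, 4.0, 8.0],
        [0.1, 2.0, 5.0], [0.2, 8.0, 1.0], [1.0, 5.0, 2.0], [0.9, 1.0, 8.0],
        [1.1, 9.0, 3.0], [1.0, 4.0, 7.0], [0.8, 2.0, 6.0], [1.2, 8.0, 1.0]]
labels = ["a"] * 6 + ["b"] * 6
print(nested_cv(rows, labels))  # expected: (1.0, 1.0)
```

On this clean toy data both numbers come out as 1.0. With many noisy attributes and few rows, the inner score is typically higher than the outer accuracy, and that difference is the selection overfitting mentioned above.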
Below you can find a process that trains the final model and applies it to another data set without the label (scoring). As you can see, it is important to select the same attributes on the other data set as well, which can be done with the "Select by Weights" operator. This process, however, does not show an outer cross validation...
Cheers,
Ingo
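The scoring step Ingo describes can be sketched in Python. `select_by_weights` here is a hypothetical helper that only mimics the idea of the "Select by Weights" operator (the real operator has its own parameters): it keeps attributes whose weight is above a threshold, with weights assumed to be 1 for selected and 0 for deselected attributes. The learner is again a toy nearest-centroid stand-in. The point is that the final model is trained on all labeled rows and that the unlabeled rows pass through exactly the same weights.

```python
def select_by_weights(rows, weights, threshold=0.0):
    """Keep attributes whose weight is above `threshold` (hypothetical helper).

    Rows are dicts keyed by attribute name. The kept names are sorted so that
    training and scoring data end up with identical column order."""
    keep = sorted(name for name, w in weights.items() if w > threshold)
    return keep, [[row[name] for name in keep] for row in rows]


def fit_centroids(rows, labels):
    """Toy nearest-centroid learner standing in for the real model."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {lab: [sum(c) / len(c) for c in zip(*rs)] for lab, rs in groups.items()}


def predict(model, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


def train_and_score(train_rows, train_labels, unlabeled_rows, weights):
    """Train the final model on ALL labeled rows, then score the unlabeled set."""
    keep, x_train = select_by_weights(train_rows, weights)
    model = fit_centroids(x_train, train_labels)
    _, x_new = select_by_weights(unlabeled_rows, weights)  # same weights -> same columns
    return keep, [predict(model, row) for row in x_new]


weights = {"height": 1.0, "noise": 0.0}  # 1 = selected, 0 = deselected
train = [{"height": 1.0, "noise": 7.0}, {"height": 1.2, "noise": 1.0},
         {"height": 3.0, "noise": 6.0}, {"height": 3.1, "noise": 2.0}]
labels = ["short", "short", "tall", "tall"]
new = [{"height": 1.1, "noise": 9.0}, {"height": 2.9, "noise": 0.5}]
keep, predictions = train_and_score(train, labels, new, weights)
print(keep, predictions)  # expected: ['height'] ['short', 'tall']
```

Because only the kept columns are read, the unlabeled rows would not even need the deselected attributes; what matters is that both data sets go through the same weights.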
I think I understand what you say, but in this case what I was actually doing was splitting the data inside the Optimize Selection and evaluating on one third of the original data to get the performance, so that the process would run a little faster than with an entire X-Validation. Nevertheless, I hadn't noticed that what I was expecting to do made no sense since, as you say, it is always better to have all the data to train the final model.
Thank you,
Juan.
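Juan's speed argument can be made concrete with a small, hypothetical Python sketch (toy nearest-centroid learner, interleaved splits, not RapidMiner operators): a holdout evaluation fits one model per evaluated attribute subset, while a k-fold cross validation fits k. Since the evolutionary search evaluates many subsets, that factor applies to every one of them; the price is a performance estimate based on a single, smaller test set and therefore noisier.

```python
def fit(rows, labels):
    """Toy nearest-centroid learner standing in for the real model."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {lab: [sum(c) / len(c) for c in zip(*rs)] for lab, rs in groups.items()}


def predict(model, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))


def holdout(n, every=3):
    """A single split that holds out every `every`-th row (about a third for 3)."""
    test = set(range(0, n, every))
    return [([i for i in range(n) if i not in test], sorted(test))]


def kfold(n, k):
    """k interleaved splits; each row lands in exactly one test fold."""
    splits = []
    for f in range(k):
        test = set(range(f, n, k))
        splits.append(([i for i in range(n) if i not in test], sorted(test)))
    return splits


def evaluate(rows, labels, splits):
    """Mean accuracy over the splits, plus how many models had to be fitted."""
    scores = []
    for train, test in splits:
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores), len(splits)


rows = [[0.1], [0.2], [0.0], [0.3], [0.1], [0.2],
        [1.0], [0.9], [1.1], [1.0], [0.8], [1.2]]
labels = ["a"] * 6 + ["b"] * 6
print(evaluate(rows, labels, holdout(len(rows))))    # expected: (1.0, 1)
print(evaluate(rows, labels, kfold(len(rows), 10)))  # expected: (1.0, 10)
```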
I found this question while trying to learn more about optimizing attribute selection. I loaded the process, and some of the operators (X-Validation) looked outdated. Given how regularly RapidMiner gets updated, I was wondering whether there is a more current version of this process that might perform better. If I were more skilled with RapidMiner, I might be able to answer this myself, but since I am not, I thought some of the awesome people here might be able to help. Or maybe it works just fine and requires no changes.