12-07-2016 11:56 AM - edited 12-07-2016 11:58 AM
Today we released a new extension as part of the RapidMiner Labs initiative. It provides an easy way to compare models, track model performance over time, and automatically replace a model when a newly built one performs better.
This is done with the new "Compare Models" operator, which you can download from the Marketplace within Studio.
Search for "Model Management"
or from here.
You can get started with the sample process; take note of the various parameters.
I hope all of you RapidMiners can easily discover how it works.
We will follow up here shortly with a video and a how-to article.
Let us know your feedback.
12-18-2016 03:36 AM
Thank you for this new operator! How can we use it in a cross-validation setting? I could not manage to build a process that tests the models on unseen data and averages the performances. I tried something like the attached process, but it is not cross-validation. Also, I noticed that your log file is intended to record average performances and their deviations. What was your intention in doing so? Thanks in advance.
12-18-2016 06:49 AM - edited 12-18-2016 07:01 AM
Thank you for your feedback.
The model management operator is designed to test models that are built on the same dataset, using a common test dataset to produce performance indicators that show which model is best. At this point it does not easily support cross-validation.
We can potentially look into adding additional features to it, or come up with new operators.
Here is the scenario we had in mind while building this operator:
1) Let's say you build a model (cross-validated) on January 1 and start using it in production.
2) You need to determine whether this model is performing well in the real world, and to update it automatically if it is not.
3) So you can schedule a process that retrains a group of models on the latest data every day, every month, etc., and then feeds these freshly generated models, together with the one currently in production, into the new "Compare Models" operator, which tests them all against a common dataset. If one of your freshly generated models is better, the output will be that model. You can add a Store operator to the mod output port and overwrite the production model.
If the production model is itself the best, the Store operator simply overwrites it with the existing model.
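To make the scenario above concrete, here is a minimal sketch in plain Python of the same idea: score the production model and the freshly trained candidates on one common test set and keep whichever scores best. This is not how the operator is implemented internally; the function names (`accuracy`, `compare_models`), the toy models, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

# A "model" here is simply any callable that maps one example to a label.
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of test examples the model labels correctly."""
    correct = sum(1 for xi, yi in zip(X, y) if model(xi) == yi)
    return correct / len(y)


def compare_models(
    production_name: str,
    models: Dict[str, Model],
    X_test: Sequence[Sequence[float]],
    y_test: Sequence[int],
) -> Tuple[str, Dict[str, float]]:
    """Score every model on the same test set and return the winner's name.

    The production model is only replaced when a candidate is strictly
    better, so a tie keeps the model that is already deployed.
    """
    scores = {name: accuracy(m, X_test, y_test) for name, m in models.items()}
    best_name = production_name
    for name, score in scores.items():
        if score > scores[best_name]:
            best_name = name
    return best_name, scores


# Hypothetical toy data and models, for illustration only.
X_test = [[0.2], [0.8], [0.4], [0.9]]
y_test = [0, 1, 0, 1]
models = {
    "production": lambda x: 1,                  # always predicts class 1
    "challenger": lambda x: int(x[0] > 0.5),    # simple threshold rule
}

best, scores = compare_models("production", models, X_test, y_test)
print(best, scores)
```

Whatever `compare_models` returns as `best` is what you would then store over the production model, which mirrors the Store step described above.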
The log of the various performance vectors is there to keep track of every model's performance over a period of time, so you can see whether the models are improving or degrading.
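As a rough illustration of this kind of log, the sketch below appends one row per model per run to a CSV file and reads back the history of a single model. The file layout, the column names, and the helper names (`log_performance`, `history`) are assumptions for this example and do not reflect the format the operator actually writes.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Dict, List, Tuple


def log_performance(log_path: str, run_date: date, scores: Dict[str, float]) -> None:
    """Append one row per model for this run; write a header for a new file."""
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["run_date", "model", "score"])
        for name, score in sorted(scores.items()):
            writer.writerow([run_date.isoformat(), name, score])


def history(log_path: str, model: str) -> List[Tuple[str, float]]:
    """Return (run_date, score) pairs for one model, in the order logged."""
    with Path(log_path).open(newline="") as f:
        return [
            (row["run_date"], float(row["score"]))
            for row in csv.DictReader(f)
            if row["model"] == model
        ]
```

Calling `log_performance` after every scheduled comparison and then plotting the result of `history` for the production model is one simple way to spot a model that degrades over time.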
Hope this helps.
Your input is valuable, and we will look into how we can incorporate it in the next iteration of the operator.