RapidMiner

‎04-03-2017 08:57 AM

Either during the prototyping phase of a project or at the later model lifecycle management stage, we often need to build models on the same data using different algorithms, so that we can compare how those algorithms perform against the data in its current state and volume.

 

In the prototyping stage, we need to establish which is the best algorithm for our project.

 

When managing the process lifecycle, we need to deal with the effects of concept drift (https://en.wikipedia.org/wiki/Concept_drift), i.e. changes over time in the importance of certain variables and in their values, which may cause a model built in the past to perform less well in the future. This may be remedied by retraining the algorithm or by finding a different one that performs better on the data as it has evolved.

 

To achieve the above, we can use the "Compare Models" operator to build models with different algorithms, compare their performance, and keep a record of the performance and parameters.

 

model management.png

The best way to learn about this capability is to open the tutorial process which is bundled with the operator. Just drop the Compare Models operator onto the canvas, switch to the Help panel and navigate to the tutorial process. You should see the same process as above.

 

The data used comes from one of the bundled data sets and has "Play" set as the target variable:

model management 1.png

 

A Multiply operator then copies the data so that three different predictive models can be built. In this case they use default parameters, but those can be optimised if necessary.

The models produced are gathered by the Collection operator, which passes them to the "Compare Models" operator for comparison.

 

model management 2.png
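
For readers who want to see the same idea outside RapidMiner, here is a minimal sketch in plain Python. It is not what the operator does internally. The toy data, the three simple learners and the `compare_models` helper are all assumptions made for illustration. The point is only that every algorithm is trained on the same rows and scored in the same way.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (outlook, windy) -> play. Not the bundled data set.
TRAIN = [
    (("sunny", "false"), "no"),
    (("sunny", "true"), "no"),
    (("overcast", "false"), "yes"),
    (("rain", "false"), "yes"),
    (("rain", "true"), "no"),
    (("overcast", "true"), "yes"),
]
TEST = [
    (("sunny", "false"), "no"),
    (("overcast", "false"), "yes"),
    (("rain", "true"), "no"),
    (("rain", "false"), "yes"),
]

def train_majority(rows):
    """Baseline: always predict the most frequent label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label

def train_one_rule(rows, attribute=0):
    """Predict the most frequent label seen for one attribute's value."""
    by_value = defaultdict(Counter)
    for x, y in rows:
        by_value[x[attribute]][y] += 1
    fallback = train_majority(rows)

    def predict(x):
        counts = by_value.get(x[attribute])
        return counts.most_common(1)[0][0] if counts else fallback(x)

    return predict

def train_nearest_neighbour(rows):
    """1-nearest-neighbour using the number of differing attributes."""
    def predict(x):
        return min(rows, key=lambda row: sum(a != b for a, b in zip(row[0], x)))[1]

    return predict

def accuracy(model, rows):
    """Share of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def compare_models(trainers, train_rows, test_rows):
    """Train every algorithm on the same data and return {name: accuracy}."""
    return {name: accuracy(fit(train_rows), test_rows) for name, fit in trainers.items()}

TRAINERS = {
    "majority": train_majority,
    "one_rule": train_one_rule,
    "nearest_neighbour": train_nearest_neighbour,
}
results = compare_models(TRAINERS, TRAIN, TEST)
for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(name, score)
```

Because the test rows here also appear in the training rows, the scores say nothing about real-world quality. In a real comparison we would hold out data or use cross-validation.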

 

The "Compare Models" operator requires two parameters: a location to save the results and a date format. Now we can get to the most interesting part, the output:

 

model management 3.png

 

We obtain the models and their performances, neatly arranged in folders named with the date and time in the format we specified.
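
The folder-per-run layout is easy to picture with a short Python sketch. This is only an illustration of the idea, not the operator's implementation. The `save_run` function, the JSON files and the model name `model_a` are hypothetical, and the date pattern uses Python's `strftime` syntax, which is not necessarily the pattern syntax the operator expects.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def save_run(base_dir, performances, date_format="%Y-%m-%d_%H-%M-%S", now=None):
    """Write one JSON file per model into a folder named after the run time."""
    now = now or datetime.now()
    run_dir = Path(base_dir) / now.strftime(date_format)
    run_dir.mkdir(parents=True, exist_ok=True)
    for model_name, metrics in performances.items():
        (run_dir / f"{model_name}.json").write_text(json.dumps(metrics, indent=2))
    return run_dir

# Hypothetical usage: a temporary base folder and a fixed timestamp.
base = tempfile.mkdtemp()
run_dir = save_run(
    base,
    {"model_a": {"accuracy": 0.75}},
    now=datetime(2017, 4, 3, 8, 57, 0),
)
print(run_dir.name)
```

Keeping each run in its own timestamped folder means earlier results are never overwritten, so we can go back and see how each algorithm performed at any point in time.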

 

There is also a performance record, stored as a data source, which can be imported into a process and used as input for further actions, such as those driven by the Process Control operators under Utilities.

 

model management 4.png
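
As a rough idea of what such a follow-up action could look like, the sketch below reads a performance record and checks whether the best model has changed between the two most recent runs, which could be a hint of concept drift. Everything here is an assumption made for illustration: the CSV layout, the column names `run`, `model` and `accuracy`, and the figures are invented, and the real record produced by the operator may look different.

```python
import csv
import io

# Hypothetical performance record; the columns and values are made up.
RECORD = """run,model,accuracy
2017-03-01,model_a,0.82
2017-03-01,model_b,0.79
2017-04-01,model_a,0.74
2017-04-01,model_b,0.80
"""

def best_model_per_run(csv_text):
    """Return {run: (model, accuracy)} holding the top scorer of each run."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["accuracy"])
        if row["run"] not in best or score > best[row["run"]][1]:
            best[row["run"]] = (row["model"], score)
    return best

def champion_changed(best):
    """True when the two most recent runs have different top models."""
    runs = sorted(best)  # ISO dates sort chronologically as strings
    return len(runs) >= 2 and best[runs[-1]][0] != best[runs[-2]][0]

best = best_model_per_run(RECORD)
print(best)
print("champion changed:", champion_changed(best))
```

A check like this could decide whether to retrain the current model or switch to the one that now performs better.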

 

Comments
RM Certified Expert

Hi Kostas,

Where can I find the Compare Models operator? Thanks!

RM Certified Expert

I believe it is a separate extension, "Model Management", available for free in the marketplace.