
Which performance of those operators is the now valid one?




In my process, I have an Optimize Parameters operator that contains an X-Validation with MetaCost, AdaBoost and W-REPTree (picture):


I vary M between 2 and 5 and V between 0.001 and 0.1 (3 and 5 steps, respectively).

In the Results perspective, the Log operator (which comes right after the X-Validation operator) shows different values for the performance:


The thing is, I don't know which performance I should use, or which one is representative. The kappa and performance columns come from the Performance (Classification) operator inside the X-Validation. (Besides, what does "main criterion" inside the Performance (Classification) operator mean?)

The val_perf column is logged from the X-Validation value "performance", and val_perf3 from the X-Validation value "performance3". I asked this question before, but I'm not sure I understood the answer correctly: what do "performance", "performance1", "performance2" and "performance3" in the X-Validation mean (see screenshot)?



And finally, I get the performance from the "Optimize Parameters (Grid)" operator:




So which of the three performances is the most representative for my dataset: the one from Performance (Classification), from X-Validation, or from the Optimize Parameters operator? And should I use "performance", accuracy or kappa? In other words, which measure is best for deciding whether my model is a good one for classifying the data?


Screenshot from X-Validation:



Re: Which performance of those operators is the now valid one?

Can somebody explain the different performance values to me? Does anybody have an idea?

RM Staff

Re: Which performance of those operators is the now valid one?



Use the performance of the Optimize Parameters operator. It is the result of the parameter settings you have been optimizing, so there is a direct relationship between the chosen parameters and the performance for this parameter set.
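To illustrate why the optimizer's performance belongs to one specific parameter set, here is a rough sketch of a grid search in plain Python. It is not RapidMiner code: `evaluate` is a hypothetical stand-in for the cross-validated main criterion, and the grid assumes a linear scale on which n steps yield n + 1 values.

```python
from itertools import product


def linear_grid(low, high, steps):
    """Return steps + 1 evenly spaced values from low to high (assumed grid semantics)."""
    return [low + i * (high - low) / steps for i in range(steps + 1)]


def evaluate(m, v):
    """Hypothetical stand-in for the cross-validated main criterion.

    In a real process this would be the averaged performance that the
    X-Validation delivers for one combination of M and V.
    """
    return 1.0 - abs(m - 3) * 0.05 - abs(v - 0.05)


def grid_search(m_values, v_values, score):
    """Score every combination and return (best_score, best_m, best_v, log)."""
    log = [(score(m, v), m, v) for m, v in product(m_values, v_values)]
    best_score, best_m, best_v = max(log, key=lambda row: row[0])
    return best_score, best_m, best_v, log


m_values = linear_grid(2, 5, 3)        # M between 2 and 5, 3 steps
v_values = linear_grid(0.001, 0.1, 5)  # V between 0.001 and 0.1, 5 steps
best_score, best_m, best_v, log = grid_search(m_values, v_values, evaluate)
print(len(log), best_m, best_v, best_score)
```

Every row in `log` corresponds to one line written by the Log operator, while the value returned as `best_score` plays the role of the performance delivered by the optimizer.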


The different performance values of the cross validation are the main criterion (performance) and up to three other performance measurements you might have defined in the Performance operator you have used. Typically you should only care about the main criterion, so going with "performance" for logging is fine.
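Since accuracy and kappa came up as candidate criteria, the following sketch shows how the two can be computed from a confusion matrix and why they can differ a lot. It uses the standard definition of Cohen's kappa; the function name and the counts are made up for illustration and are not taken from the process above.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of examples whose true class is i and
    whose predicted class is j. Kappa is undefined when the agreement
    expected by chance equals 1 (division by zero).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the share of examples on the diagonal (= accuracy).
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, given the class distributions.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up counts for an imbalanced two-class problem (illustration only).
confusion = [[90, 5],
             [4, 1]]
acc, kappa = accuracy_and_kappa(confusion)
print(acc, kappa)
```

On imbalanced data like this, accuracy can look high while kappa stays low, because kappa discounts the agreement that would be expected by chance. Whichever of the two is set as the main criterion is the value reported as "performance".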


But in order to make a statement like "my model will be x% accurate", you should just go with the performance delivered by the Optimize Parameters operator.





