Which of these operators' performance values is the valid one?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

Hi,

In my process I have an Optimize Parameters operator; inside it is an X-Validation containing MetaCost, AdaBoost, and WREP Tree (see picture):

Unbenannt5.PNG

I vary the parameter M between 2 and 5 (3 steps) and V between 0.001 and 0.1 (5 steps).

In the Results perspective, the Log operator (which comes right after the X-Validation operator) shows several different performance values:

Unbenannt6.PNG

The thing is, I don't know which performance I should use or which one is representative. The kappa and performance columns come from the Performance (Classification) operator inside the X-Validation. (Besides, what does "main criterion" in the Performance (Classification) operator mean?)

The val_perf column is the X-Validation value "performance", and val_perf3 is the X-Validation value "performance3". I asked this question before, but I'm not sure I understood the answer correctly: what do "performance", "performance1", "performance2", and "performance3" in the X-Validation mean (see screenshot)?

Unbenannt10.png

 

Finally, I get the performance from the "Optimize Parameters (Grid)" operator:

 

 Unbenannt9.PNG

 

So which of the three performances is the most "representative" for my dataset: the one from Performance (Classification), from X-Validation, or from Optimize Parameters? And should I use "performance", accuracy, or kappa? In other words, what is the best way to decide whether my model is a good one for classifying the data?

 

Screenshot from X-Validation:

Unbenannt8.PNG

Best Answer

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi,

     

    Use the performance of the Optimize Parameters operator. It is the result for the parameter settings you have been optimizing, so there is a direct relationship between the chosen parameters and the performance for that parameter set.

     

    The different performance values of the cross-validation are the main criterion ("performance") plus up to three other performance measurements you may have defined in the Performance operator you used. Typically you should only care about the main criterion, so logging "performance" is fine.
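
On the accuracy-versus-kappa part of the question, it may help to see how the two measures differ. The snippet below is a minimal plain-Python sketch, not RapidMiner code; the function name `accuracy_and_kappa` and both confusion matrices are made up for illustration. It computes accuracy and Cohen's kappa from a confusion matrix and shows that a model which always predicts the majority class can reach high accuracy while kappa stays at zero.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa.

    confusion[i][j] = number of examples with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of examples on the diagonal (this is accuracy).
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    # Kappa rescales accuracy so that chance-level agreement maps to 0.
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return observed, kappa


# Made-up example 1: 90 vs. 10 examples, the model always predicts class 0.
majority_acc, majority_kappa = accuracy_and_kappa([[90, 0], [10, 0]])

# Made-up example 2: balanced classes, the model is right most of the time.
balanced_acc, balanced_kappa = accuracy_and_kappa([[45, 5], [10, 40]])

print(f"majority guesser: accuracy={majority_acc:.2f}, kappa={majority_kappa:.2f}")
print(f"balanced model:   accuracy={balanced_acc:.2f}, kappa={balanced_kappa:.2f}")
```

Whichever measure is set as the main criterion is the one that ends up logged as "performance", so for imbalanced classes it can be worth checking kappa alongside accuracy.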

     

    But to make a statement like "my model will be x% accurate", you should go with the performance delivered by Optimize Parameters.
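
The relationship between the parameter grid, the cross-validation, and the single performance reported at the end can be sketched roughly as follows. This is plain Python, not RapidMiner's implementation: it assumes the best combination is chosen by the mean of the main criterion over the folds, and `grid_search`, `toy_fold_scores`, the grid values, and all scores are hypothetical stand-ins for illustration.

```python
from itertools import product
from statistics import mean


def grid_search(param_grid, fold_scores):
    """Return (best_params, best_mean_score) over all parameter combinations.

    fold_scores(params) must return one main-criterion value per fold,
    i.e. what a cross-validation would produce for that parameter set.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # One cross-validation per combination; keep its averaged performance.
        score = mean(fold_scores(params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_fold_scores(params):
    """Hypothetical stand-in for a 3-fold cross-validation (made-up numbers)."""
    base = 0.80 - 0.02 * abs(params["M"] - 3) - 0.5 * abs(params["V"] - 0.01)
    return [base - 0.01, base, base + 0.01]


# Illustrative grid only; the actual values depend on the step settings.
grid = {"M": [2, 3, 4, 5], "V": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_fold_scores)
print(best_params, round(best_score, 3))
```

In this picture, each row of the log corresponds to one pass through the loop, and the performance delivered at the end is the cross-validated value belonging to the winning parameter combination.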

     

    Cheers,

    Ingo

     

     

Answers

  • Fred12 Member Posts: 344 Unicorn

    Can somebody explain the different performance values to me? Does anybody have an idea?
