different validation performance parameters in LOG?

Fred12 · July 2016

hi,

I have several validation performance values to choose from in the Log operator, see screenshot:

Can someone explain the difference between these performance entries? I have only one Performance operator in my design.

Also, can someone please tell me whether I should use the normal Performance operator for k-NN, or some cluster performance operator? Which is better?

I would like to see possible cluster outliers and be able to color my data points by label class. But my dataset has 20+ attributes; is it still possible to visualize k-NN somehow?

MartinLiebig · July 2016

Dear Fred,

I think there are several things mixed together in one question.

The various performance values in X-Val

I think these are only placeholders that are filled if you use more than one criterion.

Placement of the Log operator

Please make sure that the Log operator is placed AFTER the operator it should log, in your case the X-Val. If you put it inside, it cannot access the latest result of the X-Val.

Performance

You use k-NN to classify, so you should use one of the Performance operators for classification. Which measure to use depends, of course, on your problem. Using a clustering measure does not make sense if you are doing classification.
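As a rough illustration of what a classification measure compares, here is a minimal Python sketch that computes accuracy and Cohen's kappa from true and predicted labels. It is not RapidMiner code; the function names and the label lists are made up for the example.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement between prediction and truth, corrected for chance agreement."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: both sides pick the same label independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs on both sides;
        # returning 0.0 is an arbitrary choice made for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical true labels and k-NN predictions, for illustration only.
y_true = ["a", "a", "a", "b", "b", "b"]
y_pred = ["a", "a", "b", "b", "b", "a"]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

Both measures need a true label for every example, which is why they fit a classifier such as k-NN and not a clustering.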

Visualizing 20 Dimensions

It is simply not possible to look at 20 dimensions at once. You would need to reduce the dimensions with techniques like PCA, SOM or t-SNE.
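To make the idea of dimension reduction concrete, here is a small pure-Python sketch of PCA down to two components, using power iteration with deflation on the covariance matrix. It only illustrates the technique and is not how RapidMiner's PCA operator is implemented; the synthetic 20-attribute data, the labels and all function names are assumptions of the example.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract the column mean from every value."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(matrix, iterations=300, seed=0):
    """Dominant eigenvector and eigenvalue via power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, ev1 = top_component(cov)
    d = len(cov)
    # Deflation: remove the variance explained by the first component.
    deflated = [[cov[i][j] - ev1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = top_component(deflated, seed=1)
    coords = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return coords, pc1, pc2

# Synthetic data with 20 attributes driven by two hidden factors (made up).
rng = random.Random(42)
rows, labels = [], []
for _ in range(40):
    a, b = rng.gauss(0, 5), rng.gauss(0, 2)
    rows.append([(a if j < 10 else b) + rng.gauss(0, 0.1) for j in range(20)])
    labels.append("pos" if a > 0 else "neg")

coords, pc1, pc2 = pca_2d(rows)
# Each (x, y) pair can be drawn in a scatter plot and colored by its label.
points = list(zip(coords, labels))
```

The resulting two columns can be shown in an ordinary scatter plot with the label as the color dimension, which is usually enough to spot points that sit far away from their own class.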

~Martin

land · July 2016

Regarding the different performance values:

performance is the value of the main criterion, which you select in the performance operator inside the X-Validation.

deviation is the standard deviation of this main criterion.

performance1 to performance3 refer to the first three performance criteria selected in the Performance operator. So if you check accuracy and error in Performance (Classification), performance1 refers to accuracy and performance2 to the error, because accuracy is the first checked criterion in the list and error the second.
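The naming described above can be modelled in a few lines of Python. This is only a sketch of the mapping, not RapidMiner's implementation: the function `log_values`, the fold values, and the use of a plain mean and sample standard deviation over folds are assumptions made for the illustration.

```python
import statistics

def log_values(fold_results, checked_criteria, main_criterion):
    """Build the values a Log operator could read from a validation.

    fold_results:     one dict per fold, criterion name -> value
    checked_criteria: criteria in the order they are checked in the
                      Performance operator
    main_criterion:   the criterion selected as main criterion
    """
    averages = {c: statistics.mean(f[c] for f in fold_results)
                for c in checked_criteria}
    row = {
        "performance": averages[main_criterion],
        # Assumption: sample standard deviation over the folds.
        "deviation": statistics.stdev(f[main_criterion] for f in fold_results),
    }
    # Only the first three checked criteria get a numbered slot.
    for index, criterion in enumerate(checked_criteria[:3], start=1):
        row[f"performance{index}"] = averages[criterion]
    return row

# Hypothetical results of a 3-fold validation.
folds = [
    {"accuracy": 0.80, "classification_error": 0.20},
    {"accuracy": 0.85, "classification_error": 0.15},
    {"accuracy": 0.75, "classification_error": 0.25},
]
row = log_values(folds, ["accuracy", "classification_error"], "accuracy")
```

With accuracy as main criterion and as first checked criterion, `performance` and `performance1` carry the same number. Only two criteria are checked here, so the sketch simply leaves `performance3` out.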

This has been a major pain point in every training course I have given so far, so it can't hurt to be precise here.

Greetings,

Sebastian

Fred12 · July 2016

OK thanks, that helped a bit, but I am still confused.

I am using an Optimize Parameters (Grid) operator, inside it a Backward Elimination, and inside that an X-Validation with W-REPTree for a numeric dataset:

Where should I put the Log operator now? I used one after the X-Validation, another one after the Backward Elimination, and another one after the Optimize Parameters (Grid) operator...

Secondly, I still don't really understand the output of the Log operator regarding things like performance1, performance2, etc., because those are not the same as accuracy, classification error and so on:

My log(3) operator, the one after the Backward Elimination, of course outputs different results than the one after the X-Val:

But how does that work? After which loop is an entry made in the log(3) operator?

MartinLiebig · July 2016

Hi Fred,

It always depends on what you want to do. In your case, you want to log the performance that the Optimize operator is working on, which is the performance returned by Backward Elimination. This is the one to log.
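To see how the position of a log decides how many entries it gets, here is a toy Python model of the nesting (grid, then backward elimination, then validation). Everything in it is an assumption for illustration: the scoring function is invented and trains no model, the greedy elimination is simplified, and the M, V, N values are hypothetical. It assumes a log writes one row each time it is reached.

```python
from itertools import product

USEFUL = {"a1", "a2"}  # hypothetical attributes that help the model

def cross_validate(params, attributes):
    """Stand-in for X-Validation: returns a made-up score."""
    m, v, n = params
    useful = len(USEFUL & set(attributes))
    noisy = len(set(attributes) - USEFUL)
    return (0.6 + 0.1 * useful - 0.05 * noisy
            - 0.01 * abs(m - 2) - 0.01 * abs(n - 3) - v)

def backward_elimination(params, attributes, log_after_xval):
    """Simplified greedy elimination; logs once per validation run."""
    current = list(attributes)
    best = cross_validate(params, current)
    log_after_xval.append((params, tuple(current), best))
    improved = True
    while improved and len(current) > 1:
        improved = False
        trials = []
        for attr in current:
            subset = [a for a in current if a != attr]
            score = cross_validate(params, subset)
            log_after_xval.append((params, tuple(subset), score))
            trials.append((score, subset))
        score, subset = max(trials, key=lambda t: t[0])
        if score > best:
            best, current, improved = score, subset, True
    return best, current

log_after_xval, log_after_elimination, log_after_grid = [], [], []
attributes = ["a1", "a2", "n1", "n2"]
grid = list(product([1, 2], [0.001, 0.01], [2, 3]))  # hypothetical M, V, N

for params in grid:
    best, kept = backward_elimination(params, attributes, log_after_xval)
    # Reached once per parameter combination.
    log_after_elimination.append((params, tuple(kept), best))

# Reached once, after the whole grid has been evaluated.
best_row = max(log_after_elimination, key=lambda r: r[2])
log_after_grid.append(best_row)
```

In this model the log after the validation gets one row per tested attribute subset, the log after the elimination gets one row per parameter combination, and the log after the grid gets a single row. Only the middle one lines up one-to-one with the parameter settings being compared.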

Be careful with overtraining!

~Martin

Fred12 · July 2016

OK, but I want to test the 3 parameters M, V, N in REPTree against each other, because I want to achieve a high accuracy in X-Validation...

I am now a bit confused about which of the logged values (performance, accuracy or kappa) I should use to see the best performance.

The first line has an accuracy of 82.8%, but performance is only 77.6%. What is performance then? I thought that was the main criterion, which is accuracy.

And performance1 is 77.6%; shouldn't that be the same as accuracy, since accuracy is the first criterion checked in the Performance (Classification) operator?

MartinLiebig · July 2016

Which accuracy did you log there? The one from Backward Elimination?

~Martin

Fred12 · July 2016

Yes, log(3) is Backward Elimination, log(2) is X-Validation.

asem_k · November 2017

What if someone wants to log more than 3 performance values, i.e., has checked more than 3 metrics and wants to log all of them, not only the first 3?

land · November 2017

In that (very rare) case you can still use Performance to Data to transform the performance into a data set and handle it yourself. You could attach the current parameter settings using Generate Attributes param function and collect all the data sets in one of the usual ways.

We usually use the Indexed Collections of our Jackhammer extension, which not only collect the objects but also index them with an arbitrary number of attribute/value pairs, so that you can access a specific object later by providing its index values. It is also good to have a mapping from parameters to performance.
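The idea of keeping every metric together with the parameter settings that produced it can be sketched in plain Python. This is neither the Performance to Data operator nor the Jackhammer API; the helpers `record` and `lookup`, the parameter names and all metric values are hypothetical.

```python
def record(collection, params, metrics):
    """Store all metrics of one run under a key built from its parameters."""
    key = frozenset(params.items())
    collection[key] = {**params, **metrics}

def lookup(collection, **params):
    """Fetch the stored row for one exact parameter combination."""
    return collection[frozenset(params.items())]

collection = {}

# Made-up results of two runs with more than three metrics each.
record(collection, {"M": 2, "N": 3},
       {"accuracy": 0.83, "kappa": 0.61, "precision": 0.80,
        "recall": 0.78, "classification_error": 0.17})
record(collection, {"M": 4, "N": 3},
       {"accuracy": 0.79, "kappa": 0.55, "precision": 0.77,
        "recall": 0.74, "classification_error": 0.21})

# One flat row per run, similar in spirit to appending the data sets.
rows = list(collection.values())
best = max(rows, key=lambda r: r["accuracy"])
hit = lookup(collection, M=4, N=3)
```

Because each row holds the parameters next to all metrics, there is no limit of three values per run, and a single run can be retrieved again by its parameter values.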

Greetings,

Sebastian


Altair RapidMiner Community
