different validation performance parameters in LOG?

Fred12 Member Posts: 344   Unicorn
edited November 2018 in Help

Hi,

The Log operator offers me several performance values for the validation to choose from, see screenshot:

 

val.png

 

Can someone explain how these different performance entries differ? I have only one Performance operator in my design.

 

Also, can someone please tell me whether I should use the normal Performance operator for k-NN, or some cluster performance operator? Which is better?

I would like to see possible cluster outliers and be able to color my data points by label class. My dataset has 20+ attributes, though; is it still possible to visualize k-NN somehow?

 

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,020  RM Data Scientist

    Dear Fred,

     

    I think several things are mixed together in one question.

     

    The various performance values in X-Val

    I think these are only placeholders, used when you select more than one performance criterion.

     

    Placement of the Log operator

    Please make sure that the Log operator is placed AFTER the operator it should log, in your case the X-Val. If you put it inside, it cannot access the latest result of the X-Val.

     

    Performance

    You use k-NN to classify, so you should use one of the Performance operators for classification. Which measure to use is of course driven by your problem. Using a clustering measure does not make sense if you do classification.

     

    Visualizing 20 Dimensions

    It is simply not possible to look at 20 dimensions at once. You would need to reduce the dimensions with techniques like PCA, SOM or t-SNE.

     

    ~Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
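
Martin's suggestion to reduce the dimensions before plotting can be illustrated outside RapidMiner with a minimal sketch. It assumes plain Python and swaps in a hand-written PCA based on power iteration instead of any RapidMiner operator; the function name `pca_2d` and the random 20-attribute data are purely hypothetical.

```python
import random

def pca_2d(rows, iterations=200, seed=0):
    """Project rows (equal-length lists of numbers) onto two principal directions."""
    n, d = len(rows), len(rows[0])
    # Center every attribute on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _component in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _step in range(iterations):
            # Power iteration step: multiply the current vector by the covariance.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the vector orthogonal to the directions found so far.
            for comp in components:
                dot = sum(wi * ci for wi, ci in zip(w, comp))
                w = [wi - dot * ci for wi, ci in zip(w, comp)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        components.append(v)
    # Coordinates of every centered row along the two directions.
    return [[sum(ci * vi for ci, vi in zip(c, comp)) for comp in components]
            for c in centered]

# Hypothetical data: 40 examples with 20 numeric attributes each.
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(40)]
points = pca_2d(data)  # 40 two-dimensional points, ready for a scatter plot
```

The resulting two columns can be drawn as a scatter plot and colored by the label class, which is roughly what was asked for in the original question.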
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,523   Unicorn

    Regarding the different performance values:

     

    performance is the value of the main criterion, which you select in the performance operator inside the X-Validation.

    deviation is the standard deviation of this main criterion.

     

    performance1 to performance3 refer to the first three performance criteria selected in the Performance operator. So if you check accuracy and error in Performance (Classification), performance1 refers to accuracy and performance2 to the error, as accuracy is the first checked criterion and error the second in the list.

     

    This is a major pain point in every training course I have given so far, so it can't hurt to be precise here :)

     

    Greetings,

      Sebastian
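
The naming scheme Sebastian describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-fold accuracies are hypothetical, the helper names `summarize_folds` and `log_slots` are made up, and whether RapidMiner computes the deviation as a population or a sample standard deviation is not stated in this thread (the sketch assumes the population form).

```python
from statistics import mean, pstdev

def summarize_folds(fold_scores):
    """Return (performance, deviation) of the main criterion over all folds."""
    # Assumption: population standard deviation; the thread does not say which form is used.
    return mean(fold_scores), pstdev(fold_scores)

def log_slots(checked_criteria, max_slots=3):
    """Map the first checked criteria, in list order, to performance1, performance2, ..."""
    return {f"performance{i}": name
            for i, name in enumerate(checked_criteria[:max_slots], start=1)}

# Hypothetical accuracies from a 5-fold cross-validation.
fold_accuracy = [0.80, 0.85, 0.75, 0.90, 0.70]
performance, deviation = summarize_folds(fold_accuracy)

# Four criteria checked, but only the first three get a numbered slot.
slots = log_slots(["accuracy", "classification_error", "kappa", "AUC"])
```

In this sketch `performance` and `performance1` describe the same criterion whenever the main criterion is also the first checked one.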

  • Fred12 Member Posts: 344   Unicorn

    OK thanks, that helped a bit, but I am still confused.

     

    I am using an Optimize Parameters (Grid) operator, inside it a Backward Elimination, and inside that an X-Validation with W-REPTree for a numeric dataset:

    test.PNG

     

    Where should I use the Log operator now? I used one after the X-Validation, another one after the Backward Elimination, and another one after the Optimize Parameters (Grid) operator...

    Secondly, I still don't really understand the result of the Log operator regarding things like performance1, performance2, etc., because those are not the same as accuracy, classification error and so on:

     Unbenannt2.PNG

    My log(3) operator, the one after the Backward Elimination, outputs different results than the one after the X-Val, of course:

    Unbenannt3.PNG

    But how does that work? After which loop is an entry made in the log(3) operator?

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,020  RM Data Scientist

    Hi Fred,

     

    It always depends on what you want to do. In your case, you would like to log the performance that the Optimize operator is working on, which is the performance returned by Backward Elimination. This is the one to log.

     

    Be careful with overtraining!

     

    ~Martin

  • Fred12 Member Posts: 344   Unicorn

    OK, but I want to test the 3 parameters M, V, N in REPTree against each other, because I want to achieve a high accuracy in X-Validation...

     

    I am now a bit confused: which of the logged values (performance, accuracy, or kappa) should I use to see the best performance?

    Unbenannt.PNG

    The first line has an accuracy of 82.8%, but performance is only 77.6%. What is performance now? I thought that's the main criterion, which is accuracy?

    And performance1 is 77.6%; shouldn't that be the same as accuracy, because that's the first criterion to choose in the Performance (Classification) operator?

     

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,020  RM Data Scientist

    Which accuracy did you log there? Backward Elimination?

     

    ~Martin

  • Fred12 Member Posts: 344   Unicorn

    Yes, log(3) is Backward Elimination, log(2) is X-Validation.

  • asem_k Member Posts: 1 Contributor I

    What if someone wants to log more than 3 performance values, i.e., has checked more than 3 metrics and wants to log all of them, not only the first 3?

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,523   Unicorn

    In that (very rare) case you can still use Performance to Data to transform the performance into a data set and handle it yourself. You could attach the current parameter settings using Generate Attributes param function and collect all the data sets in one of the usual ways.

    We usually use the Indexed Collections of our Jackhammer extension, which not only collect the objects but also index them with an arbitrary number of attribute/value pairs, so that you can access a specific object later by providing its index values. But it is also good to have a match between parameters -> performance.

     

    Greetings,

     Sebastian
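
The idea of keeping one row per parameter setting with all of its performance values attached can be sketched in plain Python. This is not the RapidMiner process itself: `collect_runs` and `dummy_evaluate` are hypothetical names, the grid values for M, V and N are invented, and the dummy metrics are placeholders rather than real results.

```python
from itertools import product

def collect_runs(grid, evaluate):
    """Call evaluate() for every parameter combination and keep one row per run."""
    names = list(grid)
    rows = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        metrics = evaluate(params)  # dict: metric name -> value
        # Store the parameter settings and all metrics side by side.
        rows.append({**params, **metrics})
    return rows

def dummy_evaluate(params):
    # Placeholder only: real code would run a cross-validation here.
    return {"accuracy": 0.0, "kappa": 0.0, "classification_error": 1.0, "AUC": 0.5}

# Hypothetical grid for the three REPTree parameters mentioned in this thread.
grid = {"M": [2, 4], "V": [0.001, 0.01], "N": [3]}
rows = collect_runs(grid, dummy_evaluate)

# With real metrics, the best setting could be picked by any logged criterion.
best = max(rows, key=lambda row: row["accuracy"])
```

Because every row carries the parameter values next to the metrics, nothing is limited to three slots and the table can be sorted by whichever criterion matters.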
