Correlation in classification model - how to sort classes

AGrabowicz Member Posts: 3 Contributor I
edited August 2019 in Help

Hello all,

 

I have a classification problem to solve. There are 10 classes (1, 2, 3, 4, ..., 10) to be predicted, and I want to optimize my model parameters for the highest correlation, since in real life class 1 should have characteristics relatively similar to class 2 and, at the same time, very little similarity to class 10.

 

If I understand correctly, the Performance (Classification) operator calculates correlation as follows:
Cov(L,P) / sqrt(V(L)*V(P))
where: P=prediction, L=label, V=Variance, Cov=Covariance.
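
For illustration, here is a minimal Python sketch of the formula above. It is not RapidMiner's implementation; the function name and the sample data are hypothetical, and it assumes population variance and covariance (the 1/n factors cancel in the ratio, so the sample versions would give the same result).

```python
import math

def pearson_correlation(labels, predictions):
    """Cov(L, P) / sqrt(V(L) * V(P)) for two equally long numeric lists."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_p = sum(predictions) / n
    # Population covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((l - mean_l) * (p - mean_p) for l, p in zip(labels, predictions)) / n
    var_l = sum((l - mean_l) ** 2 for l in labels) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    return cov / math.sqrt(var_l * var_p)

# Hypothetical data: class numbers used directly as numeric values.
labels = [1, 2, 3, 4, 5]
perfect = pearson_correlation(labels, [1, 2, 3, 4, 5])
reversed_order = pearson_correlation(labels, [5, 4, 3, 2, 1])
print(perfect, reversed_order)
```

Predictions in the same order as the labels give a correlation of 1, and predictions in exactly reversed order give -1, which is why the numeric value assigned to each class matters.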

 

However, when I treat the label classes 1, 2, 3, etc. as polynominal, RapidMiner assigns them seemingly arbitrary integer indices (on which the correlation is later calculated) that I cannot control. Therefore, the correlation is not calculated properly.

 

Is there any way to force RapidMiner to map polynominal label 1 to index 1, label 2 to index 2, and so on?

 

Thanks in advance!

Best Answer

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Solution Accepted

    Hi,

    It sounds to me like a cost-based approach with a non-uniform cost matrix would be easier and safer, as it works the way RapidMiner was designed to. Alternatively, you can replace the nominal values with numbers AFTER prediction and calculate the standard Performance (Regression) correlation.

     

    Greetings,

    Sebastian
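
The second suggestion in this answer (replace the nominal values with numbers after prediction, then compute the correlation) can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions: the data, the helper name `correlation`, and the "arbitrary" index mapping are all made up for the example and do not reflect RapidMiner's actual internal indices.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical nominal labels and predictions (class names are strings).
labels = ["1", "1", "2", "2", "3", "3"]
predictions = ["1", "2", "2", "3", "3", "3"]

# Natural mapping: parse the class name as its own number.
natural = correlation([int(v) for v in labels], [int(v) for v in predictions])

# Hypothetical arbitrary index mapping, standing in for indices
# assigned without regard to the class order.
arbitrary_index = {"3": 0, "1": 1, "2": 2}
arbitrary = correlation(
    [arbitrary_index[v] for v in labels],
    [arbitrary_index[v] for v in predictions],
)

print(natural, arbitrary)
```

With the same predictions, the natural mapping gives a noticeably higher correlation than the arbitrary one in this example, so the numeric conversion should be done explicitly rather than left to an internal index.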

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,760   Unicorn

    I think the Map or Remap operator is what you will need.

  • AGrabowicz Member Posts: 3 Contributor I

    Hello Sebastian,

     

    Thank you for suggesting that I convert the nominal label and prediction to numerical values and then proceed with the Performance (Regression) operator. It seems like an immediate solution to the problem. However, can you elaborate on the cost-based approach?

     

    Thanks!

  • AGrabowicz Member Posts: 3 Contributor I

    Actually, I found the answer about the cost-based approach myself. Instead of using the Performance (Classification) operator, one can use the Performance (Costs) operator and set up the cost weights accordingly.

     

    Thank you anyway!

     

    Adam
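
To make the cost-based idea concrete, here is a rough Python sketch of a non-uniform cost matrix for ordered classes. The thread does not say which weights were used, so the absolute-distance cost |i - j| is an assumption, as are the function names and the sample data. This is not the Performance (Costs) operator itself, only the arithmetic behind an average misclassification cost.

```python
def ordinal_cost_matrix(n_classes):
    """Cost of predicting class j+1 when the true class is i+1 is |i - j| (assumed weighting)."""
    return [[abs(i - j) for j in range(n_classes)] for i in range(n_classes)]

def average_cost(labels, predictions, cost_matrix):
    """Mean cost over all examples; classes are the integers 1..n."""
    total = sum(cost_matrix[l - 1][p - 1] for l, p in zip(labels, predictions))
    return total / len(labels)

costs = ordinal_cost_matrix(10)

# Hypothetical true labels and two sets of predictions with the same number of errors.
labels = [1, 2, 10, 5]
near_miss = [2, 2, 9, 5]   # two errors, each one class away
far_miss = [10, 2, 1, 5]   # two errors, each nine classes away

print(average_cost(labels, near_miss, costs))
print(average_cost(labels, far_miss, costs))
```

Both prediction sets have the same accuracy, but the far misses cost much more, which captures the requirement that confusing class 1 with class 2 should be penalized less than confusing class 1 with class 10.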
