correlation matrix

Sunnyboy_nh Member Posts: 10 Newbie
edited June 2 in Help
How can I use the Correlation Matrix operator between attributes with different value types, specifically between a nominal and a real/integer attribute?

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 251  RM Research

    The Correlation Matrix operator can only handle numerical attributes; what you describe would require a rank correlation.
    But the operator works fine with any numerical values, so there is no need to convert between real and integer attributes.

    Best,
    David
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,530   Unicorn
    What would it mean to calculate the correlation between a nominal and a numerical attribute?
    You should convert nominal to numerical first, generally using dummy coding, and then compute the correlation on the resulting data.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
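    As a rough illustration of the dummy coding advice above, here is a minimal Python sketch rather than a RapidMiner process. The helper names `pearson` and `dummy_code` and the toy `colour`/`price` data are made up for this example. Each nominal value becomes its own 0/1 column, and each column is then correlated with the numerical attribute.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def dummy_code(values):
    """Return one 0/1 indicator column per distinct nominal value."""
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in sorted(set(values))
    }


# Hypothetical toy data: one nominal and one numerical attribute.
colour = ["red", "blue", "green", "red", "blue", "green"]
price = [10.0, 20.0, 30.0, 12.0, 22.0, 28.0]

dummies = dummy_code(colour)

# One correlation per dummy column, not one for the whole nominal attribute.
correlations = {
    f"colour = {cat}": pearson(column, price)
    for cat, column in dummies.items()
}

for name, r in correlations.items():
    print(f"{name}: {r:+.3f}")
```

    The result has one entry per nominal value, which is why a correlation matrix computed after dummy coding contains several rows and columns for what used to be a single attribute.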
  • Sunnyboy_nh Member Posts: 10 Newbie
    Thanks, Telcontar120, for the reply.
    Yes, I also converted the attribute to binominal first. But when I do that, the resulting correlation matrix no longer contains the one converted attribute. Instead, it suddenly contains multiple copies of that same attribute, each with slightly different values. This increases the number of rows and columns in my matrix, which I don't need and don't understand.
    What exactly do you mean by dummy coding?
  • Sunnyboy_nh Member Posts: 10 Newbie
    Hi David A.

    Yes, you are right. I don't want to compute the correlation between an integer and a real attribute, but between an integer or real attribute and a polynominal one.

  • Sunnyboy_nh Member Posts: 10 Newbie
    Thanks, Telcontar120, for your further explanation of dummy coding. I have now used another option there, "unique integer", instead of dummy coding, and it promptly worked the way I wanted, without extra unwanted new values of the same attribute.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,530   Unicorn
    You should be very careful using the unique integer coding option! If your underlying nominal attribute is not scalar and ordinal, then this method won't make a lot of sense. For example, imagine you have a nominal attribute with the values (red, blue, green, purple). Unique integer coding will internally transform this into a numerical attribute as 1=blue, 2=green, 3=purple, 4=red. Would it make sense to then use this integer in any kind of numerical calculation such as a correlation? Certainly not! So while it may be annoying that dummy coding creates extra attributes, this really is the correct way of handling nominal attributes that are not inherently in some kind of numerically ordered categories.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
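    To make the warning about unique integer coding concrete, here is a small Python sketch, again not RapidMiner itself. It uses invented data and a hypothetical `pearson` helper. The same nominal attribute is coded twice: once with the alphabetical numbering from the example above, and once with a different but equally arbitrary numbering. The numerical attribute is unchanged in both cases.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: a nominal colour and a numerical measurement.
colour = ["red", "blue", "green", "purple", "red", "blue", "green", "purple"]
value = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5]

# Numbering from the example in the post (alphabetical order).
alphabetical = {"blue": 1, "green": 2, "purple": 3, "red": 4}
# A different numbering that is just as arbitrary for unordered colours.
other_order = {"red": 1, "blue": 2, "green": 3, "purple": 4}

r_alphabetical = pearson([alphabetical[c] for c in colour], value)
r_other = pearson([other_order[c] for c in colour], value)

print(f"alphabetical coding: {r_alphabetical:+.3f}")
print(f"other coding:        {r_other:+.3f}")
```

    With this toy data the two codings give correlations of opposite sign, even though the data is identical. The number reflects the arbitrary numbering rather than any relationship between the attributes.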