Can we encode categorical data as numerical and then find the correlation in RapidMiner?

Chaitra Member Posts: 3 Newbie
edited November 2019 in Help
Can we encode categorical data as numerical and then find the correlation in RapidMiner? If so, please let me know the process.

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Hi Chaitra,

    I guess you can do it by hand, but I would rather run a correspondence analysis using R or Python. Most of the time this is not so important for a prediction task, as you can bypass the problem using wrapper feature selection techniques (stepwise or evolutionary).

    Regards,
    Sebastian
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You should be very careful in doing this type of analysis. There are operators that you can use to accomplish this task in RapidMiner easily (Nominal to Numerical and then Correlate), but whether the result is meaningful depends on what type of categorical data you actually have and the way you do the conversion.
    For example, if the data is actually nominal in nature, meaning it is not inherently ordered (think of things like colors or names), then a simple numerical replacement (where each nominal category is given a successive integer value) is very misleading, because the correlation then depends on an arbitrary ordering of the categories. That type of numerical conversion is only appropriate when the categories correspond to some kind of ordered scale (similar to a Likert scale). For other nominal data, you would want to do a dummy coding conversion, which takes each nominal value and turns it into a zero/one variable (called a dummy code), and then you can run a correlation analysis on those attributes.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Telcontar120 wrote: "For other nominal data, you would want to do dummy coding conversion, which takes each nominal value and turns it into a zero/one variable (called a dummy code) and then you can run a correlation analysis on those attributes."

    This is, by the way, what the correlation matrix in RapidMiner's Auto Model is doing. You can open the process and see how it is done on your data. #noblackboxes

    Best,
    Ingo
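
The difference between integer coding and dummy coding described in the answers above can be illustrated outside RapidMiner with a small, standard-library-only Python sketch. The data (`colour`, `price`) and the helper names (`pearson`, `integer_code`, `dummy_code`) are made up for illustration and are not RapidMiner operators; the sketch only assumes that the correlation being computed is the ordinary Pearson correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def integer_code(values, order=None):
    """Replace each category by its position in `order` (alphabetical by default)."""
    order = order or sorted(set(values))
    index = {cat: i for i, cat in enumerate(order)}
    return [index[v] for v in values]


def dummy_code(values):
    """Build one 0/1 column per distinct category."""
    return {cat: [1 if v == cat else 0 for v in values]
            for cat in sorted(set(values))}


# Hypothetical toy data: only "green" items are more expensive.
colour = ["red", "green", "blue", "red", "green", "blue"]
price = [10.0, 20.0, 10.0, 10.0, 20.0, 10.0]

# Integer coding: the result depends entirely on the arbitrary category order.
r_alphabetical = pearson(integer_code(colour), price)
r_reordered = pearson(integer_code(colour, ["red", "blue", "green"]), price)

# Dummy coding: one correlation per category, independent of any ordering.
dummies = dummy_code(colour)
r_dummy = {cat: pearson(col, price) for cat, col in dummies.items()}

print("integer coding (blue=0, green=1, red=2):", round(r_alphabetical, 3))
print("integer coding (red=0, blue=1, green=2):", round(r_reordered, 3))
for cat, r in r_dummy.items():
    print(f"dummy colour={cat}:", round(r, 3))
```

On this toy data the alphabetical integer coding shows no correlation at all, a different ordering of the same categories shows a strong one, and the dummy column for "green" correlates perfectly with price. For a genuinely ordered scale, passing the true order to `integer_code` would be the appropriate choice.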
