
Correlation between dichotomous and continuous variables

filipedgb Member Posts: 2 Contributor I
edited December 2018 in Help

I'm using the "Correlation Matrix" operator in RapidMiner, which I believe uses Pearson correlation. The operator can calculate correlations for every variable type, including binominal (dichotomous) and polynominal attributes.

I would like to know how exactly the operator calculates the correlation between, for example, a binominal and a numerical attribute. Wouldn't a Pearson correlation only allow numerical variables? Does it simply convert binominals to 0 and 1, or is it doing something else?

Thanks in advance,
Filipe G.B.


Best Answer

  • Options
    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    I believe it applies sequential integer coding to all nominal attributes. For polynominal data this makes the resulting correlation highly questionable to interpret, but for binominal data it does make sense.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
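
Editorial sketch: the point about binominal data can be checked numerically. The snippet below assumes the two classes are coded as 0 and 1 (the thread only says "sequential integer coding", so the exact codes are an assumption) and uses made-up data. The helper names `pearson` and `point_biserial` are illustrative, not RapidMiner functions. It shows that Pearson's r on the coded column equals the point-biserial correlation, and that swapping the codes only flips the sign.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def point_biserial(binary, values):
    """Point-biserial correlation: (M1 - M0) / s * sqrt(p * q),
    where s is the population standard deviation of all values."""
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    p = len(group1) / len(values)
    q = 1 - p
    return (mean(group1) - mean(group0)) / pstdev(values) * sqrt(p * q)


# Made-up illustrative data: a binominal attribute and a numerical one.
labels = ["yes", "no", "yes", "no", "yes", "no"]
income = [52.0, 31.0, 47.0, 38.0, 60.0, 29.0]

# Assumed coding: "yes" -> 1, "no" -> 0.
coded = [1 if lab == "yes" else 0 for lab in labels]

r = pearson(coded, income)
r_pb = point_biserial(coded, income)
r_flipped = pearson([1 - c for c in coded], income)  # swap the two codes

print(r, r_pb, r_flipped)
```

With only two categories there is only one possible "distance" between codes, so the magnitude of r does not depend on which integers are chosen, only its sign does.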

Answers

  • Options
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist

    Hey,

     

    Internally, RapidMiner maps all nominal types to integers, and this mapping is used for the correlation. Statistically, this is not a great idea, which is why we flag a problem if you do it.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
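
Editorial sketch: why an integer mapping is statistically shaky for polynominal attributes. The example assumes categories are numbered 0, 1, 2 in some order; the data, the `encode` helper and the two orderings are made up for illustration and do not reproduce RapidMiner's internal mapping. The same column yields different correlations depending on an arbitrary choice of order.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def encode(values, order):
    """Map each category to its position in `order` (sequential integer coding)."""
    mapping = {label: index for index, label in enumerate(order)}
    return [mapping[v] for v in values]


# Made-up illustrative data: a polynominal attribute and a numerical one.
colors = ["red", "green", "blue", "red", "blue", "green"]
price = [10.0, 20.0, 30.0, 12.0, 33.0, 19.0]

# Two equally arbitrary orderings of the same three categories.
r_a = pearson(encode(colors, ["red", "green", "blue"]), price)
r_b = pearson(encode(colors, ["green", "red", "blue"]), price)

print(r_a, r_b)  # the two values differ although the data is identical
```

Because the categories have no inherent order, neither value is "the" correlation; the number mostly reflects the ordering that happened to be used.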
  • Options
    filipedgb Member Posts: 2 Contributor I

    Thank you for your answer.

  • Options
    azziaty256 Member Posts: 4 Contributor I

    Hi, I am a new RapidMiner user.

    I have 31 attributes and 10K instances. I want to build a correlation matrix to examine the relationships between the attributes. The problem is that my data has several types: nominal, polynominal and numerical. What is the right process for building a correlation matrix with mixed data types?

  • Options
    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    As explained previously in this thread, "correlation analysis" typically applies only to numerical variables. What would you expect the correlation coefficient for nominal data to tell you?

    If you want to use nominal data with correlation, you are better off recoding it as a series of binominal/dummy variables first.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
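
Editorial sketch: the dummy-variable recoding suggested above, outside RapidMiner. The data and the `dummy_code` helper are made up for illustration. Each category becomes its own 0/1 column, and each column is then correlated with the numerical attribute separately.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def dummy_code(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Made-up illustrative data: a polynominal attribute and a numerical one.
colors = ["red", "green", "blue", "red", "blue", "green"]
price = [10.0, 20.0, 30.0, 12.0, 33.0, 19.0]

dummies = dummy_code(colors)
correlations = {cat: pearson(col, price) for cat, col in dummies.items()}

for cat, r in correlations.items():
    print(f"color={cat}: r = {r:.3f}")
```

Each coefficient now answers a well-defined question ("is this category associated with higher or lower values?") and does not depend on any ordering of the categories. In pandas, `pandas.get_dummies` produces the same kind of indicator columns.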