RapidMiner

Correlation between dichotomous and continuous variables

SOLVED
Contributor

I'm using the "Correlation Matrix" operator in RapidMiner, which I believe uses Pearson correlation. The operator is able to calculate correlations for every variable type, including binominal (dichotomous) and polynominal attributes.

I would like to know how exactly the operator calculates the correlation between, for example, a binominal and a numerical attribute. Wouldn't a Pearson correlation only allow numerical variables? Does it simply convert binominals to 0 and 1, or is it doing something else?

Thanks in advance,
Filipe G.B.

3 REPLIES
Elite III

Re: Correlation between dichotomous and continuous variables

I believe it is doing sequential integer coding for any nominal attributes. This is of course highly questionable for polynominal data in terms of correlation interpretability, but for binominal data it does make sense.
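
To illustrate why integer coding is reasonable for binominal data, here is a minimal sketch in plain Python (not RapidMiner's actual code). It assumes the two nominal values are mapped to 0 and 1 in order of first appearance; the data and the helper names `pearson` and `point_biserial` are made up for the example. With a 0/1 coding, the Pearson coefficient coincides with the point-biserial correlation, which is the standard measure for a dichotomous versus a continuous variable.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def point_biserial(codes, ys):
    """Point-biserial correlation for 0/1 group codes and numeric values."""
    n = len(ys)
    y1 = [y for c, y in zip(codes, ys) if c == 1]
    y0 = [y for c, y in zip(codes, ys) if c == 0]
    m1, m0 = sum(y1) / len(y1), sum(y0) / len(y0)
    my = sum(ys) / n
    sd = math.sqrt(sum((y - my) ** 2 for y in ys) / n)  # population SD
    return (m1 - m0) / sd * math.sqrt(len(y1) * len(y0) / n ** 2)

# Hypothetical example data: a binominal attribute and a numerical one.
labels = ["no", "yes", "no", "yes", "yes", "no"]
values = [1.0, 3.5, 2.0, 4.0, 3.0, 1.5]

# Assumed coding: integers assigned in order of first appearance.
mapping = {}
for label in labels:
    mapping.setdefault(label, len(mapping))
codes = [mapping[label] for label in labels]

r_pearson = pearson(codes, values)
r_pb = point_biserial(codes, values)
print(r_pearson, r_pb)

# Swapping which value gets 0 and which gets 1 only flips the sign.
flipped = [1 - c for c in codes]
r_flipped = pearson(flipped, values)
print(r_flipped)
```

The magnitude of the coefficient does not depend on which value is coded as 1, so for binominal attributes only the sign is affected by the mapping.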
Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Moderator

Re: Correlation between dichotomous and continuous variables

Hey,

RapidMiner internally maps all nominal types to integers, and this mapping is used for the correlation. Statistically, this is not very sound. That's why we flag a problem if you do it.
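
As a rough illustration of why this is questionable for polynominal attributes, the sketch below (plain Python, not RapidMiner code) computes the Pearson correlation for the same made-up data under two different integer mappings. The category names, the values, and both mappings are assumptions chosen for the example; the point is only that the result depends on an arbitrary ordering of the categories.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical polynominal attribute with three categories.
colors = ["red", "green", "blue", "red", "green", "blue"]
values = [1.0, 2.0, 3.0, 1.2, 2.1, 2.9]

# Two equally arbitrary integer mappings for the same categories.
mapping_a = {"red": 0, "green": 1, "blue": 2}
mapping_b = {"green": 0, "red": 1, "blue": 2}

r_a = pearson([mapping_a[c] for c in colors], values)
r_b = pearson([mapping_b[c] for c in colors], values)

# Same data, different coding, noticeably different coefficients.
print(r_a, r_b)
```

Because nothing in the data says that "green" lies between "red" and "blue", neither coefficient has a clear interpretation.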

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor

Re: Correlation between dichotomous and continuous variables

Thank you for your answer.