
Why does the correlation matrix show the minority class when the majority class was selected?

GeezerDoc Member Posts: 5 Contributor I
edited February 2020 in Help
I'm running a simple classification model to predict the presence or absence of heart disease from multiple risk factors. When I run AutoModel, I specify "presence of heart disease" as the class of interest. After the algorithms ran, I looked at the correlation matrix, and the obvious risk factors showed a negative correlation with heart disease. When I expanded the attribute column, I realized that the correlation matrix is based on "absence of heart disease", which is why the results look counter-intuitive. Any idea why this might be? Thanks

Best Answers

  • GeezerDoc Member Posts: 5 Contributor I
    Solution Accepted
    I will attach the Heart Prediction File so you can run it to predict "Heart Disease Present" and see what happens. I have also run it with the class coded as 0/1, but RapidMiner interpreted that as regression, so I had to tell it to do classification.
  • GeezerDoc Member Posts: 5 Contributor I
    Solution Accepted
    @varunm1
    While I don't think your response answered my initial question, it may have answered another question I had, related to "one-hot encoding". Are you saying that AutoModel automatically applies this technique when it sees categorical data? If so, there is no need to convert categorical data with a visual operator before uploading the data to AutoModel. Please elaborate, and thanks.

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
    Hello @GeezerDoc

    All columns are set as regular attributes, and then the nominal columns are one-hot encoded in AutoModel. So in the case of binary classification, all values in the target column belonging to one category are coded as 0, and all values belonging to the other category are coded as 1. The encoded column therefore represents both classes, as either 1 or 0: the category whose name you see in the attribute name is coded as 1, and the other category is coded as 0.

    @IngoRM might provide more info if needed.
    Regards,
    Varun
    https://www.varunmandalapu.com/


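To illustrate the encoding described in the answer above, here is a minimal Python sketch. It is not AutoModel's actual implementation, only the general principle: it one-hot encodes a binary nominal label both ways and computes the Pearson correlation of a risk factor with each indicator column. The label values "Presence"/"Absence", the `age` data, and the helpers `one_hot` and `pearson` are hypothetical, made up for illustration. Because the two indicator columns are complements (one equals 1 minus the other), the correlations have the same magnitude and opposite signs, so whichever category is coded as 1 decides whether a risk factor looks positively or negatively correlated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def one_hot(values, category):
    """Indicator column: 1 where the value equals `category`, else 0 (hypothetical helper)."""
    return [1 if v == category else 0 for v in values]


# Hypothetical toy data, not from the Heart Prediction File:
# one risk factor (age) and a binary nominal label.
age = [45, 62, 38, 70, 55, 29, 66, 51]
label = ["Presence", "Presence", "Absence", "Presence",
         "Absence", "Absence", "Presence", "Absence"]

presence = one_hot(label, "Presence")  # "Presence" coded as 1, "Absence" as 0
absence = one_hot(label, "Absence")    # "Absence" coded as 1, "Presence" as 0

r_presence = pearson(age, presence)
r_absence = pearson(age, absence)

print(f"corr(age, label = Presence) = {r_presence:+.3f}")
print(f"corr(age, label = Absence)  = {r_absence:+.3f}")
```

In this toy data the older patients are the "Presence" cases, so the correlation with the "Presence" indicator is positive and the correlation with the "Absence" indicator is its exact negative. Read the same way, a negative value in a column named after "absence of heart disease" corresponds to a positive association with the presence of heart disease.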