Using the Correlation Matrix Operator's Matrix Output

BrianT Member Posts: 5 Learner I
In my project, I'm trying to reduce the amount of correlation in my dataset. The standard way we do this is to look at all the pairwise correlations between attributes, isolate the pairs whose absolute correlation is above 0.95, and from each pair remove the attribute that has the lower correlation with the target (dependent) variable.
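A minimal sketch of that procedure in plain Python, outside RapidMiner. The function names (`pearson`, `drop_correlated`), the dict-of-columns data layout, and the toy values are illustrative assumptions. It checks every pair against the full correlation matrix, so an attribute marked for removal in one pair can still cause its partner in another pair to be removed.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(data, target, threshold=0.95):
    """Return the names of attributes to remove.

    data: dict mapping attribute name -> list of values (hypothetical layout)
    target: list of target values, same length as each attribute
    """
    # Absolute correlation of each attribute with the target
    target_corr = {name: abs(pearson(col, target)) for name, col in data.items()}
    to_drop = set()
    # Visit every unordered pair of attributes once
    for a, b in combinations(data, 2):
        if abs(pearson(data[a], data[b])) > threshold:
            # Remove the member of the pair that is weaker on the target (ties drop b)
            to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return to_drop


# Toy data: x2 is nearly a copy of x1, x3 is unrelated
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [1.1, 2.0, 3.1, 3.9, 5.0],
    "x3": [2, 1, 2, 1, 2],
}
target = [1, 2, 3, 4, 5]
print(drop_correlated(data, target))  # {'x2'}
```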

The Correlation Matrix operator provides the pairwise table I need. However, I haven't been able to figure out how to connect its green matrix output port to another operator that can use it. I'd appreciate any help with accessing that data inside a process, not just in the results view.

Best Answer

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi,

    If you just want to remove the correlated attributes, the Remove Correlated Attributes operator is ready for you.

    If you want to do it manually, check out the "Matrix to ExampleSet" operator in the Converters extension. It converts your matrix to a table according to the options you set. Then you could for example filter the list, use Data to Weights and Select by Weights to eliminate the unwanted attributes.
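To illustrate the chain described above (matrix to table, filter, weights, select), here is a rough Python analogue. It is not the Converters extension's code: `matrix_to_pairs`, `pairs_to_weights`, and `select_by_weights` are hypothetical stand-ins for Matrix to ExampleSet, a filter plus Data to Weights, and Select by Weights, and dropping the second attribute of each correlated pair is an arbitrary rule chosen for brevity.

```python
def matrix_to_pairs(names, matrix):
    """Flatten a symmetric correlation matrix into (first, second, r) rows.
    Only the upper triangle is used, so each pair appears exactly once."""
    rows = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            rows.append((names[i], names[j], matrix[i][j]))
    return rows


def pairs_to_weights(names, pairs, threshold=0.95):
    """Weight 1.0 keeps an attribute; 0.0 marks it for removal.
    Illustrative rule: drop the second attribute of every highly correlated pair."""
    weights = {name: 1.0 for name in names}
    for first, second, r in pairs:
        if abs(r) > threshold:
            weights[second] = 0.0
    return weights


def select_by_weights(weights):
    """Keep only attributes with a positive weight, in their original order."""
    return [name for name, w in weights.items() if w > 0]


names = ["a", "b", "c"]
matrix = [
    [1.00, 0.97, 0.10],
    [0.97, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]
pairs = matrix_to_pairs(names, matrix)
print(pairs)  # [('a', 'b', 0.97), ('a', 'c', 0.1), ('b', 'c', 0.2)]
print(select_by_weights(pairs_to_weights(names, pairs)))  # ['a', 'c']
```

In a real process, the filtering rule in the middle step is where you would encode which member of a pair to drop.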

    Regards,
    Balázs

Answers

  • MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
    Hi @BrianT

    You could use the Remove Correlated Attributes operator.

    The screenshot I attached is taken from the example process you can open from the help menu.


  • BrianT Member Posts: 5 Learner I
    Hi @MarcoBarradas, @BalazsBarany,

    The reason I haven't wanted to use the Remove Correlated Attributes operator is that I haven't been able to figure out how it chooses which of the two correlated attributes to remove. In the tutorial, it seems to drop the one with the higher correlation weight. I want to do the opposite, but there isn't an option for that, hence the need for a workaround.

    Unfortunately, I can't access the Converters extension. It's not your problem, but I'm baffled that it isn't possible to extract information from the correlation matrix without a third-party extension.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @BrianT,

    It's not a third-party extension; it's developed by RapidMiner people. It serves as a testbed for operators that may eventually move into the core product.

    What's your problem with accessing the extension? If your Studio fails to access the Marketplace, you can download the extension from the web and put it into your RapidMiner Studio installation folder under lib/plugins.
    https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_converters

    Within a pair, both attributes have the same correlation with each other, so why does the order of removal matter to you? You can try different settings of the "attribute order" parameter of Remove Correlated Attributes and check whether one of them does what you need.
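Because the correlation within a pair is symmetric, something other than its value has to decide which attribute survives, and attribute order is one such tie-breaker. The sketch below assumes a greedy pass that keeps attributes in the given order and skips any attribute highly correlated with one already kept. This is an assumption for illustration, not a description of how Remove Correlated Attributes is implemented; `greedy_filter` and the `corr` lookup are hypothetical.

```python
def greedy_filter(order, corr, threshold=0.95):
    """Walk attributes in the given order and keep one only if its |correlation|
    with every attribute kept so far is at or below the threshold.

    corr: dict mapping frozenset({a, b}) -> correlation (hypothetical layout)
    """
    kept = []
    for name in order:
        if all(abs(corr[frozenset((name, k))]) <= threshold for k in kept):
            kept.append(name)
    return kept


corr = {
    frozenset(("a", "b")): 0.97,
    frozenset(("a", "c")): 0.10,
    frozenset(("b", "c")): 0.20,
}
print(greedy_filter(["a", "b", "c"], corr))  # ['a', 'c']  (original order)
print(greedy_filter(["c", "b", "a"], corr))  # ['c', 'b']  (reverse order)
```

The same correlations produce different survivors depending only on the order in which the attributes are visited.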

    Regards,
    Balázs
  • BrianT Member Posts: 5 Learner I
    Hi @BalazsBarany,

    I'm circling back to answer your question now that my problem is resolved, mostly for anyone else who runs into the same (admittedly specific) issue. What I really wanted was to look at two correlated attributes and remove the one with the lower correlation with the target variable. I eventually figured out what the "attribute order" parameter does, as you mentioned. So to get the result I wanted, I created a subprocess that sorted my attributes in order of their correlation with my target variable.
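A Python sketch of that workaround: rank the attributes by absolute correlation with the target, strongest first, then run an order-dependent filter so the stronger attribute of each correlated pair is the one kept. It assumes the filter keeps the earlier attribute of a correlated pair, which is an assumption about the operator rather than documented behavior; the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sort_by_target_correlation(data, target):
    """Attribute names ordered by |correlation with the target|, strongest first."""
    return sorted(data, key=lambda name: abs(pearson(data[name], target)), reverse=True)


def greedy_filter(data, order, threshold=0.95):
    """Keep an attribute only if it is not highly correlated with one kept earlier."""
    kept = []
    for name in order:
        if all(abs(pearson(data[name], data[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# x2 comes first here on purpose, even though x1 tracks the target better
data = {
    "x2": [1.1, 2.0, 3.1, 3.9, 5.0],
    "x1": [1, 2, 3, 4, 5],
    "x3": [2, 1, 2, 1, 2],
}
target = [1, 2, 3, 4, 5]
order = sort_by_target_correlation(data, target)
print(order)                       # ['x1', 'x2', 'x3']
print(greedy_filter(data, order))  # ['x1', 'x3']
```

With the unsorted order, `greedy_filter(data, list(data))` keeps `x2` and drops `x1`, which is the outcome the sort is meant to avoid.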

    Thanks,
    Brian