Correlation Matrix does not include label attribute

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn
edited December 2018 in Help

Hi rapidminers, 

 

Is there a reason why the Correlation Matrix does not include the label attribute (when one is present in the dataset) and shows only the regular attributes?

 

Without label:

 

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="false" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="238">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="85"/>
<connect from_op="Retrieve Titanic" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

With label: 

 

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="false" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="238">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="85"/>
<connect from_op="Retrieve Titanic Training" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    An additional question about the correlation matrix: what might cause one of the attributes to show mostly '?' (NaN) in the matrix, even for its correlation with itself?

     

    [Image: Screenshot 2017-10-19 23.40.51.png]

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @kypexin - I do not have a PhD in Data Science like other folks here, but I will simply say that not having the label in a correlation matrix makes perfect sense to me. In a correlation matrix, you are simply finding r (r^2 if you check the box) for each of the nC2 pairs of numerical attributes, to see potential correlations two by two.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="85">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="correlation_matrix" compatibility="7.6.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="85"/>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
    <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
    <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
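
To make the "nC2 pairs" idea concrete, here is a minimal Python sketch of a pairwise Pearson computation. It is an illustration only, not RapidMiner's implementation: the function names, the `squared` flag (standing in for the r^2 checkbox) and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r is undefined when a column is constant
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(columns, squared=False):
    """Return {(name_a, name_b): r} for every unordered pair of columns."""
    result = {}
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        result[(a, b)] = r * r if squared else r
    return result


# Hypothetical toy data, not taken from the Iris sample set.
data = {
    "a1": [1.0, 2.0, 3.0, 4.0],
    "a2": [2.0, 4.0, 6.0, 8.0],
    "a3": [4.0, 3.0, 2.0, 1.0],
}
pairs = pairwise_correlations(data)
```

With three columns this yields 3C2 = 3 pairs; only the columns passed in take part, which mirrors how a label that is left out never appears in the matrix.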

    Scott

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you simply want the correlation of your attributes with the label, you can use "Weight by Correlation" to generate that.
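
As a rough picture of what "correlation of your attributes with the label" means, the Python sketch below computes one Pearson r per attribute against a numeric label. It is not the "Weight by Correlation" operator itself, which may post-process its weights (for example by normalizing them). The function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlation_with_label(attributes, label):
    """Return {attribute_name: r(attribute, label)} for each attribute."""
    return {name: pearson_r(values, label) for name, values in attributes.items()}


# Hypothetical data; the label is encoded as 0/1 for this example.
attributes = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [1.0, 1.0, 2.0, 2.0],
}
label = [0, 0, 1, 1]
weights = correlation_with_label(attributes, label)
```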

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Thanks @sgenzer @Telcontar120, it's all clear. 

    Maybe I should have put the question differently: couldn't the label be treated automatically as a regular attribute during the matrix computation, at least as an option? It's clear to me that I can just turn it into a regular attribute manually before building the matrix. But that's more of a philosophical question :)
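
The option described here could look roughly like the Python sketch below, which only decides which columns would enter the matrix. The `include_special` flag is hypothetical and is not an existing Correlation Matrix parameter. The role strings and column names are illustrative assumptions.

```python
def columns_for_matrix(roles, include_special=False):
    """Pick the column names that would enter a correlation matrix.

    `roles` maps column name -> role string ("regular", "label", "id", ...).
    `include_special` is a hypothetical switch: when True, columns with
    a special role (such as the label) are kept alongside regular ones.
    """
    return [
        name
        for name, role in roles.items()
        if role == "regular" or include_special
    ]


# Illustrative roles only.
roles = {"Age": "regular", "Passenger Fare": "regular", "Survived": "label"}

default_cols = columns_for_matrix(roles)                      # label left out
with_label = columns_for_matrix(roles, include_special=True)  # label kept
```

Changing the role to regular by hand before building the matrix has the same effect as passing `include_special=True` in this sketch.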

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Ah yes @kypexin, I understand exactly what you mean. It's a valid point. I guess I always look at "labels" in RapidMiner as "special variables that should be treated separately from others", hence all the checkboxes that say "include special attributes". So whether or not Correlation Matrix should have an "include special attributes" option is a good question. I will throw it on my "interesting suggestions from community users to the dev team" list. :)


    Scott

     
