"Correlation Formula"

michaelgloven RapidMiner Certified Analyst, Member Posts: 46 Guru
edited May 2019 in Help

Why does the Correlation formula in RapidMiner use the bias correction of n-1? I was expecting, for example, a pairwise table of two mutually exclusive 0/1 attributes to correlate as -1, yet the RapidMiner formula shows correlations for this data with negative values ranging from 0 to -0.6. From a practical standpoint, I'm not sure how to defend these partial correlations on two attributes which are each either 0 or 1. The correlation should always be -1.

Answers

    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi Michael,

    First of all: it was great to meet you at Wisdom! I truly loved your presentation!

    Now to your question: I am not sure I understood you correctly. I created a small test process that generates a data set with mutually exclusive 0 and 1 values and got a correlation of -1 for them. I tried both the Correlation Matrix operator and the correlations calculated by Auto Model. Can you explain what you tried so that we can spot the difference?

    Many thanks,

    Ingo

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="number_examples" value="500"/>
    <parameter key="number_of_attributes" value="1"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="A" value="if(rand()&gt;0.5,1,0)"/>
    <parameter key="B" value="if([A]==0,1,0)"/>
    <parameter key="Target" value="[A]"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="A|B|Target"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
    <parameter key="attribute_name" value="Target"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:correlation_matrix" compatibility="9.0.002" expanded="true" height="103" name="Correlation Matrix" width="90" x="581" y="34"/>
    <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Correlation Matrix" to_port="example set"/>
    <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
    <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
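
For readers without RapidMiner at hand, the same check can be sketched in plain Python. This is a minimal sketch, not RapidMiner's implementation: the `pearson` helper and its `ddof` parameter are hypothetical names introduced here. It mirrors the process above (`A` random 0/1, `B` the complement of `A`) and shows that the n-1 correction cancels out of the Pearson formula, so it cannot by itself move the result away from -1. The last part is only one possible explanation for values between 0 and -1, and it rests on an assumption the thread does not confirm: that the two attributes are mutually exclusive but not exact complements (for example, dummy codes of a variable with three categories, where both can be 0).

```python
import math
import random


def pearson(x, y, ddof=1):
    """Pearson correlation. ddof=1 divides by n-1, ddof=0 divides by n."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - ddof)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - ddof)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - ddof)
    return cov / math.sqrt(var_x * var_y)


# Same construction as the test process: A is random 0/1, B is 1 exactly when A is 0.
random.seed(0)
a = [1 if random.random() > 0.5 else 0 for _ in range(500)]
b = [1 if v == 0 else 0 for v in a]

# The (n - ddof) factor appears in the numerator and the denominator, so it cancels.
r_sample = pearson(a, b, ddof=1)      # approximately -1
r_population = pearson(a, b, ddof=0)  # same value up to rounding

# Assumption (not from the thread): two dummy columns of a three-category variable.
# They are never both 1, but both are 0 whenever the third category occurs.
cats = [i % 3 for i in range(300)]
d0 = [1 if c == 0 else 0 for c in cats]
d1 = [1 if c == 1 else 0 for c in cats]
r_dummies = pearson(d0, d1)           # approximately -0.5

print(r_sample, r_population, r_dummies)
```

If the attributes in question look like `d0` and `d1` rather than `a` and `b`, a correlation above -1 is the expected result of the standard formula and not an effect of the n-1 correction.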