
# "Correlation Formula"

RapidMiner Certified Analyst, Member Posts: 46 Guru
edited May 2019 in Help

Why does the Correlation formula in RapidMiner use the bias correction of n-1? I was expecting, for example, a pairwise table of two mutually exclusive 0/1 attributes to correlate as -1, yet the RapidMiner formula shows correlations for this data with negative values ranging from 0 to -0.6. From a practical standpoint, I'm not sure how to defend these partial correlations on two attributes which are each either 0 or 1. The correlation should always be -1.

Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

Hi Michael,

First of all: it was great to meet you at Wisdom!  I truly loved your presentation!

Now to your question: I am not sure if I got you right.  I created a little test process generating a data set with mutually exclusive 0 and 1 values and got -1 correlation for them.  I tried both the Correlation Matrix operator and also the correlations calculated by Auto Model.  Can you explain what you tried so that we can spot the difference?

Many thanks,

Ingo

`<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">  <context>    <input/>    <output/>    <macros/>  </context>  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">    <process expanded="true">      <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">        <parameter key="number_examples" value="500"/>        <parameter key="number_of_attributes" value="1"/>      </operator>      <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">        <list key="function_descriptions">          <parameter key="A" value="if(rand()&gt;0.5,1,0)"/>          <parameter key="B" value="if([A]==0,1,0)"/>          <parameter key="Target" value="[A]"/>        </list>      </operator>      <operator activated="true" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">        <parameter key="attribute_filter_type" value="subset"/>        <parameter key="attributes" value="A|B|Target"/>        <parameter key="include_special_attributes" value="true"/>      </operator>      <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">        <parameter key="attribute_name" value="Target"/>        <parameter key="target_role" value="label"/>        <list key="set_additional_roles"/>      </operator>      <operator activated="true" class="concurrency:correlation_matrix" compatibility="9.0.002" expanded="true" height="103" name="Correlation Matrix" width="90" x="581" y="34"/>      <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>      <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>      <connect from_op="Set Role" from_port="example set output" to_op="Correlation Matrix" to_port="example set"/>      <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>      <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>      <portSpacing port="source_input 1" spacing="0"/>      <portSpacing port="sink_result 1" spacing="0"/>      <portSpacing port="sink_result 2" spacing="0"/>      <portSpacing port="sink_result 3" spacing="0"/>    </process>  </operator></process>`
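For readers who want to check the arithmetic outside RapidMiner, here is a minimal Python sketch of the same test: 500 rows, `A` drawn as 0 or 1, and `B` set to the complement of `A`. The `pearson` helper and its `ddof` parameter are illustrative names, not RapidMiner's implementation; the point is only that the n-1 (or n) divisor appears in both the covariance and the variances, so it cancels in the ratio and should not by itself move a perfectly complementary pair away from -1.

```python
import math
import random


def pearson(xs, ys, ddof=1):
    """Pearson correlation; ddof=1 divides by n-1, ddof=0 divides by n.

    Hypothetical helper for illustration only, not RapidMiner's code.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - ddof)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - ddof)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable

# Mirror the process above: A is 0 or 1, B is 1 exactly when A is 0.
a = [1 if random.random() > 0.5 else 0 for _ in range(500)]
b = [1 if value == 0 else 0 for value in a]

r_sample = pearson(a, b, ddof=1)      # divisor n-1
r_population = pearson(a, b, ddof=0)  # divisor n

# Both values should agree with -1 up to floating-point rounding.
print(r_sample, r_population)
```

If the two attributes are not exact complements (for example, if some rows hold 0 in both columns), the correlation will lie strictly between -1 and 0 whichever divisor is used. Whether that applies to the data in the question is only a guess; the thread does not say.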