# [Solved] Correlation Matrix

Dear all,

I thought the correlation matrix would provide weights according to the attributes' ability to describe the label. However, I can even run the correlation without a label. How should the weights be interpreted in that case?

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="447" width="413">
      <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
      <operator activated="true" class="select_attributes" compatibility="5.2.003" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="correlation_matrix" compatibility="5.2.003" expanded="true" height="94" name="Correlation Matrix" width="90" x="313" y="30"/>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Correlation Matrix" to_port="example set"/>
      <connect from_op="Correlation Matrix" from_port="weights" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

Furthermore, I thought that there might be correlations that are shifted in time (time lag). Do you have a good approach for detecting them?

My first idea is to use a parameter optimization that varies the horizon of a windowing operator. For each run, the correlation matrix is computed. In the end, I take the horizon that gives the best correlation.

That should work, but it seems a little complicated, and I'm curious whether there is a better way to do this.
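Outside of RapidMiner, the same idea (try several shifts, keep the one with the strongest correlation) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names `pearson`, `lagged_correlations` and `best_lag` are made up for this example, the data is synthetic, and "best" is taken to mean the largest absolute Pearson correlation.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Pair x[t] with y[t + lag]; values without a partner are dropped.
        result[lag] = pearson(x[: len(x) - lag], y[lag:])
    return result


def best_lag(x, y, max_lag):
    """Lag with the largest absolute correlation (assumed selection rule)."""
    corrs = lagged_correlations(x, y, max_lag)
    return max(corrs, key=lambda lag: abs(corrs[lag]))


# Synthetic example: y follows x with a delay of 3 steps plus a little noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] * 3 + x[:-3]
y = [v + 0.1 * rng.gauss(0, 1) for v in y]

print(best_lag(x, y, max_lag=10))
```

Scanning a fixed range of lags like this corresponds to the optimization loop over the windowing horizon described above; the range of lags to test (`max_lag`) has to be chosen from domain knowledge.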

All the best

Sachs

## Answers

An update on the correlation matrix:

The grid shows the correlation coefficient for each pair of attributes. However, the weights provided by this operator seem illogical to me.

|    | a1   | a2   | a3   |
|----|------|------|------|
| a1 | 1,00 | 0,35 | 0,73 |
| a2 | 0,35 | 1,00 | 0,11 |
| a3 | 0,73 | 0,11 | 1,00 |

To calculate the weights, the operator sums up all values of a row (e.g. for a1: 1 + 0,35 + 0,73 = 2,08).

This sum is then divided by the number of attributes (2,08 / 3 = 0,693).

The result is then subtracted from 1 (1 - 0,693 = 0,307).

First, I wonder why the correlation of an attribute with itself (which is always 1) is included.

Furthermore, because the average is subtracted from 1, the weights become higher for attributes that show a lower correlation.
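The calculation described above can be reproduced with a short Python sketch. It follows the steps as observed in this post, not a confirmed description of RapidMiner's implementation; the function name `matrix_weights` is made up for the example.

```python
def matrix_weights(corr):
    """Weights as described in the post: 1 - (row sum / number of attributes).

    The diagonal (correlation of an attribute with itself) is included in
    the row sum, exactly as observed above.
    """
    n = len(corr)
    return [1 - sum(row) / n for row in corr]


names = ["a1", "a2", "a3"]
corr = [
    [1.00, 0.35, 0.73],
    [0.35, 1.00, 0.11],
    [0.73, 0.11, 1.00],
]

weights = dict(zip(names, matrix_weights(corr)))
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

For the matrix above this gives roughly 0.307 for a1, 0.513 for a2 and 0.387 for a3, so a2, the attribute with the weakest correlations to the others, ends up with the highest weight.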

I suggest using the "Weight by Correlation" operator instead.

PS: If you run the correlation matrix with a label, it does basically the same as described above, but without considering the label attribute (i.e. the label is temporarily removed for this operation).

Cheers

Sachs