Correlation Matrix

maccten
maccten New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I have a large data set with many attributes.
I would like to see how closely the attributes are correlated, but because of the sheer number of them I'm only interested in attributes whose correlation is above 40%.
Is there a way to do this, for example using a filter of some description? I know you can remove correlated attributes and select by weights, but those are not what I need, as I'm interested in the high correlations.

Thank you for your time

Answers

  • Andrew2
    Andrew2 New Altair Community Member
    Hello

    There are options like "top k" and "top p%" in the Select by Weights operator that might help.

    regards

    Andrew
  • maccten
    maccten New Altair Community Member
    Hi Andrew

    Thanks for the quick reply. I ran it this morning, but I don't think this is what I'm looking for.
    What I need is the pairwise table, so I can specifically say there is a 50% correlation between Attribute A and B but a negative correlation between A and C.
    Do you know if you can filter the actual matrix?

    Thanks
  • maccten
    maccten New Altair Community Member
    Hi All

    Is there perhaps a method to export the pairwise table to a CSV file, or to generate a report based on it?
    Has anyone tried it before?
    If it were in a database, it would be a simple case of selecting the rows where the correlation is above a certain amount.

    Thanks
  • Andrew2
    Andrew2 New Altair Community Member
    Hello

    A Groovy script would be able to do it. I could probably do that in return for beer or money  ;D

    Alternatively, I'm having a think about the possibility of calculating the correlation in a process without using the built-in operators. That would let you make an example set that could be filtered as you like.

    regards

    Andrew
  • maccten
    maccten New Altair Community Member
    I thought this link provided the answer http://www.myexperiment.org/workflows/1279.html

    But unfortunately, it doesn't provide a pairwise table, and the matrix in question has 5000 attributes in scope, so exporting it to Excel means cutting off a good portion of it.

    I'll keep the beer money in mind of course :), as soon as the next pay check comes around.
  • MariusHelf
    MariusHelf New Altair Community Member
    Have a look at the configuration of the Report operator: you should be able to configure Pairwise Table as output format.

    Have a look at the process below:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="correlation_matrix" compatibility="5.3.008" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
          <operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="76" name="Generate Report" width="90" x="313" y="30">
            <parameter key="report_name" value="test"/>
            <parameter key="format" value="Excel"/>
            <parameter key="excel_output_file" value="C:\Users\jdoe\Desktop\test.xls"/>
          </operator>
          <operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="60" name="Report" width="90" x="447" y="30">
            <parameter key="report_name" value="test"/>
            <parameter key="specified" value="true"/>
            <parameter key="reportable_type" value="Numerical Matrix"/>
            <parameter key="renderer_name" value="Pairwise Table"/>
            <list key="parameters">
              <parameter key="min_row" value="1"/>
              <parameter key="max_row" value="2147483647"/>
              <parameter key="min_column" value="1"/>
              <parameter key="max_column" value="2147483647"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_op="Generate Report" to_port="through 1"/>
          <connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
          <connect from_op="Report" from_port="reportable out" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • maccten
    maccten New Altair Community Member
    Hi Marius

    This works :)
    However, I have one last problem in relation to this.
    My pairwise table is going to generate roughly 25 million rows, which is not exportable using a report.
    Is there any way to filter the matrix/pairwise table so that only attributes with a certain correlation are exported, for example only attributes with 50% or more correlation?

    Thanks
  • MariusHelf
    MariusHelf New Altair Community Member
    Unfortunately, this is not possible. To solve the problem once and for all, we have an internal ticket requesting to convert the matrix into a normal example set, but we don't have a schedule for it yet.
  • maccten
    maccten New Altair Community Member
    Thanks very much for the feedback, Marius
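
Since the thread ends without a way to filter the pairwise table inside the tool, here is a minimal sketch of the "select the rows where the correlation is above a certain amount" idea done outside it, in plain Python. It assumes the attributes have already been exported as named numeric columns; the function names (`pearson`, `strong_pairs`, `write_pairs_csv`) and the output file name are hypothetical and are not part of any RapidMiner operator or API. The filter uses the absolute value of the correlation so that strong negative correlations are kept as well.

```python
import csv
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def strong_pairs(columns, threshold=0.5):
    """Yield (name_a, name_b, r) for each attribute pair with |r| >= threshold."""
    # combinations() visits each unordered pair once, so the mirrored half
    # of the matrix and the diagonal are never produced.
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if not math.isnan(r) and abs(r) >= threshold:
            yield name_a, name_b, r


def write_pairs_csv(path, pairs):
    """Write the filtered pairwise table to a CSV file, one pair per row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["first_attribute", "second_attribute", "correlation"])
        for name_a, name_b, r in pairs:
            writer.writerow([name_a, name_b, f"{r:.4f}"])


if __name__ == "__main__":
    # Tiny made-up data set, for illustration only.
    data = {
        "A": [1.0, 2.0, 3.0, 4.0, 5.0],
        "B": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with A
        "C": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as A rises
        "D": [2.0, 1.0, 3.0, 1.0, 2.0],   # no linear relationship to A
    }
    write_pairs_csv("strong_pairs.csv", strong_pairs(data, threshold=0.5))
```

Because `strong_pairs` is a generator, only the pairs that pass the threshold are ever written, so the output stays small even when the full pairwise table would not fit in a report. For something on the order of 5000 attributes, a pure-Python loop like this is likely to be slow, and a vectorised library would be the more practical choice; the sketch is only meant to show the filtering step.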
