"[Solved] How to work on Correlation Matrix results"

TomTom Member Posts: 3 Contributor I
edited June 2019 in Help
Hello,

I have built a correlation matrix within my process and would like to use the matrix results later in the same process. Unfortunately, the only operators I have found that accept "mat" as input are the ones in the Reporting add-on, so I can write an Excel file, for example. I could not find a way to use standard operators such as "Filter Examples" or "Select" with the matrix as input. Is there a way to filter the matrix for certain values without writing it to a file and reading that file back within the process? (That would not be fast enough, because I have very large data sets.)

I have also tried another way: I used "Remove Correlated Attributes" instead of "Correlation Matrix" and set the filter relation to "less" to get the attribute pairs that correlate with each other, but the results confuse me:

Sometimes the result of "Remove Correlated Attributes" is a result set with just one column. Suppose I have a result set with attributes A and B plus some other attributes, and column A correlates highly with another column. Why does "Remove Correlated Attributes" return just one of the two columns? I would expect it to return both, because correlation is a bidirectional relationship.

It would be really great if anyone could help with this issue.

Answers

  • TomTom Member Posts: 3 Contributor I
    As I haven't found a clean solution, I have chosen to write the correlation matrix to a file and reload it. Very dirty, but it works. Please let me know if there is a cleaner solution.

    Here is an example process of what I have done:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="656" width="681">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="20"/>
          </operator>
          <operator activated="true" class="correlation_matrix" compatibility="5.3.000" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
          <operator activated="true" class="write_as_text" compatibility="5.3.000" expanded="true" height="76" name="Write as Text" width="90" x="313" y="30">
            <parameter key="result_file" value="C:\TEMP\test.csv"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.3.000" expanded="true" height="60" name="Read CSV" width="90" x="179" y="165">
            <parameter key="csv_file" value="C:\TEMP\test.csv"/>
            <parameter key="column_separators" value="\s"/>
            <parameter key="comment_characters" value=""/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
              <parameter key="1" value="att2.true.polynominal.attribute"/>
              <parameter key="2" value="att3.true.polynominal.attribute"/>
              <parameter key="3" value="att4.true.polynominal.attribute"/>
              <parameter key="4" value="att5.true.polynominal.attribute"/>
              <parameter key="5" value="att6.true.polynominal.attribute"/>
              <parameter key="6" value="att7.true.polynominal.attribute"/>
              <parameter key="7" value="att8.true.polynominal.attribute"/>
              <parameter key="8" value="att9.true.polynominal.attribute"/>
              <parameter key="9" value="att10.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.3.000" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="165">
            <parameter key="first_example" value="3"/>
            <parameter key="last_example" value="12"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_op="Write as Text" to_port="input 1"/>
          <connect from_op="Write as Text" from_port="input 1" to_port="result 1"/>
          <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    There is currently no good way to automatically process the output of the Correlation Matrix operator. We already have an internal feature request for converting the matrix object to an example set.
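
To illustrate what a matrix converted to an example set could look like, here is a minimal Python sketch that runs outside RapidMiner. It is not a RapidMiner API: the function names (`pearson`, `correlation_rows`, `filter_rows`) and the toy data are hypothetical. It computes the pairwise Pearson correlations and stores them as one row per attribute pair, which makes filtering by value straightforward.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_rows(columns):
    """Return one (first, second, correlation) row per unordered attribute pair."""
    names = list(columns)
    rows = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            rows.append((first, second, pearson(columns[first], columns[second])))
    return rows


def filter_rows(rows, threshold):
    """Keep only the pairs whose absolute correlation reaches the threshold."""
    return [row for row in rows if abs(row[2]) >= threshold]


# Hypothetical toy data; att2 is an exact multiple of att1.
data = {
    "att1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "att2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "att3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

strong = filter_rows(correlation_rows(data), threshold=0.9)
for first, second, value in strong:
    print(first, second, round(value, 3))
```

With this toy data only the pair att1/att2 passes the 0.9 threshold. The long format lists both attribute names of every pair, which is the information the original question was after.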

    Concerning Remove Correlated Attributes: if you have a set of correlated attributes, this operator *should* remove all but one of them (not all of them, because then the information they carry would be lost entirely). Does that explain your observations, or did I misunderstand something in your description?
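
The "all but one" behaviour can be sketched in a few lines of Python. This is an assumed greedy strategy for illustration only, not RapidMiner's actual implementation; the function `remove_correlated`, the attribute names, and the correlation values are hypothetical.

```python
def remove_correlated(names, corr, threshold):
    """Keep an attribute only if it is not strongly correlated
    with any attribute that has already been kept."""
    kept = []
    for name in names:
        if all(abs(corr[frozenset((name, other))]) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical pairwise correlations; A and B are strongly correlated.
corr = {
    frozenset(("A", "B")): 0.97,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.05,
}

kept = remove_correlated(["A", "B", "C"], corr, threshold=0.95)
kept_reversed = remove_correlated(["C", "B", "A"], corr, threshold=0.95)
print(kept)           # A survives, B is dropped
print(kept_reversed)  # B survives, A is dropped
```

Although the correlation between A and B is symmetric, only one of the two is dropped, and which one survives depends on the order in which the attributes are visited in this sketch.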

    Best regards,
    Marius
  • TomTom Member Posts: 3 Contributor I
    Hi Marius,

    Thanks for your response. Yes, that explains my observations. As I need some other columns from the matrix, I have chosen to write the matrix results to the hard drive and reload them as CSV. That works quite well for me now.

    Best regards to Dortmund
  • qwertz Member Posts: 130 Contributor II

    Please be aware that the "Write as Text" operator will only write the first 20 attributes! This is odd, as I could not find any hint about it in the documentation. However, you can use a similar workaround with the report operator, which is also explained in the forum (see http://rapid-i.com/rapidforum/index.php?topic=2081.0).

    Here is a process that shows that not all attributes are written:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="6.0.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="30"/>
          </operator>
          <operator activated="true" class="correlation_matrix" compatibility="6.0.003" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
          <operator activated="true" class="write_as_text" compatibility="6.0.003" expanded="true" height="76" name="Write as Text" width="90" x="313" y="30">
            <parameter key="result_file" value="C:\test.txt"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_op="Write as Text" to_port="input 1"/>
          <connect from_op="Write as Text" from_port="input 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Cheers
    Sachs