"PCA Option To Add PCs as Special Attributes"

dragoljub Member Posts: 241 Maven
edited June 2019 in Help
Hi Guys,

I often run PCA on large datasets to get an idea of the high-level data structure. In most cases I just want to plot the top 2 PCs.

It would be great if the PCA operator had an option to add principal components as 'special' attributes to the original data.

When I try to 'Join' the 'exa' and 'ori' outputs of PCA, I get this strange error (RM 5.0.8).

Jun 30, 2010 10:19:47 AM WARNING: Error creating renderer: java.lang.ArrayIndexOutOfBoundsException: DataRow: table index 60 of Attribute 64:Continuity_All_lo:[email protected]_INP_DRX[1] is out of bounds.

So I can't view the data, and it can't be passed to subsequent operators. It should work, since the ID column is preserved in both the 'exa' and 'ori' outputs.

Thanks,
-Gagi

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Gagi,
    this process works for me without any problems:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
        <process expanded="true" height="455" width="701">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_of_attributes" value="6"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.0.8" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
          <operator activated="true" class="principal_component_analysis" compatibility="5.0.8" expanded="true" height="94" name="PCA" width="90" x="313" y="30">
            <parameter key="dimensionality_reduction" value="fixed number"/>
            <parameter key="number_of_components" value="2"/>
          </operator>
          <operator activated="true" class="join" compatibility="5.0.8" expanded="true" height="76" name="Join" width="90" x="447" y="30"/>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="PCA" to_port="example set input"/>
          <connect from_op="PCA" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="PCA" from_port="original" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    This should be essentially the same as your process, so I don't know where your problem comes from.
    Since we want to keep each RapidMiner operator as simple as possible, we prefer this approach to adding more data to the operator's output as you suggested (which might cause memory problems anyway). We should rather try to find this bug :)

    Greetings,
      Sebastian
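
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what the process above does: compute the top 2 principal components and join them back onto the original attributes by ID. It assumes NumPy is available and uses randomly generated data; the function name `top_pcs_with_id` is hypothetical, and this is not how RapidMiner's PCA or Join operators are implemented internally.

```python
import numpy as np

def top_pcs_with_id(ids, X, n_components=2):
    """Return rows of (id, original attributes..., pc_1, ..., pc_k)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # PCA works on mean-centered data
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T  # project onto the top axes
    # Join on ID: look up each example's scores by its ID, the same key
    # the Join operator relies on in the process above.
    score_by_id = {i: s for i, s in zip(ids, scores)}
    return [(i, *row, *score_by_id[i]) for i, row in zip(ids, X)]

# Illustrative data only: 10 examples, 6 attributes, IDs 1..10.
rng = np.random.default_rng(0)
ids = list(range(1, 11))
X = rng.normal(size=(10, 6))
joined = top_pcs_with_id(ids, X)
```

Each row of `joined` holds the ID, the six original attributes, and the two component scores, so the first two PCs can be plotted next to any original attribute.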