Losing my ID field when using Principal Component Generator

Keithr Member Posts: 10 Contributor II
edited November 2018 in Help
Hi,

I'm using the principal component generator to combine some highly correlated variables in a classification problem. That works fine, but it drops my ID attribute, so there is no way to tie the classification result back to the actual customer ID.

The process I'm using is as follows:
ExampleSource (label, ID, 82 variables) -> AttributeFilter (label, ID, 3 variables) -> PrincipalComponentGenerator (label, 1 variable).

What am I doing wrong?

Thanks in advance for your help.

Keith

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Keith,
    You probably forgot to set the ID attribute as a special attribute. All non-special (regular) attributes are used by the PCA and removed afterwards. You can define the ID attribute in the operator that loads the data, if it provides an "id_attribute" parameter; otherwise you can change the role later on with the ChangeAttributeRole operator:
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="ID"/>
            <parameter key="target_role" value="id"/>
        </operator>
    Greetings,
      Sebastian
  • Keithr Member Posts: 10 Contributor II
    Hi Sebastian,

    I think I'm setting the ID up correctly in the aml file, and it does show up as an ID in the meta data view before I run the PCA.

    label    dropped          binominal  mode = Y (624)                           Y (624), N (624)                     0.0
    id       hhId             integer    avg = 106,258,671.246 +/- 3,587,967.584  [100,731,555.000 ; 111,749,338.000]  0.0
    regular  DELTA_1_SALES    real       avg = -33.475 +/- 43.687                 [-100.000 ; 103.170]                 0.0
    regular  DELTA_1_TRIPS    real       avg = -22.330 +/- 48.379                 [-100.000 ; 161.110]                 0.0
    regular  DELTA_1_CAT_PEN  real       avg = -30.810 +/- 38.007                 [-100.000 ; 69.230]                  0.0

    But after the PCA runs, the ID disappears and all I have left is the label and the PCA variable. It also renames the label from "dropped" to "label".

    label    label  nominal  mode = Y (624)            N (624), Y (624)      0.0
    regular  pc_1   real     avg = -50.189 +/- 70.917  [-173.156 ; 160.344]  0.0

    aml file:
    <attributeset default_source="8051_D_75_C3_200810_Hurdle25_50pctDec_Train_5050.psv">

      <id
        name      = "hhId"
        sourcecol = "1"
        valuetype = "integer"
      />

      <label
        name      = "dropped"
        sourcecol = "2"
        valuetype = "binominal">
        <value>Y</value>
        <value>N</value>
      </label>

      <attribute
        name      = "baselineSales"
        sourcecol = "3"
        valuetype = "integer"
      />
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Keith,

    The operator PrincipalComponentGenerator is outdated; please use the operator PCA instead. That operator outputs a model, which can then be applied to the data using the ModelApplier. This way, all special attributes (label and ID) should be kept.

    Regards,
    Tobias
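
    For readers following along, the chain described in this reply might look roughly like the sketch below. Only the operator names ExampleSource, PCA and ModelApplier come from the thread. The Root/Process wrapper, the "attributes" parameter key and the file name train.aml are assumptions for illustration, and the thread does not confirm whether PCA passes the example set on to the ModelApplier by default, so check the operator's parameters in your version.

    ```xml
    <operator name="Root" class="Process">
        <!-- Load the data. The .aml file declares the id and label roles.
             "train.aml" is a placeholder, not the poster's actual file. -->
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="train.aml"/>
        </operator>
        <!-- The existing AttributeFilter step would sit here, unchanged. -->
        <!-- Learn the PCA transformation; per the reply, this outputs a model. -->
        <operator name="PCA" class="PCA"/>
        <!-- Apply the PCA model to the data. Special attributes (id, label)
             should be kept, since only regular attributes are transformed. -->
        <operator name="ModelApplier" class="ModelApplier"/>
    </operator>
    ```

    After running something like this, the meta data view should still list the id attribute next to the generated pc_* attributes.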
  • KeithrKeithr Member Posts: 10 Contributor II
    Hi Tobias,

    That did the trick.

    Thanks a lot for your help!

    Keith