Reverse Mapping of PCA

MrFury Member Posts: 2 Contributor I
edited September 2019 in Help
Hi there, I am fairly new to using RapidMiner and a little stuck.

I have applied PCA on a dataset and retained 2 of the principal components. 
I then apply the PCA model to a new dataset to calculate the scores. 

I would then like to reverse map the scores to the input variables and compare the original input variables with the values recalculated by the reverse mapping.

I am trying to use the errors to do online fault diagnosis. 

How can I do the reverse mapping of the scores in RapidMiner?

Thanks

MrFury

Answers

  • wessel Member Posts: 537 Maven
    Hi,

    I remember doing this in the past, by feeding the PCA a row with all 1's.
    Then using the result in a 'rename'.

    Is your goal to get the names correct?
    Or to apply a PCA model on unseen data?

    Best regards,

    Wessel
  • MrFury Member Posts: 2 Contributor I
    Feeding it 1's... I will have to think about that one.

    The goal is to do fault diagnosis on sensor data I am getting from a piece of equipment, so I need to apply the PCA model to unseen time series data.

    I am trying to do something similar to what is described in this article:  [http://www.wseas.us/e-library/conferences/2010/Merida/CIMMACS/CIMMACS-20.pdf]

    Step 1:  Capture most of the natural process variance with PCA on training data; the residual is treated as the "noise".
    Step 2:  Apply the PCA model to a new set of data to calculate the scores.
    Step 3:  Monitor the T² or Q statistic to indicate a fault (the error will increase when the new data differ from the data the model was built on).
    Step 4:  Map the scores back onto the input variables to calculate the error for each variable (E = x - x').

    Thanks

    Pieter

  • wessel Member Posts: 537 Maven
    In this case you want to study the results of the PCA by hand.
    Then hand-craft features.

    PC1 is typically the average value of the series.
    PC2 is typically the difference between the first part of the series and the second part of the series.

    By hand-crafting you can create soft margins.
    So you can specify that the middle part of the series gets a value close to zero,
    the second last part gets a value quickly building up to -1,
    and the first part starts from 1 and quickly drops to near 0 when the middle part is reached.

    As a side note, you really want to use supervised learning if possible.
    Using unsupervised PCA will always give you vague answers.
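
For readers who want to check the arithmetic behind Steps 1 to 4 above outside RapidMiner, here is a minimal NumPy sketch. It is not a RapidMiner process and does not use RapidMiner operators. The function names `fit_pca` and `monitor`, the synthetic five-sensor data, and the injected bias on sensor index 4 are all assumptions made up for illustration. It also assumes the data were only mean-centred before PCA; if the inputs were normalised as well, that scaling has to be undone after the reverse mapping. The reverse mapping itself is x' = t P^T + mean, where P holds the retained loadings and t the scores.

```python
import numpy as np

def fit_pca(X_train, n_components=2):
    """Step 1: fit PCA on training data; return mean, loadings P, score variances."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # shape (n_features, n_components)
    score_var = s[:n_components] ** 2 / (len(X_train) - 1)
    return mean, P, score_var

def monitor(X_new, mean, P, score_var):
    """Steps 2 to 4: scores, reverse mapping, per-variable error, T2 and Q."""
    T = (X_new - mean) @ P                        # Step 2: scores
    X_hat = T @ P.T + mean                        # Step 4: reverse mapping x'
    E = X_new - X_hat                             # Step 4: error E = x - x'
    t2 = np.sum(T ** 2 / score_var, axis=1)       # Step 3: Hotelling's T2 per row
    q = np.sum(E ** 2, axis=1)                    # Step 3: Q (squared prediction error)
    return T, X_hat, E, t2, q

# Synthetic example (assumed data): 5 sensors driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
                   [1.0, -1.0, 1.0, -1.0, 0.0]])
X_train = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 5))
X_new = rng.normal(size=(50, 2)) @ mixing + 0.05 * rng.normal(size=(50, 5))
X_new[25:, 4] += 2.0   # hypothetical fault: bias on sensor index 4 in the last 25 rows

mean, P, score_var = fit_pca(X_train, n_components=2)
T, X_hat, E, t2, q = monitor(X_new, mean, P, score_var)

print("mean Q, normal rows:", q[:25].mean())
print("mean Q, faulty rows:", q[25:].mean())
print("sensor with largest mean |error| in faulty rows:",
      int(np.abs(E[25:]).mean(axis=0).argmax()))
```

In this sketch the Q statistic flags that something is off, and the columns of `E` indicate which input variable contributes most to the error, which is the comparison of original versus reverse-mapped variables asked about in the question.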