Reverse Mapping of PCA

MrFury · November 2015

Hi there, I am fairly new to using Rapidminer and a little stuck.

I have applied PCA on a dataset and retained 2 of the principal components.
I then apply the PCA model to a new dataset to calculate the scores.

I would then like to map the scores back to the input variables and compare the original input variables with the values calculated by the reverse mapping.

I am trying to use the errors to do online fault diagnosis.

How can I do the reverse mapping of the scores in Rapidminer?

Thanks

MrFury

wessel · November 2015

Hi,

I remember doing this in the past by feeding the PCA a row with all 1's, then using the result in a 'rename'.

Is your goal to get the names correct?
Or to apply a PCA model on unseen data?

Best regards,

Wessel

MrFury · November 2015

Feeding it 1's.... I will have to think about that one.

The goal is to do fault diagnosis on some sensor data I am getting from a piece of equipment. So apply the PCA model on unseen time series data.

I am trying to do something similar to what is described in this article: [http://www.wseas.us/e-library/conferences/2010/Merida/CIMMACS/CIMMACS-20.pdf]

Step 1: Capture most of the natural process variance with PCA on training data. The residual is treated as the "noise".
Step 2: Apply the PCA model to a new set of data to calculate the scores.
Step 3: Monitor the T2 or Q statistic to indicate a fault (the error will increase if the new data differs from what the model was built on).
Step 4: Map the scores back to the input variables to calculate the error for each variable (E = x - x').
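
As a rough illustration of steps 2 to 4 outside RapidMiner, here is a minimal Python sketch. It assumes the training means and the retained eigenvectors can be copied out of the trained PCA model. The names MEAN, LOADINGS, scores, reconstruct and residuals are hypothetical, and the numbers are made up for a 3-sensor example. It also assumes the data was only mean-centered before PCA. If the data was standardized as well, the scaling has to be applied and undone in the same way.

```python
import math

# Hypothetical PCA model for 3 sensors. In practice these values would be
# copied from the trained PCA model (training means and eigenvectors).
MEAN = [10.0, 20.0, 30.0]
LOADINGS = [  # one orthonormal eigenvector per retained component
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
]


def scores(x):
    """Project a sample onto the retained components: t = (x - mean) P."""
    centered = [xi - mi for xi, mi in zip(x, MEAN)]
    return [sum(c * p for c, p in zip(centered, pc)) for pc in LOADINGS]


def reconstruct(t):
    """Map scores back to the input variables: x' = mean + t P^T."""
    return [
        m + sum(tk * pc[j] for tk, pc in zip(t, LOADINGS))
        for j, m in enumerate(MEAN)
    ]


def residuals(x):
    """Return the per-variable error E = x - x' and Q = sum of squared errors."""
    x_hat = reconstruct(scores(x))
    e = [xi - xh for xi, xh in zip(x, x_hat)]
    return e, sum(ei * ei for ei in e)


# A sample with an offset on the third sensor (values invented for illustration).
errors, q = residuals([11.0, 21.0, 34.0])
print("per-variable error:", errors)
print("Q statistic:", q)
```

For this made-up sample the per-variable errors come out at roughly -1, -1 and 2, with Q close to 6. A sample such as [11.0, 21.0, 31.0], which lies in the plane spanned by the two components, reconstructs with essentially zero error.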

Thanks

Pieter

wessel · November 2015

In this case you want to study the PCA results by hand, then hand-craft features.

PC1 is typically the average value of the series.
PC2 is typically the difference between the first part of the series and the second part of the series.

By hand-crafting you can create soft margins.
For example, you can specify that the middle part of the series gets a value close to zero,
the second, last part gets a value quickly building up to -1,
and the first part starts at 1 and quickly drops to near 0 by the time the middle part is reached.
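
One possible reading of such a hand-crafted feature, as a minimal Python sketch: the function names soft_weights and feature and the edge parameter are hypothetical, and the linear ramps are just one choice of "soft margin". The series needs at least two points.

```python
def soft_weights(n, edge=0.25):
    """Weights that start at 1, fall to 0, stay 0 in the middle, then build up to -1."""
    weights = []
    for i in range(n):
        u = i / (n - 1)  # relative position in the series, from 0 to 1
        if u < edge:
            w = 1.0 - u / edge  # first part: 1 dropping towards 0
        elif u > 1.0 - edge:
            w = -(u - (1.0 - edge)) / edge  # last part: 0 building up to -1
        else:
            w = 0.0  # middle part contributes nothing
        weights.append(w)
    return weights


def feature(series, edge=0.25):
    """Weighted sum of the series: roughly 'first part minus last part'."""
    w = soft_weights(len(series), edge)
    return sum(wi * xi for wi, xi in zip(w, series))


print(soft_weights(9))
print(feature([0, 1, 2, 3, 4, 5, 6, 7, 8]))
```

With these weights a constant series scores 0, and a series that rises from start to end gets a negative value.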

As a side note, you really want to start supervised learning if possible.
Using unsupervised PCA will always give you vague answers.
