"PCA vs PrincipalComponentGenerator?"

Legacy User · August 2008

Hi,

From what I could see, experiments ExampleSource-PrincipalComponentsGenerator (1) and ExampleSource-PCA-ModelApplier (2) generate the same output data sets if the input set contains a label attribute. If the input does not have a label, experiment (1) crashes at runtime, even though it passes validation. In addition, experiment (2) outputs the PCA model and offers more controls (such as the number of PCs).

If the PCA operator is clearly superior to the PrincipalComponentsGenerator, why do you keep the PrincipalComponentsGenerator? Or does it have any advantages I missed?

Victor

IngoRM · August 2008

Hi,

you are right. They deliver the same output. There are basically two reasons for keeping the PrincipalComponentsGenerator:

1. backwards compatibility
2. only one operator instead of two in cases where you are interested in the PCA only (without the model)

It is, however, very likely that this operator will be marked as deprecated and removed in a future release.

Cheers,
Ingo

Stefan_E · January 2009

Hi,

... there seems to be another reason: Performance!

I have a data set with 20 attributes, 5094 examples.
1. PrincipalComponentsGenerator returns in a matter of a couple of seconds.
2. PCA takes 2900s so far and is still running with 100% CPU load

When I put a sampling operator in front of PCA and sample 70% of the examples, I get a result in ~10s. That is still slower than PrincipalComponentsGenerator, but at least tolerable.

The dataset is such that PC-1 explains 99.97% of the variance - I don't know whether that has any impact.

Kind regards Stefan
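
For readers wondering how a figure like "PC-1 explains 99.97% of the variance" is obtained: it is the largest eigenvalue of the covariance matrix divided by the sum of all eigenvalues (the trace). Below is a minimal pure-Python sketch of that calculation. It is not RapidMiner's implementation; the function names and the toy data are made up for illustration, and the largest eigenvalue is estimated with plain power iteration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def largest_eigenvalue(matrix, iterations=200):
    """Estimate the dominant eigenvalue of a symmetric matrix by power iteration.

    Assumption: the all-ones start vector is not orthogonal to the
    dominant eigenvector (true for the toy data below, not in general).
    """
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v for the unit vector v
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d))


def first_pc_share(rows):
    """Fraction of the total variance explained by the first principal component."""
    cov = covariance_matrix(rows)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace
    return largest_eigenvalue(cov) / total_variance


# Hypothetical toy data: two attributes lying almost on the line y = 2x
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
share = first_pc_share(rows)
print(f"PC-1 explains {share:.2%} of the variance")
```

A share this close to 1 means the attributes are almost perfectly linearly related, so a single component carries nearly all of the information.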

Stefan_E · January 2009

hmm.... my dataset contained a line with missing values.
Not very elegant of PCA, of course, to just go off to nirvana with such an input, but if I delete that line, it works.

Kind regards Stefan
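
The workaround described above (removing the incomplete line before running PCA) can be sketched outside RapidMiner as well. This is a generic illustration, not RapidMiner code: `drop_incomplete_rows` is a hypothetical helper, and it assumes missing values are encoded either as `None` or as a floating-point NaN.

```python
import math


def drop_incomplete_rows(rows):
    """Return only the rows in which every value is present (not None, not NaN)."""
    def is_present(value):
        if value is None:
            return False
        return not (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if all(is_present(v) for v in row)]


# Hypothetical toy data with two incomplete rows
rows = [
    [1.0, 2.0, 3.0],
    [4.0, float("nan"), 6.0],  # missing value encoded as NaN
    [7.0, 8.0, None],          # missing value encoded as None
    [9.0, 10.0, 11.0],
]
clean = drop_incomplete_rows(rows)
print(len(rows) - len(clean), "row(s) dropped before PCA")
```

Filtering up front matters because a single NaN propagates through the mean and covariance computations, which can leave an iterative eigenvalue routine without a meaningful result to converge to.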

land · January 2009

Hi,
we will increase the elegance of PCA by throwing an error with the next version.

Thanks for the hint,
Sebastian
