reuse data cleanup steps

nuno2055 · July 2017

Hello,

I created an operator chain to clean up training data and now would like to apply the exact same chain to test data.

How can I do this in the same process without copying the entire chain just to feed the test set through it?

Thank you

Nuno

Telcontar120 · July 2017

There are several ways to handle this situation, but perhaps the easiest thing to do would be to save your first process as "data ETL" or something similar.

Then create a separate process for doing data ETL on your test data. In that process, you simply load the test data (however that is done, via files or a db connection) and then call the original ETL process from your repository using the "Execute Process" operator. As long as the test data starts in the same raw format as your original data, this will work fine. You can also use that same ETL process in the future to transform unlabeled data.

Under this approach, you will only have to maintain the one version of your ETL process, so if you add to it or update it in the future, you don't need to worry about replicating those changes elsewhere. The "Execute Process" operator will always retrieve the most current version of that process to apply.
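For anyone who thinks in code rather than operators, here is a rough Python analogy of the same idea. It is not RapidMiner code: the `clean` function, the `city` and `age` columns, and the sample rows are all made up for illustration. The point is only that the cleanup is defined once and both data sets are passed through that single definition, much like calling one saved ETL process from two places.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def clean(rows: List[Row]) -> List[Dict[str, object]]:
    """Hypothetical cleanup chain, defined once and reused for every data set."""
    cleaned = []
    for row in rows:
        # Step 1: drop rows that have a missing or blank value.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Step 2: trim and lowercase the text column.
        # Step 3: parse the numeric column (int() tolerates surrounding spaces).
        cleaned.append({
            "city": row["city"].strip().lower(),
            "age": int(row["age"]),
        })
    return cleaned


# Illustrative raw data in the same format for both sets (assumed, not from the thread).
train_raw = [{"city": " Lisbon ", "age": "34"}, {"city": "Porto", "age": ""}]
test_raw = [{"city": "BRAGA", "age": " 51 "}]

# The same single definition is applied to both, so a later change to
# clean() automatically affects training and test data alike.
train_clean = clean(train_raw)
test_clean = clean(test_raw)
```

As in the RapidMiner setup, this only works if the test data arrives in the same raw format as the training data.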

nuno2055 · July 2017

Brilliant suggestion!!

Thank you very much!
