reuse data cleanup steps

nuno2055 Member Posts: 2 Contributor I
edited December 2018 in Help

Hello,

I created an operator chain to clean up my training data and would now like to apply the exact same chain to my test data.

How can I do this in the same process, without copying the entire chain just to feed the test set through it?

Thank you

Nuno

Best Answer

  • Telcontar120 Posts: 1,235 Unicorn
    Solution Accepted

    There are several ways to handle this, but perhaps the easiest is to save your first process (the one containing the cleanup chain) in your repository as "data ETL" or something similar.

    Then create a separate process for doing data ETL on your test data. In that process, simply load the test data (however that is done, via files or a db connection) and then call the original ETL process from your repository using the "Execute Process" operator. As long as the test data starts in the same raw format as your original data, this will work fine. You can also use that same ETL process in the future to transform unlabeled data.

    Under this approach, you only have to maintain one version of your ETL process, so if you add to it or update it in the future, you don't need to worry about replicating those changes elsewhere. The "Execute Process" operator will always retrieve the most current version of that process to apply.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
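
For readers who think in code, the "keep one ETL process and call it for every data set" idea above can be sketched outside RapidMiner. The Python below is only an analogy and not RapidMiner's API: `clean_rows`, the field names, and the sample rows are all made up for illustration. The function plays the role of the saved "data ETL" process, and each call plays the role of an "Execute Process" step.

```python
# Hypothetical cleanup steps, defined once. This is the analogue of the
# saved "data ETL" process; none of these names come from RapidMiner.
def clean_rows(rows):
    """Apply the same cleanup to any data set in the same raw format."""
    cleaned = []
    for row in rows:
        # Drop rows where any field is missing or blank.
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue
        # Trim and lower-case text fields; leave other values untouched.
        cleaned.append({
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in row.items()
        })
    return cleaned


# Made-up sample data standing in for the raw training and test sets.
train_raw = [{"city": " Lisbon ", "age": 34}, {"city": "", "age": 51}]
test_raw = [{"city": "PORTO", "age": 28}, {"city": None, "age": 40}]

# Both data sets go through the single cleanup definition, so a later
# change to clean_rows automatically applies to both.
train_clean = clean_rows(train_raw)
test_clean = clean_rows(test_raw)
```

Because both calls go through the same definition, there is only one place to edit when the cleanup changes, which is the maintenance benefit described in the answer above.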

Answers

  • nuno2055 Member Posts: 2 Contributor I

    Brilliant suggestion!!

    Thank you very much!
