Splitting a data set, and a "low quality" operator
I'm new to RapidMiner, so please be a bit patient with me.
1. I would like to split my full data set into a training set (which I will also use for validation) and a test set. As I understand it, best practice is to split off the test set straight away, that is, before doing any exploratory data analysis or feature selection on the data.
Say I split my full data set with the "Split Data" operator and apply the "Remove Correlated Attributes" operator (referred to below as "corr.") to the training set, and the corr. operator removes some attributes. At the end I store this final training set. Now my test set has more attributes than my training set. How do I remove from the test set exactly the attributes that the corr. operator removed from the training set? I don't want to run the corr. operator on the test set, because it could potentially remove fewer or different attributes. Is it possible to generate the test set in this automatic/dynamic way? Or is there another way you handle this process?
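A minimal sketch of the "learn on the training set, apply to the test set" pattern, written in plain Python rather than as a RapidMiner process (Python is an assumption here; the post itself contains no code). The helper names `split_rows`, `find_correlated` and `drop_attributes`, the toy data, the 0.95 threshold and the greedy keep/drop rule are all hypothetical, and are not claimed to reproduce exactly what the "Remove Correlated Attributes" operator does. The point is the order of operations: the test set is held out first, the list of attributes to remove is computed from the training set alone, and that fixed list is then applied to the test set instead of re-running the correlation filter on it.

```python
import math
import random

def split_rows(rows, test_fraction=0.3, seed=1):
    """Hold out a test set up front with a seeded shuffle, before any analysis."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation; treat it as 0
    return cov / math.sqrt(var_a * var_b)

def find_correlated(rows, attributes, threshold=0.95):
    """Learn the drop list from the TRAINING rows only (hypothetical rule).

    Greedy pass in the given attribute order: an attribute is dropped when its
    absolute correlation with an attribute already kept exceeds the threshold.
    """
    kept, dropped = [], []
    for name in attributes:
        col = [r[name] for r in rows]
        if any(abs(pearson(col, [r[k] for r in rows])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return dropped

def drop_attributes(rows, names):
    """Apply an already-learned drop list to any data set, e.g. the test set."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]

# Hypothetical toy data: "b" is almost exactly 2 * "a", "c" is unrelated.
rows = [
    {"a": float(i), "b": 2.0 * i + (0.1 if i % 2 else 0.0), "c": float((i * 7) % 5)}
    for i in range(10)
]
train, test = split_rows(rows)
print(len(train), len(test))  # 7 3

to_drop = find_correlated(train, ["a", "b", "c"])  # learned once, on train only
train_reduced = drop_attributes(train, to_drop)
test_reduced = drop_attributes(test, to_drop)      # same attributes removed
print(to_drop)
```

Because `to_drop` is just a list of attribute names, it can be stored together with the training set and reused on any later data, so the test set never influences which attributes are removed.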
2. Is there a "low quality" operator in the Design view of RapidMiner Studio, i.e. one that performs the same low-quality removal as Turbo Prep -> Cleanse -> Remove low quality?
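For reference, a rough sketch of what a "remove low quality" step can look like, again in plain Python. The three criteria (mostly missing, nearly constant, ID-like) and their thresholds are assumptions chosen for illustration; they are not taken from Turbo Prep's documentation, and `low_quality_columns` is a hypothetical helper, not a RapidMiner operator.

```python
def low_quality_columns(columns, max_missing=0.5, max_stability=0.95, max_id_ness=0.95):
    """Return names of columns that look low quality.

    columns maps attribute name -> list of values, with None for missing.
    All thresholds are hypothetical defaults chosen for illustration.
    """
    flagged = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            flagged.append(name)  # entirely missing
            continue
        missing_ratio = 1 - len(present) / len(values)
        counts = {}
        for v in present:
            counts[v] = counts.get(v, 0) + 1
        stability = max(counts.values()) / len(present)  # share of the most common value
        # ID-ness: share of distinct values. Only checked for text columns,
        # because continuous numeric columns are naturally almost all distinct.
        is_text = all(isinstance(v, str) for v in present)
        id_ness = len(counts) / len(present) if is_text else 0.0
        if missing_ratio > max_missing or stability > max_stability or id_ness > max_id_ness:
            flagged.append(name)
    return flagged

# Hypothetical toy data
data = {
    "customer_id": ["c1", "c2", "c3", "c4"],   # ID-like
    "country": ["DE", "DE", "DE", "DE"],       # constant
    "income": [None, None, None, 52000.0],     # 75% missing
    "age": [23.0, 35.0, 41.0, 35.0],
    "segment": ["A", "B", "A", "B"],
}
print(low_quality_columns(data))  # ['customer_id', 'country', 'income']
```

Following the same reasoning as for the correlation filter, the flagged list would be computed on the training set only and the same attribute names then dropped from the test set.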
I'd love to hear from you.