Split data set and low quality operator
I'm new to RapidMiner, so please be a bit patient with me.
1. I would like to split my full data set into a training set (also used for validation) and a test set. From my understanding, it's best practice to split off the test set straight away (before I do, for instance, any exploratory data analysis or feature selection on the data).
Say I split my full data set with the "Split Data" operator and apply the "Remove Correlated Attributes" operator (referred to as "corr." below) to the training set, and the corr. operator removes some attributes. At the end I store this final training set. Now my test set has more attributes than my training set. How do I remove from my test set the same attributes that the corr. operator removed from the training set? I don't want to run the corr. operator on my test set because it could potentially remove fewer or different attributes. Is it possible to generate the test set in this automatic / dynamic way? Are there other ways you handle this process?
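To make the workflow I'm after concrete, here is a rough sketch in plain Python rather than RapidMiner. It is only an illustration under my own assumptions: the function names, the 0.95 threshold, the 70/30 split and the toy data are all made up, and the greedy correlation filter is a stand-in, not the actual logic of the corr. operator. The point is just that the list of attributes to drop is computed on the training set only and then reused unchanged on the test set.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def split_rows(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def correlated_attributes(rows, attributes, threshold=0.95):
    """Greedy filter (hypothetical): drop an attribute if it correlates too
    strongly with one that has already been kept."""
    kept, dropped = [], []
    for name in attributes:
        column = [row[name] for row in rows]
        too_similar = any(
            abs(pearson(column, [row[k] for row in rows])) > threshold
            for k in kept
        )
        if too_similar:
            dropped.append(name)
        else:
            kept.append(name)
    return dropped


def drop_attributes(rows, names):
    """Return copies of the rows without the named attributes."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


# Toy data: "b" is a linear function of "a", "c" is unrelated.
rows = [{"a": i, "b": 2 * i + 1, "c": (i * 7) % 5} for i in range(20)]

train, test = split_rows(rows)

# Decide which attributes to drop using the training set ONLY ...
to_drop = correlated_attributes(train, ["a", "b", "c"])

# ... then apply that same list to both sets, so they end up with identical attributes.
train_final = drop_attributes(train, to_drop)
test_final = drop_attributes(test, to_drop)
```

So what I'm looking for in RapidMiner is the equivalent of reusing `to_drop` on the test set instead of recomputing it there.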
2. Does a "low quality" operator exist in RapidMiner Design (i.e. one that performs the same low-quality removal as Turbo Prep tab -> Cleanse -> Remove low quality)?
Love to hear from you.