Need recommendation for prediction/regression workflow
Greetings one and all
I am getting acquainted with machine learning in RapidMiner, and I am working on a prediction problem. My CSV has about 5000 Examples, with 10 predictor attributes and 2 target attributes. I have a few questions about RapidMiner's design:
1) I understand that only 1 target attribute (or prediction) can be set. Would I be able to predict both using a single process?
2) I am interested in using the new Deep Learning operator for the training. What are the recommended preprocessing steps? I can think of (a) filtering out missing values and (b) normalizing. Do correlated attributes need to be removed manually?
3) For splitting the data into training, testing and validation sets, am I supposed to simply use Cross Validation with the Deep Learning operator nested within it? What about Split Validation? Do these operators split the original data into the 3 sets?
4) Can the Deep Learning operator handle a mixture of categorical and numerical attributes? Is one-hot encoding unnecessary within RapidMiner, or do I need to preprocess using Nominal to Numerical (dummy coding)? Is the polynominal type suitable for describing categorical variables? I noticed there is also a 'text' type.
5) What does the 'reproducible' function do within the DL operator?
6) Is it possible to 'deploy' a trained DL model to an operational scenario?
7) When importing data using the Import Config. Wizard, could I skip defining the roles and instead use the Set Roles function in the designer?
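To make questions 2 to 4 concrete, here is a rough sketch in plain Python (not RapidMiner) of what I understand by these steps: dropping Examples with missing values, normalizing, dummy coding, and a three-way split. The attribute names (`temp`, `load`, `mode`), the toy values and the 60/20/20 split are made up for illustration; I am not claiming this is what the RapidMiner operators do internally.

```python
import random

# Hypothetical toy rows: two numerical predictors and one nominal predictor.
rows = [
    {"temp": 20.0, "load": 1.0, "mode": "eco"},
    {"temp": 30.0, "load": None, "mode": "sport"},  # has a missing value
    {"temp": 40.0, "load": 3.0, "mode": "eco"},
    {"temp": 25.0, "load": 2.0, "mode": "normal"},
    {"temp": 35.0, "load": 4.0, "mode": "sport"},
]

def drop_missing(rows):
    """Keep only rows where every attribute has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def min_max_normalize(rows, column):
    """Rescale one numerical column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column: (r[column] - lo) / span} for r in rows]

def dummy_code(rows, column):
    """Replace one nominal column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    coded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}={c}"] = 1 if r[column] == c else 0
        coded.append(new_row)
    return coded

def three_way_split(items, train_pct=60, val_pct=20, seed=42):
    """Shuffle, then cut into training, validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

clean = drop_missing(rows)
for col in ("temp", "load"):
    clean = min_max_normalize(clean, col)
encoded = dummy_code(clean, "mode")

# Split the IDs of 5000 Examples (the size of my CSV) into three sets.
train_ids, val_ids, test_ids = three_way_split(range(5000))
```

With this split, a model would be fitted on `train_ids`, tuned on `val_ids`, and scored once on `test_ids`. My question is which of these steps the RapidMiner operators already take care of.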
My apologies for the many questions. I find RapidMiner to be a powerful and really user-friendly tool, and I would like to take the time to understand it properly. Thank you very much.