MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,453; RM Data Scientist)
Hi, well... the short answer here is: no.
One of the coolest things about Random Forests is that they can handle basically all kinds of data and also have some "built-in feature selection". So you can just throw data at them and get reasonable results. The only exception is date attributes, which you should preprocess into simpler attributes first (e.g. Day of the Week).
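To illustrate the date preprocessing outside RapidMiner, here is a minimal plain-Python sketch. The attribute names `order_date` and `amount` and the derived attributes (day of week, month, weekend flag) are hypothetical choices for the example, not a prescribed set.

```python
from datetime import date

# Hypothetical example rows: a raw date attribute plus a numeric attribute.
rows = [
    {"order_date": date(2018, 11, 5), "amount": 120.0},
    {"order_date": date(2018, 11, 10), "amount": 80.5},
    {"order_date": date(2018, 12, 24), "amount": 42.0},
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def expand_date(row, key):
    """Replace a date attribute with simpler derived attributes a tree can split on."""
    d = row[key]
    out = {k: v for k, v in row.items() if k != key}   # drop the raw date
    out[key + "_day_of_week"] = DAY_NAMES[d.weekday()]  # nominal: Monday..Sunday
    out[key + "_month"] = d.month                       # numeric: 1..12
    out[key + "_is_weekend"] = d.weekday() >= 5         # True on Saturday/Sunday
    return out


prepared = [expand_date(r, "order_date") for r in rows]
for p in prepared:
    print(p)
```

The raw date is dropped on purpose, so the trees only see attributes that repeat across time (e.g. every Monday looks the same).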
Now the longer answer: the right preprocessing can still get you better results. Random Forests are easy-care algorithms, but there are still things you can do. One example is feature generation to work around XOR problems. The new automatic feature generation in Auto-Model, which is part of version 9.1 (Beta), can help here.
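To show why XOR is awkward for trees, here is a small sketch using Gini impurity on toy data: a split on either original feature alone gives no impurity decrease, while a generated feature separates the classes in one split. The generated feature `|x1 - x2|` is an assumption chosen for illustration; it is not necessarily what Auto-Model would generate.

```python
# XOR-style toy data (hypothetical): label is 1 exactly when x1 and x2 differ.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_gain(feature_values, labels):
    """Impurity decrease of a binary split on feature value 0 vs. 1."""
    left = [y for f, y in zip(feature_values, labels) if f == 0]
    right = [y for f, y in zip(feature_values, labels) if f == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


labels = [y for _, _, y in data]
x1 = [a for a, _, _ in data]
x2 = [b for _, b, _ in data]
# Generated feature (assumption for illustration): absolute difference of x1 and x2.
x_diff = [abs(a - b) for a, b, _ in data]

for name, values in [("x1", x1), ("x2", x2), ("|x1 - x2|", x_diff)]:
    print(name, split_gain(values, labels))
```

A tree can still learn XOR with two levels of splits, but the greedy first split looks useless, which gets worse with noise and many irrelevant attributes.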
BR, Martin
- Sr. Director Data Solutions, Altair RapidMiner - Dortmund, Germany
rfuentealba (Moderator, RapidMiner Certified Analyst, Member, University Professor; Posts: 568; Unicorn)
I agree with @mschmitz: you don't have to. Nevertheless, I would pass only the features I want to use to my algorithm and remove the correlated attributes.
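Removing correlated attributes can be sketched as a greedy filter on pairwise Pearson correlation. The threshold of 0.95 and the attribute names below are assumptions for the example, not a recommendation from this thread.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant attribute has no defined correlation; treat as 0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Keep attributes in order, skipping any with |r| > threshold to an already kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical attributes: temp_f is a linear function of temp_c, so it is redundant.
columns = {
    "temp_c": [10.0, 15.0, 20.0, 25.0, 30.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0, 86.0],
    "humidity": [80.0, 40.0, 65.0, 30.0, 70.0],
}
print(drop_correlated(columns))
```

Which attribute of a correlated pair survives depends on column order here; in practice you might prefer to keep the one that is easier to interpret.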
Also, if you are not necessarily going to use a Random Forest, keep in mind that other algorithms benefit from, or even require, preprocessing. So it is a helpful step to perform as part of your EDA, especially handling missing values, outliers, etc.
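As a rough sketch of handling missing values and outliers, the code below uses median imputation and clipping to Tukey's fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR). These two techniques and the sample values are choices made for illustration; other strategies may suit your data better. `statistics.quantiles` requires Python 3.8+.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical attribute with two missing values and one extreme outlier.
raw = [12.0, None, 14.0, 13.5, 250.0, 12.5, None, 13.0]
cleaned = clip_iqr(impute_median(raw))
print(cleaned)
```

Imputation runs before clipping so the quartiles are computed on a complete column; the outlier still influences the median slightly, which is one reason the median is used rather than the mean.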
Brian T. - Lindon Ventures - Data Science Consulting from Certified RapidMiner Experts