Implementation of Random Forest (versus Decision Forest?)
Hi all,
I'm wondering how the RapidMiner RandomForest classifier is implemented. It seems to me that there are significant differences from Breiman's version (Breiman, L.: Random Forests. Machine Learning, 45, 5–32, 2001).
Main features of Random Forests are:
- each tree is grown on its own bootstrap sample
- at each node of the trees, a defined number of features is randomly selected and evaluated for the best split
Does the RapidMiner RandomForest classifier work like that? Are individual trees grown on bootstrap samples? And I suppose the set of features is determined once for the whole tree, not for each node (?). If so, it would rather resemble the "Decision Forest" of Ho (Ho, T.K.: The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, August 1998).
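To make the distinction concrete, here is a minimal sketch of the two feature-sampling schemes. It is not RapidMiner or WEKA code; the function names and the values of `n_features`, `k` and `n_nodes` are made up for illustration, and no actual tree is built.

```python
import random

def per_tree_subspace(n_features, k, rng):
    """Ho-style random subspace: draw one feature subset for the whole tree."""
    return sorted(rng.sample(range(n_features), k))

def per_node_candidates(n_features, k, n_nodes, rng):
    """Breiman-style: draw a fresh candidate feature subset at every node."""
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_nodes)]

rng = random.Random(0)
n_features, k, n_nodes = 10, 3, 5  # illustrative values only

tree_subset = per_tree_subspace(n_features, k, rng)
node_subsets = per_node_candidates(n_features, k, n_nodes, rng)

# One subset per tree: every node can only split on the same k features.
features_seen_per_tree = set(tree_subset)
# Per-node draws: the tree as a whole can touch more than k features.
features_seen_per_node = set().union(*node_subsets)

print("per tree:", tree_subset)
print("per node:", node_subsets)
```

With per-tree sampling a tree never sees more than `k` features in total, whereas with per-node sampling each split is restricted to `k` candidates but the tree overall may use many more.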
The WEKA version of the Random Forest classifier seems to follow Breiman's concept, I guess, so it could be the choice anyway. However, the "Weight by Tree Importance" operator, which I would like to use, does not work with the WEKA version.
Thanks in advance.
Ollestrat
Answers
It would be great to get some help, as I am completely lost at the code level.
I have forwarded your question to one of our developers, maybe they can tell us more.
However, I can at least say something about the bootstrapping: No, this would happen only with a very small probability for decent data sets. Bootstrapping basically just means sampling with replacement, where the sample size is most often the size of the original data set. If you use a sample ratio of 1 for a data set consisting of n examples, you will end up with n examples, but several of them might be drawn more than once. On average, about 63% of the original examples end up in the sample; the rest are not part of it (but probably will be for another tree).
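The 63% figure follows from the chance that a given example is drawn at least once in n draws with replacement, 1 - (1 - 1/n)^n, which approaches 1 - 1/e for large n. A small simulation to check this, using plain Python rather than anything RapidMiner-specific (the helper name, the seed and n = 10000 are arbitrary choices for the sketch):

```python
import math
import random

def bootstrap_unique_fraction(n, rng):
    """Draw n indices with replacement; return the fraction of distinct indices."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n

rng = random.Random(42)
n = 10_000  # arbitrary data set size for the sketch

# Average over a few bootstrap samples (one per hypothetical tree).
fractions = [bootstrap_unique_fraction(n, rng) for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)

# Probability that a given example is drawn at least once in n draws.
expected = 1 - (1 - 1 / n) ** n
# Large-n limit, 1 - 1/e, which is roughly 0.632.
limit = 1 - math.exp(-1)

print(f"simulated: {mean_fraction:.3f}, exact: {expected:.3f}, limit: {limit:.3f}")
```

All three numbers should come out close to 0.63, which is where the "about 63%" comes from.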
I am not sure where the random attribute sets are drawn (I think it's per node, but it might also be per tree). Maybe one of our developers can look it up (I would do this myself but I am currently not in my office...)
Cheers,
Ingo