Random forest basic question

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

hi,

In a paper about the general random forest algorithm, it is said that at each node a subset of variables is chosen to test.

My question: when the tree is being constructed from the training set, is the random choice of variables already made at that point? Or is the tree trained with all variables, and only afterwards a small subset of variables is chosen to test the tree on the OOB data?

And finally, do the subsets always have to be non-overlapping, i.e. distinct from each other? Or are the variables chosen entirely at random, so that the same variable can also show up repeatedly (more than once) among the ones tested at each node?

Best Answer

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi,

     

    The random subsets are already chosen during the training phase.  The point is to "force" the different trees to cover different aspects of the data space and of the problem to be learned.  Each tree becomes somewhat weaker as a result, which is countered by building an ensemble of trees.

     

    Typically (and also in RapidMiner) the subsets are chosen completely at random, i.e. they can overlap.

     

    Best,

    Ingo
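
To make the answer above concrete, here is a minimal Python sketch of per-node variable sampling during training. It is not RapidMiner's implementation: the function names (`sample_split_candidates`, `simulate_tree_nodes`) and the parameter values are made up for illustration. It also assumes the common convention that variables are drawn without replacement inside a single node, so one subset never contains the same variable twice, while different nodes draw independently and can therefore share variables.

```python
import random


def sample_split_candidates(n_features, subset_size, rng):
    """Pick the feature indices one node may consider for its split."""
    # Assumption: sampling WITHOUT replacement inside a node,
    # so a single subset never lists the same feature twice.
    return sorted(rng.sample(range(n_features), subset_size))


def simulate_tree_nodes(n_features, subset_size, n_nodes, seed=0):
    """Draw an independent random subset for each node while 'training'."""
    rng = random.Random(seed)
    # Each node draws on its own, so subsets of different nodes may overlap.
    return [
        sample_split_candidates(n_features, subset_size, rng)
        for _ in range(n_nodes)
    ]


subsets = simulate_tree_nodes(n_features=10, subset_size=3, n_nodes=5)
for node_id, subset in enumerate(subsets):
    print(f"node {node_id}: candidate features {subset}")
```

With 5 nodes drawing 3 variables each out of only 10, at least one variable has to appear at more than one node, which is the kind of overlap described in the answer.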
