
Is it correct to use feature selection before Gradient Boosted Trees?

f_laperna Member Posts: 13 Contributor II
edited December 2018 in Help

Hi everyone!
My question is the following:

I'm trying to build several classification models with different algorithms and techniques and compare the results.

I already built a model using Random Forest with the Bagging technique. Since my dataset had many attributes and most of them were almost useless with respect to classifying my target variable, I performed a very simple feature selection by attribute weights. I read in the literature that with Bagging it is better to perform feature selection on each bootstrap sample.

 

But when using an algorithm such as Gradient Boosted Trees, which uses the Boosting technique to select feature subsets that minimize misclassification error, does it make sense to perform feature selection (FS) before training the model?

I read that some boosting algorithms already perform feature selection internally and some do not.

 

Hope someone with more knowledge and experience can help me, thank you in advance!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Hi,

     

    it's statistically sound to do FS for ANY machine learning algorithm, no matter whether it's boosted, bagged or plain. If it yields better accuracy, go for it.

     

    That said, both RF and GBTs do some FS internally, so FS is not as important for them as it is for other algorithms. I would nevertheless still do it.

     

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
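
To make the "if it yields better accuracy, go for it" advice concrete, here is a minimal sketch of how one might compare a GBT with and without a preceding FS step. It uses Python with scikit-learn rather than RapidMiner, which is an assumption on my side, and the synthetic dataset, the `SelectKBest` filter and `k=10` are illustrative choices, not something from this thread. The selection step sits inside the pipeline so that it is refit on each training fold and does not see the validation data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): 40 attributes, only 5 informative.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=5, n_redundant=0,
    random_state=0,
)

# Baseline: GBT trained on all attributes.
gbt_plain = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Variant: a simple univariate filter keeps 10 attributes, then GBT.
# Because the filter is a pipeline step, it is refit on every training fold.
gbt_with_fs = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
)

scores_plain = cross_val_score(gbt_plain, X, y, cv=5, scoring="accuracy")
scores_fs = cross_val_score(gbt_with_fs, X, y, cv=5, scoring="accuracy")

print(f"GBT, all attributes: {scores_plain.mean():.3f}")
print(f"GBT after FS (k=10): {scores_fs.mean():.3f}")
```

Which variant wins depends on the data, so the two printed accuracies should be read as a comparison on this toy dataset only.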
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @f_laperna

     

    From my experience I could add that in some cases, especially if you do extensive optimization of model parameters, reducing the number of features can also reduce training time for both RF and GBT and speed up the whole process. So if you have really MANY features and are sure that you can safely omit some of them, why not?

     

    In the case of GBT, though, I'd also suggest that you try different feature weighting algorithms and also consider the feature weights returned by the GBT algorithm itself.
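
As a rough illustration of comparing an external weighting scheme with the weights a GBT returns itself, here is a small sketch in Python with scikit-learn. The library choice is an assumption (the thread is about RapidMiner), and the synthetic data and the mutual information weighting are just examples of "a different feature weighting algorithm".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in data (assumption). With shuffle=False the informative
# attributes are the first 4 columns, so the rankings are easy to check.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=4, n_redundant=0,
    shuffle=False, random_state=0,
)

# Weights returned by the GBT itself: impurity-based importances.
gbt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
gbt_weights = gbt.feature_importances_

# An independent filter-style weighting: mutual information with the label.
mi_weights = mutual_info_classif(X, y, random_state=0)

# Column indices of the 5 highest-weighted attributes under each scheme.
top_gbt = np.argsort(gbt_weights)[::-1][:5]
top_mi = np.argsort(mi_weights)[::-1][:5]

print("Top 5 by GBT importance:   ", top_gbt)
print("Top 5 by mutual information:", top_mi)
print("Attributes in both top-5 lists:", sorted(set(top_gbt) & set(top_mi)))
```

Attributes that rank highly under both schemes are reasonable candidates to keep. Where the rankings disagree, it is worth checking the effect on cross-validated accuracy before dropping anything.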
