"Different performance from Backward Elimination when not using the operator"

aphongme Member Posts: 3 Contributor I
edited June 9 in Help

I used the Backward Elimination operator to optimize the AUC of my logistic regression by eliminating some attributes. However, when I stop using the Backward Elimination operator and eliminate the same attributes myself with the Select Attributes operator (based on the Backward Elimination operator's results), the resulting AUC/performance is not the same (it is lower). The same thing happens with other optimization operators (Optimize Parameters (Grid), Forward Selection).

How do these optimization operators work, and how do they differ from doing it manually (without an optimization operator)?

My data has 2030 instances with 33 features and 1 binary dependent variable.

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 748 Unicorn

    Hi @aphongme,

     

    I'm not a specialist in feature selection algorithms, so I don't know why you don't obtain the same AUC manually as you do with the feature selection operators.

    However, for a partial answer on how these algorithms work, you can find a resource (especially Part 1 and Part 2) by following this link:

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Multi-Objective-Feature-Selection-Part-1-The-Basics/ta-p/45775/jump-to/first-unread-message
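
    To give a rough idea of what a backward elimination loop does, here is a minimal Python sketch of the generic greedy scheme. It is not RapidMiner's implementation: the function names, the `score` callable, and the toy feature names are all hypothetical, and a real `score` would typically be something like a cross-validated AUC.

```python
def backward_elimination(features, score):
    """Greedy backward elimination over a list of feature names.

    `score` is a hypothetical callable: it takes a list of feature names
    and returns a number where higher is better.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        trials = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        trial_score, drop = max(trials)
        if trial_score <= best:
            break  # no single removal improves the score, so stop
        selected.remove(drop)
        best = trial_score
    return selected, best


def toy_score(subset):
    """Made-up score: two useful features, every other feature costs 0.01."""
    useful = {"age": 0.2, "income": 0.15}
    return 0.5 + sum(useful.get(f, -0.01) for f in subset)


selected, best = backward_elimination(
    ["age", "income", "noise_1", "noise_2"], toy_score
)
print(selected, round(best, 2))
```

    With this toy score the loop drops the two noise features and then stops, because removing either remaining feature would lower the score.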

     

    I hope it helps.

     

    Regards,

     

    Lionel

  • aphongme Member Posts: 3 Contributor I

    This also happens when I use the Optimize Parameters (Grid) operator. When I run the process with the parameters it returned, but without the operator, the AUC decreases significantly.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 748 Unicorn

    Hi again @aphongme,

     

    Can you check your XML process and share it? (The process you shared in the other topic is broken.)

    One avenue of investigation would be to first build the ROC curves in both cases (case 1: manual selection; case 2: the feature selection or Optimize Parameters operators) and then compare the curves (for example, with the Compare ROCs operator).
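
    Outside RapidMiner, the same comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels and confidence scores are made up, and `roc_points` and `auc` are hypothetical helper names, not part of any RapidMiner API.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) by sweeping a threshold over the scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: confidence for the positive class (higher means more positive).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (s, y) in enumerate(ranked):
        tp += y
        fp += 1 - y
        # Emit a point only after the last example in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != s:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up data: the same six examples scored by two hypothetical models.
labels = [1, 1, 0, 1, 0, 0]
scores_manual = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
scores_operator = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]

auc_manual = auc(roc_points(labels, scores_manual))
auc_operator = auc(roc_points(labels, scores_operator))
print(auc_manual, auc_operator)
```

    Comparing the lists of points side by side shows where the two curves diverge, which is more informative than comparing the two AUC values alone.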

     

    Regards,

     

     

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,226 Unicorn

    Are you making sure to use a specific random seed to ensure reproducible results?
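
    As an illustration of why the seed matters, here is a small Python sketch (not RapidMiner code; `split_indices` and the seed values are hypothetical). It draws a random test split of 2030 examples, the size of the data set in the question: the same seed reproduces the same split, while a different seed yields a different one, so a model is evaluated on different examples.

```python
import random


def split_indices(n, test_fraction, seed=None):
    """Shuffle the indices 0..n-1 and split them into test and train parts."""
    rng = random.Random(seed)  # local generator, so the seed is explicit
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * test_fraction)
    return sorted(idx[:cut]), sorted(idx[cut:])


test_a, train_a = split_indices(2030, 0.3, seed=42)
test_b, train_b = split_indices(2030, 0.3, seed=42)
test_c, train_c = split_indices(2030, 0.3, seed=7)

print(test_a == test_b)  # same seed, same split
print(test_a == test_c)  # different seed, different split
```

    If the run inside the optimization operator and the manual run use different splits, their AUC values are not directly comparable.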

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts