"Different performance from Backward Elimination when not using the operator"

aphongme Member Posts: 3 Contributor I
edited June 2019 in Help

I used the Backward Elimination operator to optimize the AUC of my logistic regression by eliminating some attributes. However, when I stop using the Backward Elimination operator and remove the same attributes myself with the Select Attributes operator (based on the Backward Elimination results), the resulting AUC is not the same: it is lower. The same happens with other optimization operators (Optimize Parameters (Grid), Forward Selection).

How do these optimization operators work, and how are they different from doing the same thing manually (without the optimization operator)?

My data has 2030 instances with 33 features and 1 binary dependent variable.
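A rough sketch of what a wrapper operator like Backward Elimination does, written as a simplified greedy loop in Python. This is not RapidMiner's implementation (the real operator has its own stopping-behavior parameters), and `evaluate` is a hypothetical stand-in for the inner subprocess, for example logistic regression inside a Cross Validation that returns AUC. The toy scorer and its weights are invented for illustration only.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_elimination(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Simplified greedy backward elimination (a sketch, not RapidMiner's code).

    Starts from all attributes and repeatedly drops the single attribute whose
    removal gives the highest score, stopping when no removal improves it.
    """
    current = frozenset(features)
    best_score = evaluate(current)
    while len(current) > 1:
        # Evaluate every subset that is missing exactly one current attribute.
        candidates = [(evaluate(current - {f}), current - {f}) for f in sorted(current)]
        score, subset = max(candidates, key=lambda c: c[0])
        if score <= best_score:
            break  # no single removal improves the estimate, so stop
        current, best_score = subset, score
    # best_score is the best of every subset tried, measured on the same evaluation.
    return current, best_score


# Hypothetical stand-in for "train logistic regression on these attributes and
# return its cross-validated AUC"; real scores would come from the model.
WEIGHTS = {"a": 0.20, "b": 0.10, "c": -0.05, "d": -0.02}


def toy_auc(subset: FrozenSet[str]) -> float:
    return 0.5 + sum(WEIGHTS[f] for f in subset)


selected, score = backward_elimination(WEIGHTS, toy_auc)
print(sorted(selected), round(score, 2))  # ['a', 'b'] 0.8
```

The returned score is the best of all subsets the loop tried, all measured on the same validation. That selection step tends to make it optimistic, so re-evaluating only the final subset in a separate process (different splits, or a different validation setup) can plausibly come out lower. This is one possible explanation for the gap in the question, not a diagnosis of this particular process.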

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @aphongme,

    I'm not a specialist in feature selection algorithms, so I don't know why you don't get the same AUC manually as you do with the feature selection operators.

    However, for part of the answer on how these algorithms work, have a look at this resource (especially parts 1 and 2):

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Multi-Objective-Feature-Selection-Part-1-The-Basics/ta-p/45775/jump-to/first-unread-message

    I hope it helps.

    Regards,

    Lionel

  • aphongme Member Posts: 3 Contributor I

    This also happens when I use the Optimize Parameters (Grid) operator. When I run the model with the parameters it found, but without the operator, the AUC decreases significantly.
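Optimize Parameters (Grid) follows the same pattern: it evaluates every parameter combination and reports the best one. The simulation below is hypothetical (assumed: 20 equally good settings, a true AUC of 0.75, Gaussian estimation noise with standard deviation 0.02) and only illustrates how the best of many noisy estimates overstates what a single re-run of the chosen setting gives.

```python
import random
import statistics


def grid_search_optimism(n_settings=20, true_auc=0.75, noise_sd=0.02, repeats=1000, seed=42):
    """Simulate picking the best of many noisy performance estimates.

    Every hypothetical parameter setting has the same true AUC; each validation
    estimate is that value plus Gaussian noise (an assumption for illustration).
    """
    rng = random.Random(seed)  # fixed seed, so the simulation is reproducible
    reported, fresh = [], []
    for _ in range(repeats):
        estimates = [true_auc + rng.gauss(0.0, noise_sd) for _ in range(n_settings)]
        reported.append(max(estimates))  # what the optimizer reports for the winner
        fresh.append(true_auc + rng.gauss(0.0, noise_sd))  # re-running the winner on new splits
    return statistics.mean(reported), statistics.mean(fresh)


best, rerun = grid_search_optimism()
print(f"mean reported best AUC: {best:.3f}, mean AUC on re-run: {rerun:.3f}")
```

Even though no setting is actually better than another here, the reported best sits well above 0.75 on average, while the re-run lands around 0.75.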

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @aphongme,

    Can you check your XML process and share it? The process you shared in the other topic is broken.

    One line of investigation is to first build the ROC curves in both cases (case 1: attributes removed manually; case 2: feature selection or Optimize Parameters operators) and compare them, for example with the Compare ROCs operator.

    Regards,

    Lionel
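To compare the two cases outside RapidMiner, one option is to export the labels and confidences from each process and compute the ROC curve and AUC directly. A minimal Python sketch, assuming a binary label coded 0/1 and a score where higher means more likely positive. Tied scores are handled as one diagonal segment (ties split evenly); other tie conventions give different numbers when many scores are tied, and RapidMiner also offers optimistic and pessimistic AUC variants, so it is worth checking that both processes report the same one. The labels and scores in the usage lines are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, threshold high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both classes to build a ROC curve")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume all examples tied at this score together, so ties form one segment.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical exported confidences for the same examples from both processes.
labels = [0, 0, 1, 1, 0, 1]
manual_scores = [0.3, 0.6, 0.55, 0.7, 0.2, 0.4]
optimized_scores = [0.2, 0.5, 0.65, 0.8, 0.1, 0.45]
print(auc(labels, manual_scores), auc(labels, optimized_scores))
```

Plotting the two `roc_points` lists on the same axes shows where the curves diverge, which is the same comparison the Compare ROCs operator makes visible.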

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Are you setting a specific random seed to ensure reproducible results?

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
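Cross validation and sampling shuffle the examples, so the same attributes or parameters can give a different AUC if the optimizer's inner validation and the manual process draw different folds. RapidMiner's Cross Validation operator has a local random seed option for this. The Python sketch below uses a hypothetical helper `kfold_assignment`, arbitrary seed values, and 2030 examples to match the data in the question; it only shows that a fixed seed reproduces the same fold assignment and does not reproduce RapidMiner's own splitting logic.

```python
import random
from typing import List


def kfold_assignment(n_examples: int, k: int, seed: int) -> List[int]:
    """Return the fold index (0..k-1) of each example after a seeded shuffle.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # local generator: other random calls in the program don't affect it
    order = list(range(n_examples))
    rng.shuffle(order)
    folds = [0] * n_examples
    for position, index in enumerate(order):
        folds[index] = position % k  # deal the shuffled examples round-robin into k folds
    return folds


same_1 = kfold_assignment(2030, 10, seed=42)
same_2 = kfold_assignment(2030, 10, seed=42)
other = kfold_assignment(2030, 10, seed=7)
print(same_1 == same_2)  # True: same seed, same folds
print(same_1 == other)   # almost certainly False: different seed, different folds
```

Using the same seed (and the same validation setup) inside the optimizer and in the manual process removes fold assignment as a source of the difference.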