An idea about the feature engineering strategy inside AutoModel

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
edited November 2019 in Help
Dear all,


Firstly, please consider this thread as an idea and a debate, not a "feature request".

Thinking about a recent thread involving feature selection inside AutoModel, an idea crossed my mind... As an RM ambassador, I will now share this idea:

The following strategy applies when the user considers several models inside AutoModel (at least two models).

The idea is that AutoModel has a "working memory": it remembers the feature set combinations already tested with a given Model_1, so that it does not test those same feature sets with Model_2 (and, more generally, with the N other models).
The goal of this idea is to maximize the number of candidate feature sets tested, and thus the probability of finding a "winning" feature set combination within the maximum process duration allowed by the user.
On the other hand, the downside of this strategy is that if the feature selection algorithm finds a "winning" feature set with Model_1, it will not apply this "winning" feature set to Model_2, and thus it may miss a potential improvement in the overall accuracy (all models combined).
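
To make the idea more concrete, here is a very rough sketch of what I have in mind (my own illustration in Python, not RapidMiner's actual AutoModel code; the evaluate() function is just a placeholder for whatever performance estimate is used, e.g. a cross-validated RMSE): every feature subset tested by any model is recorded in a shared memory, and later models skip the subsets that were already tried.

```python
# Hypothetical sketch of the shared "working memory" idea (not AutoModel's
# real internals): every feature subset tested by any model is recorded, and
# later models skip the subsets that were already tried.
from itertools import combinations

def subsets(features, max_size):
    """Enumerate candidate feature subsets up to max_size features."""
    for k in range(1, max_size + 1):
        for combo in combinations(features, k):
            yield frozenset(combo)

def shared_memory_search(models, features, evaluate, budget_per_model):
    tested = set()     # the shared "working memory" of subsets already tried
    results = {}       # best (subset, score) found for each model
    for model in models:
        best_subset, best_score = None, float("inf")
        tried = 0
        for subset in subsets(features, len(features)):
            if subset in tested:
                continue                      # skip what another model already tested
            tested.add(subset)
            score = evaluate(model, subset)   # e.g. cross-validated RMSE (placeholder)
            if score < best_score:
                best_subset, best_score = subset, score
            tried += 1
            if tried >= budget_per_model:
                break
        results[model] = (best_subset, best_score)
    return results
```

Each model only spends its budget on subsets that no other model has tried yet, which is exactly the trade-off described above.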

To conclude, please let me know your opinion about this strategy, and be honest: if you think it is not relevant and/or counterproductive, please let me know... I will not be angry... ;)

Regards,

Lionel

PS: Please only consider this thread if you think that this strategy is technically feasible.
PS2: Please disregard this thread if this strategy is already implemented in AutoModel.


Comments

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @lionelderkrikor

    Interesting idea. I have a question.


    The idea is that AutoModel has a "working memory": it remembers the feature set combinations already tested with a given Model_1, so that it does not test those same feature sets with Model_2 (and, more generally, with the N other models).

    So, for example, if model 1 (logistic regression) tried 10 combinations and found that one combination has an RMSE of 0.01, then model 2 (a decision tree) tries combinations except the one with an RMSE of 0.01 (found by the LR) and gets another best combination with an RMSE of 0.005. Now, do you compare the RMSE of the winning combination of the DT (model 2) and of the LR (model 1) to see which has the lower error, and use that one for building model 2?

    Sorry if my understanding is not correct 

    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Thank you for participating in the debate, Varun!!

    Now, do you compare the RMSE of the winning combination of the DT (model 2) and of the LR (model 1) to see which has the lower error, and use that one for building model 2?
    Sorry if my understanding is not correct
    On the contrary, your idea improves my basic strategy!! Basically, my idea was simply not to test, on a Model_i, a feature set combination that had previously been tested with a Model_j (where Model_i and Model_j are two different models)...
    With your strategy, we can consider that Model_2 "benefits" from the feedback of the feature selection performed on Model_1... and, more generally, we can imagine that Model_i "benefits" from the results of the feature selection(s) performed on the (i-1) previous model(s)...
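
    To illustrate what I mean, here is a purely hypothetical sketch (it reuses the subsets() helper and the placeholder evaluate() function from the sketch in my first post; none of this is the real AutoModel code): the winner found so far is carried over and re-evaluated with each new model, on top of fresh, never-tested subsets.

    ```python
    # Hypothetical refinement (my own sketch, not AutoModel internals): Model_i
    # skips already-tested subsets AND is seeded with the best subset found by
    # the previous models, so it "benefits" from their feature selection results.
    # (subsets() and evaluate() are the placeholders from the earlier sketch.)
    def search_with_carry_over(models, features, evaluate, budget_per_model):
        tested = set()                            # shared "working memory"
        carried_best, carried_score = None, float("inf")
        results = {}
        for model in models:
            # always re-evaluate the winner carried over from earlier models ...
            candidates = [carried_best] if carried_best is not None else []
            # ... plus fresh, never-tested subsets, up to the per-model budget
            for subset in subsets(features, len(features)):
                if len(candidates) >= budget_per_model:
                    break
                if subset not in tested:
                    tested.add(subset)
                    candidates.append(subset)
            best_subset, best_score = None, float("inf")
            for subset in candidates:
                score = evaluate(model, subset)   # e.g. cross-validated RMSE
                if score < best_score:
                    best_subset, best_score = subset, score
            results[model] = (best_subset, best_score)
            # the overall winner so far is passed on to the next model
            if best_score < carried_score:
                carried_best, carried_score = best_subset, best_score
        return results
    ```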

    Thanks for your idea!

    Regards,

    Lionel


  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    Interesting idea, but there is one fact that we need to take into consideration: learners differ in their ability to derive models from attribute value interactions. Re-testing a combination with another learner would give different results.

    So we might find a combination that doesn't improve the current learner and therefore would not be tried with the other learners. We would lose a possibly better combination in this case.

    Regards,

    Balázs
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Thanks for sharing your point of view, Balázs!

    Yes, I agree with you about this possible disadvantage of the method.

    I think it will be difficult to answer the question "Will this method, on average, ultimately improve (or degrade) the accuracy of the models?" without running a test campaign on many datasets...

    Regards,

    Lionel
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Thanks, I got your plan. Now, some critical points to consider:

    1. The importance of the first model's performance: if the initial model run in AutoModel doesn't produce good results, then removing the feature sets it used from the search might not be a good idea. So there should be some check here (see the small sketch below this list).

    2. Model dependency: feature interaction effects are also model-dependent, because the underlying statistical principles vary. This is one more point we need to check before removing feature subsets: there is some chance that the removed feature sets might be good for another model, similar to Balázs's point. As you mentioned, this needs lots of tests on a variety of datasets.
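
    For point 1, a purely illustrative sketch of the kind of check I mean (hypothetical names, not AutoModel code): a model's tried subsets only go into the shared memory if the model itself reached an acceptable score.

    ```python
    # Hypothetical guard (my own sketch): only let a model "blacklist" the feature
    # subsets it tried if its own best score was good enough.
    def maybe_remember(tested, tried_subsets, best_score, score_threshold):
        """Add tried_subsets to the shared memory only if best_score passes the check."""
        if best_score <= score_threshold:    # e.g. RMSE below an acceptable level
            tested.update(tried_subsets)
        return tested
    ```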

    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
