Running out of features during feature selection

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn
edited July 2019 in Help
Hi, 

I keep running into the same error when using the FEATURE SELECTION operator with a GLM learner inside. It starts with 56 features and very quickly runs out of features every time I run the process. 


These are GLM settings: 


These are feature selection settings: 


Please advise. I can also provide any additional information if needed. 

Thanks! 

Best Answer

Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn
    Thanks @IngoRM

    I don't think I have constant features, as those were removed beforehand while cleaning the data. As for collinearity, I need to re-check this; in any case I will also try unchecking the corresponding option. 
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn
    Hi @IngoRM

    I am getting back to this thread as I have faced the problem again. 
    Previously I disabled the removal of collinear columns in the nested GLM, which helped, and the process worked OK. 
    This time I ran into the error again and found that there was actually one constant column in my data after filtering a smaller subset for feature selection.
     
    Hence my question: can't the feature selection operator just ignore such columns? This can happen occasionally, as in my case, and the error message itself is quite confusing.

    Thanks! :) 
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Yeah, the error message is bad indeed. Unfortunately there is nothing we can do about this because we do not "own" that particular part of the code... :-( I am personally a bit torn on the constant handling here though. If we just keep the constant column in, we avoid the error in this particular case, but it kind of bugs me that a feature selection, which is supposed to get rid of the weak features, is forced to keep a constant column in. It kind of defeats the purpose... also because this is really undocumented / special behavior of the H2O learner that we would need to work around...

    So I actually would prefer to keep it the way it is, but that would require you to use a Remove Useless Attributes operator beforehand. The last option would be to remove all constant features automatically BEFORE we start the feature selection (and throw an error if that removes all columns), but that makes this a bit implicit, which is not great either...

    Any opinions on this?
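
To make the "remove constants first, and fail loudly if nothing is left" option concrete, here is a minimal sketch in plain Python. It is not RapidMiner or H2O code: the function name `drop_constant_features`, the dict-of-columns layout, and the sample data are assumptions for illustration only, and missing values get no special treatment.

```python
from typing import Dict, Sequence


def drop_constant_features(columns: Dict[str, Sequence]) -> Dict[str, Sequence]:
    """Return only the columns that hold more than one distinct value."""
    kept = {name: values for name, values in columns.items() if len(set(values)) > 1}
    if not kept:
        # Fail early with a readable message instead of a confusing learner error.
        raise ValueError("All features are constant; nothing is left for feature selection.")
    return kept


# Hypothetical data: "region" varies overall but becomes constant after filtering.
data = {
    "age": [23, 35, 41, 52],
    "region": ["north", "north", "north", "south"],
    "income": [30, 45, 50, 70],
}
rows_to_keep = [i for i, age in enumerate(data["age"]) if age < 50]
subset = {name: [values[i] for i in rows_to_keep] for name, values in data.items()}

cleaned = drop_constant_features(subset)
print(sorted(cleaned))  # ['age', 'income']
```

The check runs on the filtered subset rather than the full data, which is where a column that looked fine during cleaning can turn constant.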

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Sorry @kypexin for posting here. @IngoRM Do you think the post below is also caused by the same issue? I asked the user to set a breakpoint and check, and it shows that there is a feature going into the model, so I am not sure why it's throwing the same H2O error. I tried different datasets but did not encounter this error. Just curious why it's returning an error when there are features going into the GBT.

    https://community.rapidminer.com/discussion/55910/forward-selection-error-thrown#latest
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Yes, good catch!  This is indeed extremely likely to be for the same reason.  This error message is only shown if the H2O model removed all features itself (which is super annoying - wish we could just turn this behavior off...).  Typically this happens because of collinear features (that removal can actually be turned off, but it cannot be the reason in the other thread since there is only one input feature anyway...).  The other reason is a constant input, which H2O simply removes as well.  This is what I think is going on here: all values in the window are constant, H2O removes the feature, and finally it complains that there are no features left (sigh)...
    I will bring this up with our engineers to see if they can talk to the H2O folks to make this work.  But to be honest, I would not hold my breath...
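
For anyone who wants to check their data for the collinearity case before running the process, a rough sketch in plain Python follows. It only flags pairs of columns whose Pearson correlation is close to +1 or -1. This is an assumption-laden illustration, not how H2O decides which columns to remove, and it will not catch dependencies that involve three or more columns. The names `pearson`, `collinear_pairs`, the threshold, and the sample data are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(
    columns: Dict[str, Sequence[float]], threshold: float = 0.999
) -> List[Tuple[str, str]]:
    """List column pairs whose absolute correlation reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical data: length_in is just length_cm rescaled, so the two are collinear.
features = {
    "length_cm": [10.0, 20.0, 30.0, 40.0],
    "length_in": [3.937, 7.874, 11.811, 15.748],
    "weight": [5.0, 3.0, 8.0, 1.0],
}
pairs = collinear_pairs(features)
print(pairs)  # [('length_cm', 'length_in')]
```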