
Unexpected results from Automatic Feature Engineering

pblack476 Member Posts: 83 Maven
edited November 2019 in Help
So I am trying to squeeze the most accurate regression possible out of my data, and for that I have narrowed the field down to GLM, GBT, and SVM as the best learners. I am optimizing GLM first as it trains the fastest.

I then generated a bunch of features manually with loops and selected the best broad group for GLM (still 400+ features). This group was not optimal for SVM or GBT, but I wasn't optimizing those yet.

I then ran AFE on that set to get the best GLM performance possible. Unsurprisingly, it returned 8 or 9 optimal features that gave me the same GLM performance I had with 400+. I was happy with that and applied the resulting FeatureSet to my data so I could cut out the long AFE process.

However, this new dataset performs considerably better with most learners, including SVM and GBT, even though it was optimized for GLM.

I then proceeded to repeat the process for SVM, thinking that if I got such an improvement from a GLM-oriented FeatureSet, I would get an even better one by running AFE on SVM. But no: the SVM AFE returned a SIMPLER FeatureSet (even when I selected for accuracy) with decent performance, but it did not beat the GLM AFE FeatureSet.

I did not think that was possible under most circumstances, and yet it happened.
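The workflow described above (generate many candidate features, let an automated search keep the small subset that preserves the fast learner's performance, then reuse that subset with a slower learner) can be sketched outside RapidMiner. The stdlib-Python sketch below is a hypothetical stand-in: greedy forward selection scored by ordinary least squares, with the selected columns then handed to a completely different learner (1-nearest-neighbor). RapidMiner's AFE uses an evolutionary search rather than this greedy loop, and all data and function names here are made up for illustration.

```python
import random

random.seed(0)

def make_data(n=200, n_noise=4):
    """Synthetic regression data: y depends only on the first two columns."""
    rows, ys = [], []
    for _ in range(n):
        x0, x1 = random.gauss(0, 1), random.gauss(0, 1)
        noise = [random.gauss(0, 1) for _ in range(n_noise)]
        rows.append([x0, x1] + noise)
        ys.append(2.0 * x0 - 1.0 * x1 + random.gauss(0, 0.1))
    return rows, ys

def project(X, cols):
    return [[r[c] for c in cols] for r in X]

def ols_fit(X, y):
    """Least squares via normal equations and Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (1e-8 if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yv for r, yv in zip(X, y)) for i in range(k)]
    for col in range(k):                              # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):                    # back substitution
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w

def mse(predict, X, y):
    return sum((predict(r) - yv) ** 2 for r, yv in zip(X, y)) / len(y)

X, y = make_data()
Xtr, ytr, Xva, yva = X[:150], y[:150], X[150:], y[150:]

# Greedy forward selection, scored with the fast linear learner on held-out data.
selected, best_err = [], float("inf")
while True:
    candidates = [c for c in range(len(X[0])) if c not in selected]
    if not candidates:
        break
    scored = []
    for c in candidates:
        cols = selected + [c]
        w = ols_fit(project(Xtr, cols), ytr)
        pred = lambda r, w=w, cols=cols: sum(wi * r[ci] for wi, ci in zip(w, cols))
        scored.append((mse(pred, Xva, yva), c))
    err, c = min(scored)
    if err >= best_err:
        break
    best_err, selected = err, selected + [c]

print("features selected with the linear learner:", sorted(selected))

# Reuse the same column subset with a completely different learner (1-NN).
def knn_predict(r):
    _, yv = min((sum((r[c] - t[c]) ** 2 for c in selected), yt)
                for t, yt in zip(Xtr, ytr))
    return yv

print("1-NN validation MSE on the transferred subset: %.3f"
      % mse(knn_predict, Xva, yva))
```

The informative columns (0 and 1) end up in the selected subset, and the 1-NN model inherits them for free, which mirrors the cross-learner transfer the original poster observed.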

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited November 2019
    I did not think that was possible under most circumstances, and yet it happened.
    This performance is after getting features from AFE and then applying them to SVM or other models with optimal parameter selection, right?

    Out of curiosity, is the difference in performance large? I have seen a few instances in research where GLM performed comparably to SVMs, but not GLM totally outperforming SVM by a wide margin.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • pblack476 Member Posts: 83 Maven
    @varunm1 To clarify, what happened was: I trained a GLM and got back a FeatureSet from AFE (supposedly the best one for GLM). I used that FeatureSet to predict with SVM and got an improvement over running AFE with SVM itself.

    So the GLM FeatureSet was not only best for GLM but also for SVM. The same applies to GBT and DT: both got consistently better with this FeatureSet, though I have not yet tested them against their own respective optimal FeatureSets.


    The difference in my case was very substantial. Trying to predict stock prices, I went from 2.03% relative error with the SVM AFE FeatureSet on SVM to 1.6% with the GLM set. At the same time, GLM performance went from 2.5% to 2.1%. And this happens across multiple labels on the set as well. In my specific case, a 0.4% difference in error is very meaningful because this is supposed to be used for trading strategies later on.

    GBT and DT also improved by similar amounts with those sets, but SVM seems to reap the most reward from this.

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    The SVM is MUCH slower than the GLM learner, which means that in the same amount of time many more feature sets will be tried in the GLM case than in the SVM case.
    Whoa, is this also the case when we don't select the "Use time limit" option in AFE? I thought it checks them all.



  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Nope, BUT in this case you may still stop earlier if there is no progress in the optimization for some number of generations. Given that a (non-linear) SVM is already more powerful than a linear regression, this is more likely to happen for the SVM, which effectively leads to the same outcome.
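The stopping rule Ingo describes can be sketched in miniature. The stdlib-Python sketch below is a hedged stand-in, not RapidMiner's actual AFE code: a simple (1+1) evolutionary search over feature subsets that halts once a fixed number of generations (`patience`) pass without improvement, regardless of any wall-clock limit. The feature count, the "relevant" set, and the fitness function are all toy assumptions.

```python
import random

random.seed(1)

N_FEATURES = 10
RELEVANT = {0, 3, 7}          # toy ground truth, purely illustrative

def score(mask):
    """Toy fitness: reward relevant features, penalize useless ones."""
    hits = len(mask & RELEVANT)
    return hits - 0.3 * (len(mask) - hits)

def mutate(mask):
    return mask ^ {random.randrange(N_FEATURES)}   # toggle one feature in/out

def search(patience=200, max_gens=10_000):
    """(1+1) evolutionary search that stops after `patience` stale generations."""
    best = set(random.sample(range(N_FEATURES), 3))
    best_score, stale, gens = score(best), 0, 0
    while gens < max_gens and stale < patience:
        cand = mutate(best)
        gens += 1
        if score(cand) > best_score:
            best, best_score, stale = cand, score(cand), 0
        else:
            stale += 1
    return best, best_score, gens

best, best_score, gens = search()
print(f"best subset {sorted(best)}, score {best_score}, "
      f"stopped after {gens} generations")
```

Because a simpler landscape is exhausted faster, a more powerful learner (whose fitness plateaus sooner) trips the patience condition earlier, which is the "effectively the same outcome" Ingo mentions.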
  • pblack476 Member Posts: 83 Maven
    edited November 2019
    @IngoRM Indeed! I ran the SVM AFE without a time limit and got the same score (1.6%) as with the GLM set.

    One thing I have observed, however, is that even with the time limit turned off, some pre-selection of the subset on which you run AFE makes a difference.

    I had a "pruned" FeatureSet that I used as a base for GLM before AFE, and that set gave me my base score. However, when I used the full set, one that CONTAINED the entire "pruned" set plus some other attributes, the AFE results were worse (2.1% vs. 2.3% relative error). Even without a time limit, it seems the addition of noise can impact the results.
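This noise effect can be reproduced with a toy experiment. The stdlib-Python sketch below is an illustrative stand-in, not RapidMiner's implementation: a fixed-budget random subset search over a pruned feature set versus the same relevant features padded with pure-noise columns. With the same evaluation budget, the padded search space tends to end at a worse best subset on average. The scoring function and feature counts are assumptions.

```python
import random

RELEVANT = {0, 1, 2}          # toy "truly useful" features

def score(mask):
    hits = len(mask & RELEVANT)
    return hits - 0.2 * (len(mask) - hits)   # penalize useless features

def random_search(n_features, budget, rng):
    """Best score among `budget` uniformly random feature subsets."""
    best = None
    for _ in range(budget):
        mask = {i for i in range(n_features) if rng.random() < 0.5}
        s = score(mask)
        if best is None or s > best:
            best = s
    return best

def mean_best(n_features, trials=200, budget=30):
    rng = random.Random(42)
    return sum(random_search(n_features, budget, rng)
               for _ in range(trials)) / trials

pruned = mean_best(n_features=6)    # pruned set: 3 relevant + 3 extra columns
full = mean_best(n_features=20)     # same 3 relevant + 17 pure-noise columns
print(f"mean best score, pruned set: {pruned:.2f}, full set: {full:.2f}")
```

Every noise column doubles the subset space, so a fixed budget covers a smaller fraction of it, which is one plausible reason pre-pruning helps even without a time limit.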
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Yes, it can. But for small differences, the fact that the optimization algorithm uses randomized heuristics, which are likely (but not guaranteed) to find an optimal solution, may also contribute. This is what I meant above with "There could also be just smaller random effects..." in my earlier answer.
    Cheers,
    Ingo