Thoughts about memory consumption and FeatureSelection...

Axel, Member, Posts: 19, Maven
edited November 2018 in Help
Hi everybody,

I'm running the 32-bit version of RapidMiner 4.6 and trying to do a forward feature selection on a data set with 100 examples and 2000 features. After 5 hours RapidMiner had used 1.4 GB of RAM and stopped with an out-of-memory error :-(

Searching the forum, I found several posts dealing with memory consumption, which suggest that it might be a bad idea to do feature selection on such a large data set. Then I tried to do a rough calculation of the necessary memory:
100 examples * 2000 features * 8 bytes = 1.6 MB
For the first generation the FeatureSelection algorithm will create 2000 individuals. If each of them holds a full copy of the data set, that makes 2000 * 1.6 MB = 3.2 GB, so no wonder I run out of memory.

But then I realized that this is true for backward feature selection, but not for forward feature selection!
Forward selection starts with a single attribute, so all the individuals of the first generation together only need
100 examples * 1 feature * 8 bytes * 2000 individuals = 1.6 MB!
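
The two estimates above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 8 bytes per value and ignores any per-object overhead of the JVM or of RapidMiner's own data structures, and the function name `dataset_bytes` is made up for illustration.

```python
# Rough memory estimate; assumes 8 bytes per value and no object overhead.
BYTES_PER_VALUE = 8

def dataset_bytes(n_examples, n_features, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store an n_examples x n_features table of raw values."""
    return n_examples * n_features * bytes_per_value

n_examples, n_features = 100, 2000
n_individuals = n_features  # one individual per attribute in the first generation

# One full copy of the data set.
full_copy = dataset_bytes(n_examples, n_features)

# First generation if every individual holds a full copy of the data.
first_gen_full_copies = n_individuals * full_copy

# First generation if every individual holds only its single attribute.
first_gen_single_attribute = n_individuals * dataset_bytes(n_examples, 1)

print(f"full data set:            {full_copy / 1e6:.1f} MB")
print(f"2000 full copies:         {first_gen_full_copies / 1e9:.1f} GB")
print(f"2000 single-attribute:    {first_gen_single_attribute / 1e6:.1f} MB")
```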

So now I'm back to square one. Why does forward feature selection need so much memory?
My only guess is that, although it is not necessary, each individual nevertheless gets a full copy of the data set!?
If this is true, the code urgently needs a revision.
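
To illustrate what I mean by "not necessary": an individual could be stored as just a set of attribute indices that all point into one shared table, and the columns would only be pulled out when the individual is evaluated. The following is a minimal sketch of that idea in Python. It is not RapidMiner's code; `forward_candidates`, `project` and the toy `table` are hypothetical names I made up.

```python
# Sketch only: individuals are index sets over ONE shared table (no data copies).

def forward_candidates(selected, n_features):
    """Yield every subset that extends `selected` by exactly one attribute index."""
    for j in range(n_features):
        if j not in selected:
            yield selected | {j}

def project(table, selected):
    """Build the reduced table for one individual, only when it is evaluated."""
    cols = sorted(selected)
    return [[row[j] for j in cols] for row in table]

# Toy data: 4 examples, 3 attributes.
table = [[float(i * 3 + j) for j in range(3)] for i in range(4)]

# First generation of forward selection: one single-attribute individual each.
first_generation = list(forward_candidates(frozenset(), n_features=3))

# Next generation, assuming attribute 0 turned out to be the best so far.
second_generation = list(forward_candidates(frozenset({0}), n_features=3))

# Only the individual currently being evaluated is materialized.
reduced = project(table, first_generation[1])
```

With this representation the population itself costs only a handful of integers per individual, and at most one reduced table exists at a time.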

Maybe someone can comment on this ?

Many thanks,
  Axel

Answers

  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,527, Unicorn
    Hi Axel,
    unfortunately this part of RapidMiner is quite old. Although it follows the nice generalization idea of mapping everything to population-based operations, it has the disadvantage of being quite inefficient.
    Although it has not been made public yet, we are working on an extension that provides efficient implementations of forward and backward selection. We are going to add a few more valuable operators before publishing, but if you are interested, we could probably give you a pre-release version...

    Greetings,
      Sebastian
  • Axel, Member, Posts: 19, Maven
    Hi Sebastian,

    your new implementation of feature selection sounds very interesting.
    Of course I would like to try it, if possible.
    How would I get it?

    Axel

    P.S. Sorry for the delay. I was on a short holiday :-)
  • land, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,527, Unicorn
    Hi,
    no problem. I hope you had a good time while we were working :)
    For further information about the plugin, could you please write an email to [email protected]?

    Greetings,
      Sebastian