Thoughts about memory consumption and FeatureSelection...
I'm running the 32-bit version of RapidMiner 4.6 and trying to do a forward feature selection on a data set with 100 examples and 2000 features. After 5 hours RapidMiner had used 1.4 GB of RAM and finished with an Out of Memory error :-(
Searching the forum, I found several posts dealing with memory consumption and suggesting that it might be a bad idea to do feature selection on such a large data set. Then I tried a rough calculation of the necessary memory:
100 examples * 2000 features * 8 bytes = 1.6 MB
For the first generation the FeatureSelection algorithm will create 2000 individuals, making this 3.2 GB, so it's no wonder I ran out of memory.
But then I realized that this is true for a backward feature selection, but not for a forward feature selection!
Forward selection starts with a single attribute, so all the individuals of the first generation together only need
100 examples * 1 feature * 8 bytes * 2000 individuals = 1.6 MB !!
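To make the comparison explicit, here is the back-of-the-envelope arithmetic as a small Python snippet. It counts only the raw value storage at 8 bytes per double and ignores any per-object overhead, so the real numbers would be somewhat higher:

```python
# Rough memory estimates for one generation of feature selection,
# assuming each individual stores only its own attribute columns
# (8-byte doubles), with no per-object overhead counted.
EXAMPLES = 100
FEATURES = 2000
BYTES_PER_VALUE = 8

# Backward selection: each of the 2000 individuals drops one feature,
# so each still carries 1999 feature columns.
backward = FEATURES * (EXAMPLES * (FEATURES - 1) * BYTES_PER_VALUE)

# Forward selection: each of the 2000 individuals holds a single feature.
forward = FEATURES * (EXAMPLES * 1 * BYTES_PER_VALUE)

print(f"backward: {backward / 1e9:.2f} GB")  # ~3.2 GB
print(f"forward:  {forward / 1e6:.2f} MB")   # 1.6 MB
```

So even with 2000 individuals, the forward-selection generation should fit in under 2 MB of value storage.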
So now I'm back to square one. Why does forward feature selection need so much memory??
My only guess is that, although it isn't necessary, each individual nevertheless gets a full copy of the data set!?
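For illustration, here is a hypothetical sketch (these are not RapidMiner's actual classes) of the alternative I have in mind: individuals share one table of values and store only the indices of their selected attributes, instead of each carrying a full copy of the data:

```python
# Hypothetical sketch -- NOT RapidMiner's real implementation.
# One shared table of values; each individual is just a lightweight
# view holding the indices of its selected attributes.
class SharedTable:
    def __init__(self, data):
        self.data = data  # the single copy of all values

class Individual:
    def __init__(self, table, selected):
        self.table = table        # reference to the shared table, not a copy
        self.selected = selected  # e.g. [5] for a single attribute

    def values(self, row):
        # Materialize only the selected attributes of one example.
        return [self.table.data[row][i] for i in self.selected]

# 100 examples x 2000 features, stored exactly once:
table = SharedTable([[float(i + j) for j in range(2000)] for i in range(100)])

# A forward-selection generation: 2000 individuals, one attribute each.
population = [Individual(table, [i]) for i in range(2000)]
print(population[5].values(0))  # -> [5.0]
```

With views like this, the per-individual cost is a couple of references and an index list, so the generation's footprint stays tiny regardless of the table size.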
If this is true, the code urgently needs a revision.
Maybe someone can comment on this?