Thoughts about memory consumption and FeatureSelection...
I'm running the 32-bit version of RapidMiner 4.6 and trying to do a forward feature selection on a data set with 100 examples and 2000 features. After 5 hours RapidMiner had used 1.4 GB of RAM and finished with an Out of Memory error :-(
Searching the forum, I found several posts dealing with memory consumption and suggesting that it might be a bad idea to do feature selection on such a large data set. Then I tried a rough calculation of the necessary memory:
100 examples * 2000 features * 8 bytes = 1.6 MB
For the first generation the FeatureSelection algorithm will create 2000 individuals, making this 3.2 GB, so it's no wonder I ran out of memory.
But then I realized that this is true for a backward feature selection, but not for a forward feature selection!
Forward selection starts with a single attribute, so all the individuals of the first generation together only need
100 examples * 1 feature * 8 bytes * 2000 individuals = 1.6 MB !!
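To make the comparison explicit, here is the back-of-the-envelope arithmetic as a small Python snippet. It counts only the raw value storage at 8 bytes per double and ignores any per-object overhead, so the real numbers would be somewhat higher:

```python
# Rough memory estimates for one generation of feature selection,
# assuming each individual stores only its own attribute columns
# (8-byte doubles), with no per-object overhead counted.
EXAMPLES = 100
FEATURES = 2000
BYTES_PER_VALUE = 8

# Backward selection: each of the 2000 individuals drops one feature,
# so each still carries 1999 feature columns.
backward = FEATURES * (EXAMPLES * (FEATURES - 1) * BYTES_PER_VALUE)

# Forward selection: each of the 2000 individuals holds a single feature.
forward = FEATURES * (EXAMPLES * 1 * BYTES_PER_VALUE)

print(f"backward: {backward / 1e9:.2f} GB")  # ~3.2 GB
print(f"forward:  {forward / 1e6:.2f} MB")   # 1.6 MB
```

So even with 2000 individuals, the forward-selection generation should fit in under 2 MB of value storage.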
So now I'm back to square one. Why does forward feature selection need so much memory??
My only guess is that, although it isn't necessary, each individual nevertheless gets a full copy of the data set!?
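For illustration, here is a hypothetical sketch (these are not RapidMiner's actual classes) of the alternative I have in mind: individuals share one table of values and store only the indices of their selected attributes, instead of each carrying a full copy of the data:

```python
# Hypothetical sketch -- NOT RapidMiner's real implementation.
# One shared table of values; each individual is just a lightweight
# view holding the indices of its selected attributes.
class SharedTable:
    def __init__(self, data):
        self.data = data  # the single copy of all values

class Individual:
    def __init__(self, table, selected):
        self.table = table        # reference to the shared table, not a copy
        self.selected = selected  # e.g. [5] for a single attribute

    def values(self, row):
        # Materialize only the selected attributes of one example.
        return [self.table.data[row][i] for i in self.selected]

# 100 examples x 2000 features, stored exactly once:
table = SharedTable([[float(i + j) for j in range(2000)] for i in range(100)])

# A forward-selection generation: 2000 individuals, one attribute each.
population = [Individual(table, [i]) for i in range(2000)]
print(population[5].values(0))  # -> [5.0]
```

With views like this, the per-individual cost is a couple of references and an index list, so the generation's footprint stays tiny regardless of the table size.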
If this is true, the code urgently needs a revision.
Maybe someone can comment on this?