Hello RapidMiner community,
I posted this question yesterday evening as well, but it somehow disappeared after I edited it. I'm not sure whether it will come back, so I thought I would ask again.
I have the following situation: I have a labelled dataset with 80+ features and ~3 million rows. I want to perform feature selection to get the ~10 most relevant features. The resulting features have to be discretized, because each one can only take a limited number of distinct values. For example, if a feature has values between 0 and 100, I will have to discretize it into 2-5 bins. Now I am unsure whether I have to discretize all 80 variables first and then do the feature selection, or whether I can run the feature selection on the original values and discretize only the 10 most relevant features afterwards. How would the order affect my result? I greatly appreciate your answers and explanations!
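To make the two options concrete, here is a rough sketch in plain Python (not a RapidMiner process) of what I mean. Everything in it is an assumption for illustration only: the toy data, the three feature names, equal-width binning, and mutual information as the relevance score. The 20-bin scores are just a stand-in for "selection on the original values", and the 2-bin scores for "selection after discretization".

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map numeric values to bin indices 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def make_toy_data(n_rows=5000, seed=0):
    """Hypothetical data: three features in [0, 100) and a binary label."""
    rng = random.Random(seed)
    features = {"threshold": [], "band": [], "noise": []}
    labels = []
    for _ in range(n_rows):
        t, b, z = (rng.uniform(0, 100) for _ in range(3))
        features["threshold"].append(t)
        features["band"].append(b)
        features["noise"].append(z)
        # The label depends on a middle band of "band" and a high "threshold".
        labels.append(int(40 <= b < 60 or t > 80))
    return features, labels


def score_features(features, labels, n_bins):
    """Relevance of each feature to the label after binning into n_bins."""
    return {
        name: mutual_information(equal_width_bins(col, n_bins), labels)
        for name, col in features.items()
    }


features, labels = make_toy_data()

# Option A: discretize everything coarsely first, then score and select.
coarse = score_features(features, labels, n_bins=2)

# Option B (stand-in): score on a fine-grained view, discretize later.
fine = score_features(features, labels, n_bins=20)

for name in features:
    print(f"{name:10s} coarse={coarse[name]:.4f} fine={fine[name]:.4f}")
```

In this made-up case the "band" feature only matters in the middle of its range, so I would expect a 2-bin split to hide most of its relevance while the finer view still shows it. That is exactly the kind of difference between the two orders that I am unsure about.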