Discretization before or after Feature Selection?

green_tea · January 2019

Hello Rapidminer community,

I posted this question yesterday evening as well, however it has somehow disappeared after I edited it. I'm not sure if it will come back, so I thought I will ask again.

I have the following situation: I have a labelled dataset with 80+ features and ~3 million rows. I want to do a feature selection to get the ~10 most relevant features. The resulting features have to be discretized as I can only have a limited amount of different possibilities. For example, if a feature has values between 0-100 I will have to discretize it into 2-5 bins. Now I am unsure if I have to discretize all 80 variables first and then do the feature selection or if I can do the discretization only on the 10 most relevant features. How would this effect my result? I greatly appreciate your answers and explanations!

lionelderkrikor · January 2019

Hi @green_tea,

I would say that Discretize the data must be performed before Feature Selection on the training set :

Image: https://us.v-cdn.net/6030995/uploads/editor/vj/o3pnahkzjx9z.png

Don't forget to apply the same pre-processing step(s) to your test set...

Hope it helps,

Regards,

Lionel

Telcontar120 · January 2019

The issue is that features in general can behave quite differently after discretization than in their raw forms. Discretization both masks information and also transforms the input space. While it is "allowable" to do it either way, I think you would need to be pretty careful if you did feature selection first because what you have selected is not necessarily having the same relationship to your label after you transform it subsquently.

It also matters to this discussion what types of models you are using for both feature selection and your subsequent work. Some modeling algorithms will inherently discretize their continuous inputs (e.g., think tree-based alogorithms) in which case your selection can probably be done afterwards based on what is used in the initial screening, but where you will be better off using the splits that those trees find when doing your discretization. Other approaches create functional relationships (e.g., think linear regression or neural networks) in which case a discretized input could be very different from its raw form.

MartinLiebig · January 2019

Hi @green_tea ,

i would agree with @lionelderkrikor , but with a bit less "force". I think it's statically legal to do both. But i don't see any reason to do a FS on a different feature representation than you use for learning?

BR,

Martin

Maerkli · January 2019

Hallo Green_Tea,

Martin and Lionel are two RapidMiner authorities - I can't contradict them.However, I would recommand to look this training given by Markus Hofmann, another RM senior person:

https://www.youtube.com/watch?v=Nmo5puHRBwE

Maerkli

green_tea · January 2019

Hello @lionelderkrikor, @mschmitz, @Maerkli

first of all, thanks for the very fast replies and explanations!

As @mschmitz asked

But i don't see any reason to do a FS on a different feature representation than you use for learning?

I will actually not use the resulting dataset for learning, but will combine the selected features into an "activity key" that I have to use for another tool. That is also the reason why I have to discretize the features, as too many different possibilities will limit the usability of that key. By doing the discretization afterwards, I would safe a lot of work.

MartinLiebig · January 2019

@Telcontar120 ,

For the Dec-Tree example: If you discretize first, you enforce specific splits (your bin-boundaries). This changes what the tree can do. You further reduce the ability of the tree to split into a tree. This is a quasi-pre-pruning. Thus it makes a big difference for a tree if you do it before or not.

BR,

Martin

Telcontar120 · January 2019

@mschmitz Agreed, that is why I said if using a tree method it would be better to do modeling first and then use the splits found by the tree in discretization. Sorry if that was not clear.

green_tea · January 2019

Thanks for the input!

I decided to discretize first and am doing the feature selection right now. I will probably also do the same evaluation without discretization to see how much of a difference it makes.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Discretization before or after Feature Selection?

Best Answers

Answers