Binning by entropy -- inner logic

kypexin · December 2019

Hi miners,

I need to understand the inner logic of the 'Binning by Entropy' operator (although I do understand the standalone algorithm itself). It seems to me that in many cases it tries to minimize the final number of bins, which results in at most 2 bins for most variables in certain datasets. This may often be appropriate, but very often it is not granular enough.

Think of customer age in credit risk applications. Traditionally, the relationship is such that the younger the customer, the riskier he is, with a slight upward trend in the oldest age group as well. Technically, 2 bins can be the minimum that works here, but such binning does not capture how risk is distributed across more granular age groups. With weight of evidence binning, we often see distributions like this one (here the blue trend goes steadily down across age groups, so it could easily be represented by as few as 2 bins):

Image: https://us.v-cdn.net/6030995/uploads/editor/mw/fst4pcowh1ie.png
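To make the per-age-group risk visible, weight of evidence can be computed for each bin directly. Below is a minimal Python sketch, not RapidMiner code. The function name `woe_table`, the bin edges, the toy data, and the sign convention WoE = ln(share of goods / share of bads) are all assumptions for illustration; some texts use the inverse ratio, which only flips the sign.

```python
import bisect
import math
from collections import Counter


def woe_table(ages, defaults, edges):
    """Weight of evidence per age bin (hypothetical helper).

    ages: numeric ages; defaults: 1 = bad (defaulted), 0 = good.
    edges: sorted boundaries; bin i covers [edges[i-1], edges[i]),
    so [25, 35, 50] gives bins <25, 25-34, 35-49, 50+.
    Assumed convention: WoE = ln(share of goods / share of bads).
    """
    bins = [bisect.bisect_right(edges, a) for a in ages]
    good = Counter(b for b, d in zip(bins, defaults) if d == 0)
    bad = Counter(b for b, d in zip(bins, defaults) if d == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    table = {}
    for b in range(len(edges) + 1):
        if good[b] == 0 or bad[b] == 0:
            # log(0) makes WoE undefined; such bins are usually merged or smoothed
            table[b] = None
            continue
        table[b] = math.log((good[b] / total_good) / (bad[b] / total_bad))
    return table


if __name__ == "__main__":
    # Toy data: risky young customers, safer middle ages, slight uptick for the oldest
    ages = [21, 22, 23, 30, 31, 32, 40, 41, 42, 43, 60, 61]
    defaults = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(woe_table(ages, defaults, [25, 35, 50]))
```

On this toy data the youngest bin gets a strongly negative WoE, the middle bins positive values, and the oldest bin dips negative again, which is the kind of shape that two bins cannot represent.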

Do I understand correctly that this is how the operator actually works, i.e. it tries to minimise the number of bins? Could there be future improvements that give more control over the parameters, such as specifying a desired minimum number of bins?

Also, a side question: has anyone ever come across an implementation of weight of evidence / information value algorithms and binning for RM?
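Once the bins are fixed, information value is a short computation over the per-bin counts. Here is a minimal Python sketch using the common definition IV = Σ (share of goods − share of bads) × WoE, with WoE = ln(share of goods / share of bads). The function name and the input format (a dict of `(n_good, n_bad)` per bin) are assumptions, not an existing RapidMiner operator.

```python
import math


def information_value(counts):
    """IV = sum over bins of (share_good - share_bad) * ln(share_good / share_bad).

    counts: {bin_label: (n_good, n_bad)} (hypothetical input format).
    """
    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())
    iv = 0.0
    for label, (g, b) in counts.items():
        if g == 0 or b == 0:
            # WoE would be infinite; merge or smooth such bins before computing IV
            raise ValueError(f"bin {label!r} has no goods or no bads")
        share_good, share_bad = g / total_good, b / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv


if __name__ == "__main__":
    # Toy age bins with (goods, bads) counts
    counts = {"<25": (1, 2), "25-34": (2, 1), "35-49": (3, 1), "50+": (1, 1)}
    print(round(information_value(counts), 4))
```

Each term is non-negative, because (share_good − share_bad) and the log ratio always have the same sign, so IV only grows as bins separate goods from bads more sharply.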

Many thanks.

Telcontar120 · December 2019

While I haven't inspected the operator code directly, binning by entropy would typically use an underlying algorithm that adds a penalty for each additional bin to prevent over-specification. So it is not directly minimizing the number of bins; rather, it avoids adding bins when the additional information gain (reduction in entropy) is not worth the penalty. Unfortunately, the help topic for this operator is not explicit about the function used, although it references two academic papers that might have more detail.
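As an illustration of such a penalty, here is a minimal Python sketch of Fayyad and Irani's MDL-based entropy discretization, a widely used form of this idea. It is an assumption that RapidMiner's operator uses exactly this criterion. Each candidate cut is kept only if its information gain exceeds (log2(N − 1) + Δ) / N, a threshold that shrinks roughly like log(N)/N, so smaller samples get fewer cuts. The function names are hypothetical, and the exhaustive O(N²) search per level is chosen for clarity, not speed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Sorted cut points from recursive entropy splitting with an MDL stop rule."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []
    _split(xs, ys, cuts)
    return sorted(cuts)


def _split(xs, ys, cuts):
    n = len(ys)
    base = entropy(ys)
    best = None  # (weighted child entropy, split index)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cuts only go between distinct values
        e = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:
        return  # fewer than two distinct values: nothing to split
    e, i = best
    left, right = ys[:i], ys[i:]
    gain = base - e
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * base - k1 * entropy(left) - k2 * entropy(right))
    threshold = (math.log2(n - 1) + delta) / n  # MDL cost of adding this cut
    if gain <= threshold:
        return  # the extra bin is not worth it: stop splitting this interval
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)


if __name__ == "__main__":
    # U-shaped toy risk: bad, good, bad in three equal age groups
    print(mdlp_cut_points(list(range(18, 78)), [1] * 20 + [0] * 20 + [1] * 20))
    print(mdlp_cut_points(list(range(18, 48)), [1] * 10 + [0] * 10 + [1] * 10))
```

With 20 records per group the sketch finds both cuts (37.5 and 57.5). With only 10 per group the same clean pattern yields no cut at all, because the first cut's gain falls just below the penalty. This shows how the stopping rule, rather than an explicit bin minimum, can leave very few bins.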

By the way, I completely second the idea of an operator that calculates WoE or IV and returns it explicitly! That would be quite helpful. In the meantime, I typically use the Weight by Information Gain operator as a proxy: it doesn't output the information value directly, but its results are highly correlated with IV, so I think you can use it to gauge the relative magnitudes with pretty good reliability.
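For reference, a minimal Python sketch of the textbook information gain of an already-binned attribute, IG = H(target) − H(target | attribute). It is an assumption that Weight by Information Gain uses this standard definition, and how the operator treats numeric attributes is not covered here. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a categorical (binned) feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    target = [1, 1, 0, 0]
    print(information_gain(["young", "young", "old", "old"], target))  # fully separating
    print(information_gain(["young", "old", "young", "old"], target))  # uninformative
```

Like IV, this is zero when the bins carry no information about the target and grows as the bins separate the classes. That shared behaviour is why the two tend to rank attributes similarly, even though their scales differ.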

Altair RapidMiner Community
