# Binning by entropy -- inner logic

kypexin
Moderator, RapidMiner Certified Analyst, Member, Posts: **291**, Unicorn

Hi miners,

I need to understand the inner logic of the 'Binning by Entropy' operator (I do understand the standalone algorithm itself). It seems to me that in many cases it tries to minimize the final number of bins, which results in at most 2 bins for most variables in certain datasets. This is often adequate, but very often it is not granular enough.

Think of customer age in credit risk applications. Traditionally, the younger the customer, the riskier they are, with a slight upward trend in the oldest age group as well. Technically, 2 bins may be the minimum that works here, but such binning does not take into account how risk is distributed across more granular age groups. With weight of evidence binning, we often see distributions like this (here the blue trend goes steadily down across the age groups, so it could easily be represented by as few as 2 bins):

Do I understand correctly that this is how the operator actually works, trying to minimise the number of bins? Could future versions offer more control over parameters, such as specifying a desired minimum number of bins?
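To make the question concrete, here is a minimal sketch of the standalone algorithm as I understand it: recursive entropy-minimising splits with the Fayyad-Irani MDL stopping rule. Whether the operator uses exactly this criterion is an assumption on my part, and the function names `entropy` and `mdlp_cut_points` are made up for this illustration. The point is that a split is only accepted if its information gain exceeds a threshold, so weak or gradual trends can stop after a single cut.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(xs, ys, cuts):
    """Find the best cut in sorted (xs, ys); recurse only if MDL accepts it."""
    n = len(ys)
    base = entropy(ys)
    best = None  # (weighted child entropy, split index)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between identical values
        left, right = ys[:i], ys[i:]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or cond < best[0]:
            best = (cond, i)
    if best is None:
        return
    cond, i = best
    left, right = ys[:i], ys[i:]
    gain = base - cond
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * base - k1 * entropy(left) - k2 * entropy(right)
    )
    threshold = (math.log2(n - 1) + delta) / n
    if gain <= threshold:
        return  # MDL criterion rejects the split: this interval stays one bin
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)


def mdlp_cut_points(values, labels):
    """Return the sorted cut points chosen by entropy/MDL discretization."""
    pairs = sorted(zip(values, labels))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    cuts = []
    _split(xs, ys, cuts)
    return sorted(cuts)


# Toy data: ages 0..19, the younger half all "bad" (1), the older half all "good" (0).
ages = list(range(20))
risk = [1] * 10 + [0] * 10
cuts = mdlp_cut_points(ages, risk)
```

Under this criterion the number of bins is a by-product of the stopping rule rather than something that is minimised directly, which would explain why there is no obvious place to ask for a minimum number of bins.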

Also, a side question: has anyone heard of an implementation of weight of evidence / information value algorithms and binning for RM?

Many thanks.



## Answers

Posts: 1,635, Unicorn

By the way, I completely second the idea of an operator that calculates WoE or IV and returns it explicitly! That would be quite helpful. In the meantime, there is an operator that I typically use as a proxy: Weight by Information Gain. It doesn't output the information value directly, but it correlates highly with it, so you can use it to find the relative magnitudes with pretty good reliability, I think.
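Until such an operator exists, WoE and IV are simple enough to compute outside the process once the bins are fixed. Below is a minimal sketch in plain Python, not a RapidMiner API: the function name `woe_iv` and the `smoothing` parameter are hypothetical, it assumes a binary label with 1 = bad and 0 = good, and it uses the common convention WoE = ln(%good / %bad). The smoothing term is my own addition to avoid taking the log of zero when a bin has no goods or no bads.

```python
import math


def woe_iv(bin_ids, targets, smoothing=0.5):
    """Return ({bin: WoE}, IV) for pre-binned data.

    bin_ids -- one bin label per row
    targets -- one class label per row, 1 = bad (event), 0 = good
    smoothing -- pseudo-count added to every bin (assumption, not a standard)
    """
    counts = {}  # bin -> [good count, bad count]
    for bin_id, target in zip(bin_ids, targets):
        good_bad = counts.setdefault(bin_id, [0, 0])
        good_bad[1 if target == 1 else 0] += 1

    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    n_bins = len(counts)

    woe = {}
    iv = 0.0
    for bin_id, (good, bad) in counts.items():
        # Share of all goods / all bads that fall into this bin.
        p_good = (good + smoothing) / (total_good + smoothing * n_bins)
        p_bad = (bad + smoothing) / (total_bad + smoothing * n_bins)
        woe[bin_id] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[bin_id]
    return woe, iv


# Toy example: young customers are mostly bad, old customers mostly good.
bins = ["young"] * 10 + ["old"] * 10
bad_flag = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
woe, iv = woe_iv(bins, bad_flag)
```

With `smoothing=0` this reduces to the textbook formula, but it will fail on any bin that contains only goods or only bads.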

Lindon Ventures

Data Science Consulting from Certified RapidMiner Experts