Short question on FrequencyDiscretization

calvinus · November 2008

Hi there,

I have a quick question. Take for example the following output of FrequencyDiscretization:

q58_B -Infinity <= range3 [4.500 - 0] <= 0.0 <= range1 [-8 - 2.500] <= 2.5 <= range2 [2.500 - 4.500] <= 4.5 <= range5 [0 - 8] <= Infinity

Apart from the ranges not being sorted (which is a bit confusing), range3 looks odd to me. Why does it go from 4.5 to 0? And why is it listed first instead of in order?
And where is range4? Why does range5 start at 0 again? Are the ranges overlapping?
The values in field q58_B only range from 1 to 5, plus some missing values.
Could you please give me some hints on how to use this output?

Thanks in advance,
best regards
Jörg

TobiasMalbrecht · November 2008

Hi Jörg,

as far as I can see, the operator seems to contain a bug; we will have to check that. Maybe one of our developers will have time to look into the problem next week. Thanks for pointing it out.

Regards,
Tobias

land · December 2008

Hi Jörg,
this is not really a bug. It is caused by your data containing too many identical values. Identical values cannot be distinguished, so they all end up in the same bin, and that bin grows beyond its target size. If a value occurs more than twice the bin size, the bin steals the examples from the following bin(s), which then have no examples left to determine their limits.
The developer version now throws an error if that happens, because it is probably not the intended behavior.
There are two possibilities: reduce the number of bins, or add some noise, which would make the values distinguishable.
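
To make the effect concrete, here is a minimal Python sketch. It is not RapidMiner's actual implementation; it assumes a simple equal-frequency scheme in which a run of identical values is never split and goes into the bin where its first example falls. The function names and the data are made up for illustration.

```python
import random


def equal_frequency_bins(values, n_bins):
    """Toy equal-frequency binning that never splits a run of equal values."""
    ordered = sorted(values)
    target = len(ordered) / n_bins  # intended number of examples per bin
    bins = [[] for _ in range(n_bins)]
    i = 0
    while i < len(ordered):
        # Find the end of the run of values equal to ordered[i].
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The whole run goes into the bin in which its first example falls.
        bin_index = min(int(i / target), n_bins - 1)
        bins[bin_index].extend(ordered[i:j])
        i = j
    return bins


def bin_sizes(values, n_bins):
    """Number of examples that ended up in each bin."""
    return [len(b) for b in equal_frequency_bins(values, n_bins)]


# Made-up data on a 1-5 scale, dominated by the value 3 (12 of 20 examples).
data = [1] * 2 + [2] * 3 + [3] * 12 + [4] * 2 + [5] * 1

# Five bins, target size 4: the run of 3s swallows the next two bins.
print(bin_sizes(data, 5))  # [5, 12, 0, 0, 3]

# Possibility 1: fewer bins, so no bin is left without examples.
print(bin_sizes(data, 2))  # [17, 3]

# Possibility 2: add a little noise so the values become distinguishable.
rng = random.Random(0)
noisy = [v + rng.uniform(-0.25, 0.25) for v in data]
print(bin_sizes(noisy, 5))
```

In this toy model the two empty bins have no examples from which a boundary could be derived, which is the situation behind the strange ranges in the output above.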

Please keep in mind that missing values are not treated at all.

Greetings,
Sebastian
