better default naming of bins using "range##"

Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
edited December 2018 in Product Feedback - Resolved

The various binning operators ("discretize by something") offer several options for naming bins automatically. If you select either "long" or "short" names, each name begins with a string of the form "range#", where the # is replaced by the number of the bin (counting from the lowest bin). However, the number is not padded with leading zeros, which means that if you have 10 bins, you will get (range1, range2, range3, ... range10) as the values.

 

You can probably see where this is going. Any later attempt to put these bin values in the correct order is frustrated by the fact that sorting them as character strings gives the following sequence: range1, range10, range2, range3, ...
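To make the problem concrete, here is a small Python sketch (not RapidMiner code). It builds hypothetical labels that mimic only the "range#" prefix for 10 bins and sorts them as plain strings, which is the kind of character-based sort described above.

```python
# Hypothetical labels imitating the "range#" prefix for 10 bins.
# These are illustration strings, not output from any RapidMiner operator.
labels = [f"range{i}" for i in range(1, 11)]

# A plain string sort compares character by character,
# so "range10" lands between "range1" and "range2".
sorted_labels = sorted(labels)

print(sorted_labels[:4])
```

The bin that should come last ends up in second position, and every bin after it is shifted by one.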

 

This causes data tables, graphs, and any other output where you are trying to look at the bins in order of the underlying values to come out in the wrong order, unless you go through the extra steps of renaming the bin values. Wouldn't it be great if RapidMiner automatically included the required leading zeros based on the number of bins generated? So if there are at least 10 bins but fewer than 100, then the single-digit bins would get one leading zero (so the names would be range01, range02, range03, etc.). Then all subsequent sorting on these values would put them in the correct order.
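The requested behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not how RapidMiner names bins internally; the function name `padded_bin_names` and its parameters are made up for this example.

```python
def padded_bin_names(n_bins: int, prefix: str = "range") -> list[str]:
    """Return bin names zero-padded to the width of the largest bin number.

    Hypothetical helper for illustration only.
    """
    # Number of digits needed for the highest bin number, e.g. 2 for 10..99.
    width = len(str(n_bins))
    return [f"{prefix}{i:0{width}d}" for i in range(1, n_bins + 1)]


names = padded_bin_names(12)
print(names[0], names[-1])
# Because every name has the same length, string order matches bin order.
print(sorted(names) == names)
```

With fewer than 10 bins the width is 1 and the names are unchanged; with 100 to 999 bins the names would get three digits.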

 

Note that even if you hard-coded all the binning operators to generate just one leading zero for the first 9 bins (so names are always of the form range01, range02, range03, etc.), you would cover far more use cases than the current default, and there is no harm at all even if you have fewer than 10 bins, since they will still sort correctly. I expect there are rarely cases where you need more than 99 bins. But there are many cases where you want more than 9 bins, and the current naming defaults just don't work well whenever the number of bins is 10 or more.
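Until something like this exists, the renaming workaround can be approximated outside RapidMiner. The sketch below is a hypothetical Python helper (`pad_range_label` is not part of any RapidMiner API) that rewrites the number after a leading "range" to a fixed two digits, which is the hard-coded variant described above. It assumes the label starts with "range" followed by digits and leaves anything after the number untouched.

```python
import re


def pad_range_label(label: str, width: int = 2) -> str:
    """Zero-pad the bin number that follows a leading "range" prefix.

    Hypothetical helper; assumes labels begin with "range<digits>".
    """
    return re.sub(
        r"^range(\d+)",
        lambda m: f"range{int(m.group(1)):0{width}d}",
        label,
    )


# Fixed two-digit padding keeps 1..99 in the right order when sorted as text.
fixed = [pad_range_label(f"range{i}") for i in range(1, 100)]
print(sorted(fixed) == fixed)
```

As expected from the reasoning above, the fixed width only stops helping at 100 bins, where "range100" would again sort before "range99".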

Brian T.
Lindon Ventures 
Data Science Consulting from Certified RapidMiner Experts
2 votes

Declined · Last Updated

Closing this idea - only two votes since Aug 2016. Please re-open if this is of interest. PROD-829
