The various binning operators ("discretize by something") offer different options for naming bins automatically. If you select either "long" or "short" names, each name begins with the string "range#", where # is replaced by the number of the bin (starting at the bottom, i.e. with the lowest values). However, the number is not padded with leading zeros, so if you have 10 bins, you will get range1, range2, range3, ... range10 as the values.
You can probably see where this is going. All subsequent attempts to put these bin values in the correct order will be frustrated by the fact that when sorting by characters, you will get the following sequence: range1, range10, range2, range3, ...
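To make the problem concrete, here is a minimal Python sketch (not RapidMiner code). It builds ten labels in the unpadded "range#" style and sorts them as plain strings. The label list is a hypothetical stand-in for the short bin names; it omits whatever interval text the "long" names append.

```python
# Hypothetical stand-in for ten unpadded short bin names: range1 ... range10.
labels = [f"range{i}" for i in range(1, 11)]

# Strings are compared character by character, so "range10" sorts
# before "range2" ('1' < '2' at the first differing position).
sorted_labels = sorted(labels)
print(sorted_labels)
```

The printed order is range1, range10, range2, range3, ... range9, which is exactly the sequence described above.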
As a result, all data, graphs, and any other output in which you are trying to view the bins in the order of their underlying values come out in the wrong order, unless you go through the extra step of renaming the bin values. Wouldn't it be great if RapidMiner automatically added the required leading zeros based on the number of bins generated? For example, with 10 to 99 bins, every bin number would be padded to two digits (range01, range02, range03, etc.). All subsequent sorting on these values would then put them in the correct order.
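The padding rule proposed here can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation: padded_bin_names is a hypothetical helper, and it assumes bins are numbered from 1 with the "range" prefix. The padding width is the number of digits in the largest bin number, so 9 bins need no padding, 10 to 99 bins get two digits, and 100 to 999 bins get three.

```python
from typing import List


def padded_bin_names(num_bins: int, prefix: str = "range") -> List[str]:
    """Hypothetical helper: zero-pad bin numbers to the width of the largest one."""
    width = len(str(num_bins))  # digits needed for the highest bin number
    return [f"{prefix}{i:0{width}d}" for i in range(1, num_bins + 1)]


names = padded_bin_names(12)
print(names[:3], names[-1])
```

With 12 bins this produces range01, range02, range03, ... range12, and because every name has the same length, a plain string sort leaves them in bin order.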
Note that even if all the binning operators were simply hard-coded to pad bin numbers to two digits (so names are always of the form range01, range02, range03, etc.), that would cover far more use cases than the current default, and it does no harm even if you have fewer than 10 bins, since the names still sort correctly. I expect that cases needing more than 99 bins are rare. But there are many cases where you want more than 9 bins, and the current naming default just doesn't work well whenever the number of bins is 10 or more.