01-31-2013 02:42 PM
04-23-2017 01:53 PM
@kayman I do not think this is necessarily a bug. Based on the parameter description, 60 is simply the smallest value that is accepted as the maximum number of clusters for x-means.
k max (optional)
The maximal number of clusters which should be detected.
Range: 60 - +∞
It looks like the algorithm still tests every value between 2 (the default minimum) and 60, so I am not sure it matters if you would prefer a smaller maximum, since your preferred value would still fall within the tested range.
04-23-2017 08:25 PM
I see, I didn't notice that, to be honest...
However, 60 seems like a fairly big number. In my specific case the best value is typically below 20, so it takes about 3 times as long as necessary to get the best prediction. In itself that is not a big deal, but with larger datasets it makes a noticeable difference.
Is there any reason why 60 was chosen? Is there a statistical story behind that number (purely out of interest)?
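As a rough back-of-the-envelope check of the "about 3 times" estimate, here is a minimal Python sketch. It assumes, as suggested above, that every k between the minimum and the maximum is tried once. The function name `relative_cost` and both cost models are illustrative assumptions, not taken from the RapidMiner source.

```python
def relative_cost(k_min, k_max, per_k_cost=lambda k: 1.0):
    """Total cost of trying every k in the inclusive range [k_min, k_max].

    per_k_cost is a hypothetical cost model for a single candidate k.
    """
    return sum(per_k_cost(k) for k in range(k_min, k_max + 1))


# Assumption 1: every candidate k costs the same.
flat_ratio = relative_cost(2, 60) / relative_cost(2, 20)

# Assumption 2: cost grows linearly with k (more centroids to compare against).
linear_ratio = relative_cost(2, 60, lambda k: k) / relative_cost(2, 20, lambda k: k)

print(f"flat cost per k:   {flat_ratio:.2f}x")
print(f"linear cost per k: {linear_ratio:.2f}x")
```

With a flat cost per candidate the ratio is 59/19, which is close to the factor of 3 mentioned above. If the cost per candidate grows with k, the overhead of a maximum of 60 instead of 20 would be larger.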
04-24-2017 04:26 AM
I've read the source code and there is no comment on why 60 is the minimum. The class cites the paper https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf for the implementation. I am not sure whether the paper gives an argument for it.
I will open a ticket internally. If you desperately need it, you can extend the class and change the setting.
04-24-2017 05:20 AM
RapidMiner Studio 7.5 will reduce the selectable minimum to 3.