"x-means k max"

MarcosRL Member Posts: 53 Contributor II
edited June 2019 in Help
Hello friends of the community, I have a question.
I'm trying to apply the x-means algorithm to cluster some data. The issue is that when I select the "x-means" operator, it has the parameters k min = 2 and k max = 60.
The problem is that it does not let me select a k max below 60. Is this a restriction of the algorithm or a bug?
Regards

Best Answer

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Solution Accepted

    Hi,

    RapidMiner Studio 7.5 will reduce the selectable minimum for k max to 3.

    Regards,

    Marco

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    At first glance this looks like a bug. I will create an internal ticket to investigate this issue.

    Best regards,
    Marius
  • kayman Member Posts: 662 Unicorn

    It seems the issue is still there; I'm also not able to select anything but 60.

    It works if you add the parameter directly in the XML (e.g. <parameter key="k_max" value="20"/>); it looks like this parameter is not added by default.
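    The XML workaround above can be scripted when you keep processes as XML. This is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the process nests <operator name="..."> elements with direct <parameter key="..." value="..."/> children, as in the snippet above. The helper name set_operator_parameter and the operator name "X-Means" are hypothetical; check the actual operator name in your own process.

```python
import xml.etree.ElementTree as ET

def set_operator_parameter(process_xml: str, operator_name: str, key: str, value: str) -> str:
    """Hypothetical helper: return process_xml with <parameter key=... value=.../>
    set on every <operator> whose name attribute equals operator_name."""
    root = ET.fromstring(process_xml)
    # Collect matches first so the tree is not modified while iterating over it.
    ops = [op for op in root.iter("operator") if op.get("name") == operator_name]
    if not ops:
        raise ValueError(f"no operator named {operator_name!r}")
    for op in ops:
        # Update an existing direct <parameter> child with this key, if present...
        param = next((p for p in op.findall("parameter") if p.get("key") == key), None)
        if param is None:
            # ...otherwise append one (the "not added by default" case).
            param = ET.SubElement(op, "parameter", {"key": key})
        param.set("value", value)
    # ElementTree drops comments and the XML declaration on this round-trip.
    return ET.tostring(root, encoding="unicode")
```

    For example, set_operator_parameter(xml_text, "X-Means", "k_max", "20") either updates an existing k_max entry or adds one. Keep a backup of the original process, since comments are not preserved.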

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @kayman I don't think this is necessarily a bug. Based on the parameter description, 60 is simply the minimum value accepted for the upper end of the x-means range:

    k max (optional)
    The maximal number of clusters which should be detected.
    Type: integer
    Range: 60 - +∞
    Default: 60

    It looks like the algorithm still tests every value between 2 (the default minimum) and 60, so I'm not sure it matters that you would prefer a smaller maximum, since that value would fall within the range anyway.
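    To make the roles of k min and k max concrete, here is a simplified, hypothetical x-means sketch in plain Python. It is not RapidMiner's implementation. It starts from k_min clusters, tries to split each cluster in two, keeps a split only if a BIC score improves, and stops when no split helps or k_max is reached. The BIC form (identical spherical Gaussians, hard assignment), the farthest-point initialisation and all function names are assumptions made for illustration, loosely following the Pelleg & Moore x-means paper.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def assign(points, centers):
    """Group points by nearest center; ties go to the lower index."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters = assign(points, centers)
        new = [mean(m) if m else c for c, m in zip(centers, clusters)]  # empty cluster keeps its center
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)

def bic(points, centers, clusters):
    """BIC of a hard clustering modelled as identical spherical Gaussians."""
    r, m, k = len(points), len(points[0]), len(centers)
    if r <= k:
        return -math.inf
    sse = sum(sq_dist(p, c) for c, ms in zip(centers, clusters) for p in ms)
    var = max(sse / (m * (r - k)), 1e-12)  # per-dimension variance estimate
    log_lik = sum(len(ms) * math.log(len(ms) / r) for ms in clusters if ms)
    log_lik -= r * m / 2 * math.log(2 * math.pi * var)
    log_lik -= sse / (2 * var)
    n_params = (k - 1) + m * k + 1  # mixing weights + centers + shared variance
    return log_lik - n_params / 2 * math.log(r)

def split_if_better(members):
    """Split one cluster in two if the local BIC improves; else return None."""
    if len(members) < 4:
        return None
    parent = mean(members)
    a = max(members, key=lambda p: sq_dist(p, parent))
    b = max(members, key=lambda p: sq_dist(p, a))
    children, parts = kmeans(members, [a, b])
    if any(not part for part in parts):
        return None
    if bic(members, children, parts) > bic(members, [parent], [members]):
        return children
    return None

def init_centers(points, k):
    """Deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def xmeans(points, k_min=2, k_max=60):
    """Grow from k_min clusters via BIC-approved splits, never beyond k_max."""
    centers, clusters = kmeans(points, init_centers(points, k_min))
    while len(centers) < k_max:
        budget = k_max - len(centers)
        proposal, split_any = [], False
        for c, members in zip(centers, clusters):
            children = split_if_better(members) if budget > 0 else None
            if children:
                proposal.extend(children)
                budget -= 1
                split_any = True
            else:
                proposal.append(c)
        if not split_any:
            break
        centers, clusters = kmeans(points, proposal)  # refine all centers together
    return centers, clusters

def make_blobs(seed=0, n=30, spread=0.5):
    """Hypothetical test data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    blobs = [(0.0, 0.0), (5.0, 15.0), (20.0, 0.0)]
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in blobs for _ in range(n)]
```

    A call such as xmeans(make_blobs(), k_min=2, k_max=10) returns the centers and their member lists. With k_max=2 it returns the starting k_min clustering unchanged, because no split is allowed. In this sketch a lower k_max only saves time when BIC would otherwise keep approving splits. Whether RapidMiner's operator behaves the same way is not something this sketch shows.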

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • kayman Member Posts: 662 Unicorn

    I see, I didn't notice that, to be honest...

    60 does seem like a fairly big number, though. The problem is that in my specific case the best value is typically below 20, so it takes about 3 times longer than necessary to get the best result. In itself that's not a big deal, but with larger datasets it makes a noticeable difference.

    Is there any reason why 60 was chosen? Is there a statistical story behind that number (purely out of interest)?

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    I've read the source code and there is no comment on why 60 is the minimum. The class cites the paper https://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf for the implementation; I'm not sure whether it contains an argument for that value.

    I will open a ticket internally. If you desperately need it, you can extend the class and change the setting.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany