X-Means always same behavior

nelsonthekinger Member Posts: 5 Contributor II
edited November 2018 in Help
Hello Experts!

I'm trying to use X-Means because of its advantages over K-Means, but I'm not getting the expected result.
I ran K-Means on 6 files from 3 categories, and with k = 3 it worked perfectly.
Then I applied X-Means with k from 2 to 60, and I always get 2 clusters.

I thought this might be because I had too few files, so I tried again with 53 files from 3 categories,
and the result was the same: K-Means (k = 3) succeeded, while X-Means (k = 2 to 60) produced the same 2 clusters.

I've tried many configurations, but the one I use most is:
measure type: NumericalMeasures
numerical measure: CosineSimilarity
clustering algorithm: KMeans
The rest is left at the defaults.

I'm clueless about the reason. Any help is appreciated!


  • bigbangtwo Member Posts: 1 Contributor I
    I have the same problem. I ran X-Means with k-min = 2 and k-max = 60. The correct result for my data is 4 clusters, but X-Means returned 2 clusters. I got the same result with the other data sets I tried.
    Can anyone help me?
  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hello everyone, 

    I tried X-Means with k-min = 2 and k-max = 60, and also with k-min = 20 and k-max = 60. Each time, the X-Means model returns the minimum number of clusters (k = 2 in the first run and k = 20 in the second). Is it normal for X-Means to always pick the minimum k?

    Best regards!  
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Did you normalize the data beforehand?
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • mantanz Member Posts: 8 Contributor II
    If possible, please share your XML and let me know the number of examples in your data set.

    The situation you describe can happen if you don't have many examples to cluster, or if they are so similar to one another that X-Means always falls back to the simplest clustering scheme.
    In that case, it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.
    For example, suppose attribute1 has a range of 0-100 and attribute2 has a range of 0-1. In this case attribute1 carries more weight than attribute2. If you normalize, both attributes are converted to a 0-1 scale.

    RapidMiner operator to use: "Normalize"
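
To illustrate the scaling point above outside RapidMiner, here is a minimal Python sketch of min-max (range) normalization. It is not the implementation of the "Normalize" operator, whose method and defaults may differ. The data values and the helper names `min_max_normalize` and `squared_distance` are made up for illustration.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to the 0-1 range (min-max normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread, so map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical examples: attribute1 spans 0-100, attribute2 spans 0-1.
data = [
    [0.0, 0.0],
    [100.0, 1.0],
    [50.0, 0.0],
    [50.0, 1.0],
]
scaled = min_max_normalize(data)

# Before scaling, attribute1 dominates the distance between rows 0 and 2.
print(squared_distance(data[0], data[2]))
# After scaling, both attributes contribute on the same 0-1 scale.
print(squared_distance(scaled[0], scaled[2]))
```

In the raw data, a full-range difference in attribute2 contributes at most 1 to the squared distance, while a half-range difference in attribute1 contributes 2500. After scaling, both attributes are bounded by the same range, so neither dominates purely because of its units.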