# X-Means always same behavior

nelsonthekinger, Member, Contributor II (Posts: 5)

Hello Experts!

I'm trying to use X-Means because of its advantages over K-Means, but I'm not getting the expected result.

I tried K-Means on 6 files from 3 categories, and with k = 3 it worked perfectly.

Then I applied X-Means with k from 2 to 60, and I always get 2 clusters.

I thought it could be because I had too few files, so I tried again with 53 files from 3 categories, and the result was the same: K-Means (k = 3) succeeded, while X-Means (k = 2 to 60) gave the same 2 clusters.

I've tried many configurations, but the one I use most is:

measure type: NumericalMeasures

numerical measure: CosineSimilarity

clustering algorithm: KMeans

The rest is default.

I'm clueless about the reason. Any help is appreciated!

## Answers

I have the same problem. I tried X-Means with k-min = 2 and k-max = 60. For my data the right result is 4 clusters, but X-Means ran and returned 2 clusters. I got the same result with other data that I tried.

Can anyone help me?

Maven (Posts: 93)

I tried X-Means on the interval k-min = 2 to k-max = 60, and also with k-min = 20 and k-max = 60. Each time, the X-Means model gives me the minimal k (k = 2 the first time and k = 20 the second). Is it normal that X-Means always picks the minimal k?

Best regards!

RM Data Scientist, Dortmund, Germany (Posts: 3,517)

Contributor II (Posts: 8)

The situation you describe can happen if you don't have many examples to cluster, or if they are simply too similar to one another, so X-Means always falls back to the simplest clustering scheme.

In that case it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.

For example, suppose attribute1 has a data range of 0-100 and attribute2 has a data range of 0-1. In this case attribute1 gets more weight than attribute2. But if you apply normalization, both attributes are converted to the 0-1 scale.

RapidMiner operator to use: "Normalize"
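
To make the scale argument concrete, here is a small sketch in plain Python. It is not the RapidMiner "Normalize" operator, just an illustration of a 0-1 range rescaling. The data and the helper names `min_max_normalize` and `euclidean` are made up for this example, and Euclidean distance is used only because it makes the effect easy to see.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1] (hypothetical helper)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up examples: attribute1 spans 0-100, attribute2 spans 0-1.
rows = [
    [0.0, 0.0],
    [100.0, 0.0],
    [0.0, 1.0],
    [100.0, 1.0],
]

scaled = min_max_normalize(rows)

# Before normalization, a full-range change in attribute1 looks
# 100 times larger than a full-range change in attribute2.
raw_attr1_gap = euclidean(rows[0], rows[1])
raw_attr2_gap = euclidean(rows[0], rows[2])
print(raw_attr1_gap, raw_attr2_gap)  # 100.0 1.0

# After normalization, both attributes contribute equally.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))  # 1.0 1.0
```

On the raw data, any distance-based clustering would be driven almost entirely by attribute1. After rescaling, a full-range difference in either attribute counts the same.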