# Minimal k for x-Means?

Muhammed_Fatih_, Member, Posts: **93**, Maven, in Help

Dear community,

My question is: does X-Means always pick the minimum given k as the optimum?

I ran X-Means on my data with k-min=2 and k-max=60, and again with k-min=20 and k-max=60. Each time the X-Means model returned the minimum k (k=2 in the first run and k=20 in the second). Is it normal for X-Means to always pick the minimum k?

Best regards!


## Answers

**8** Contributor II

The situation you describe can happen if you don't have many examples to cluster, or if they are too similar to one another, so that X-Means always falls back to the simplest clustering scheme.
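To illustrate why similar examples lead to the simplest scheme, here is a minimal sketch of a BIC-style split test on 1-D data. It is not the RapidMiner operator's implementation: the functions `bic` and `prefers_split`, the shared-variance Gaussian model, and the "split at the mean" shortcut (instead of a real 2-means run) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean


def bic(clusters):
    """BIC-style score (higher is better) for a hard clustering of 1-D points.

    Assumption: every cluster is a Gaussian with one shared variance.
    Degenerate input (empty cluster, zero variance) is not handled.
    """
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    sse = 0.0
    for c in clusters:
        centroid = fmean(c)
        sse += sum((x - centroid) ** 2 for x in c)
    variance = sse / n  # maximum-likelihood estimate of the shared variance
    log_lik = sum(len(c) * math.log(len(c) / n) for c in clusters)
    log_lik -= 0.5 * n * math.log(2 * math.pi * variance) + 0.5 * n
    n_params = 2 * k  # (k - 1) weights + k centroids + 1 variance
    return log_lik - 0.5 * n_params * math.log(n)


def prefers_split(points):
    """Return True if two clusters score better than one.

    Simplification: the split is made at the overall mean.
    """
    mean = fmean(points)
    left = [x for x in points if x < mean]
    right = [x for x in points if x >= mean]
    return bic([left, right]) > bic([points])


# Hypothetical data: deterministic quantiles of a standard normal distribution.
def normal_quantiles(n):
    dist = NormalDist(0.0, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


blob = normal_quantiles(200)  # one homogeneous group
separated = [x - 5 for x in normal_quantiles(100)] + [x + 5 for x in normal_quantiles(100)]

print(prefers_split(blob))       # False: the penalty outweighs the gain
print(prefers_split(separated))  # True: the split clearly improves the fit
```

On the homogeneous group the extra parameters cost more than the fit improves, so the test keeps one cluster. If every candidate split in a dataset looks like this, a procedure of this kind would never move beyond its starting k.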

When X-Means keeps falling back to the simplest scheme like this, it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.

For example, suppose attribute1 ranges from 0 to 100 and attribute2 ranges from 0 to 1. In that case attribute1 carries far more weight than attribute2 in the distance calculation. If you normalize, both attributes are converted to the 0-1 scale.

RapidMiner operator to use: "Normalize"
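The effect of scaling on distances can be sketched in a few lines. This is a plain min-max rescaling written for illustration, not the code of the "Normalize" operator; the helper names and the three sample values per attribute are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def squared_distance(a, b):
    """Squared Euclidean distance between two examples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical data: attribute1 spans 0-100, attribute2 spans 0-1.
attribute1 = [0.0, 50.0, 100.0]
attribute2 = [0.0, 1.0, 0.5]

raw = list(zip(attribute1, attribute2))
scaled = list(zip(min_max_normalize(attribute1), min_max_normalize(attribute2)))

print(squared_distance(raw[0], raw[1]))        # 2501.0, almost all from attribute1
print(squared_distance(scaled[0], scaled[1]))  # 1.25, both attributes contribute
```

Before scaling, attribute2 contributes 1 out of 2501 to the squared distance. After scaling it contributes 1 out of 1.25.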

**93** Maven

Thank you for your response. I tried the "Normalize" operator, but it didn't help. I got the same result as before: the X-Means operator again picked the given k-min. I don't know whether this is "normal" behaviour for X-Means.

Does anyone have any other opinions?

Best regards!

**578** Unicorn

**93** Maven

Thank you for your answer.

Does this mean that X-Means, or rather the AIC/BIC penalties implemented in the corresponding operator, can only work on specific datasets? What does "It really comes down to your dataset." mean in detail?

Best regards!