Minimal k for x-Means?

Muhammed_Fatih_ · May 2020

Dear community,

my question is: does X-Means always pick the minimal given k as the optimum?

I ran X-Means on my data with k-min=2 and k-max=60, and again with k-min=20 and k-max=60. Both times the model returned the minimal k (k=2 in the first run, k=20 in the second). Is it normal that X-Means always picks the minimal k?

Image: https://us.v-cdn.net/6030995/uploads/editor/si/4fv0ojpckc8j.jpg

Best regards!

mantanz · May 2020

If possible, please share your process XML and let me know the number of examples in your data set.

The situation you describe can happen if you don't have many examples to cluster, or if they are simply too similar to one another, so that X-Means always falls back to the simplest clustering scheme.
In that case it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.
For example, attribute1 has a range of 0-100 and attribute2 has a range of 0-1. In this case attribute1 gets more weight than attribute2 in the distance calculation. If you apply normalization, both attributes are converted to a 0-1 scale.

RapidMiner operator to use: "Normalize"
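
To make the scaling point concrete, here is a minimal Python sketch of min-max (range) normalization. It is not RapidMiner code and only assumes that the 0-1 rescaling described above is the plain range transformation; the function name `min_max_normalize` and the sample values are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attributes: one on a 0-100 scale, one on a 0-1 scale.
attribute1 = [0.0, 25.0, 50.0, 100.0]
attribute2 = [0.0, 0.25, 0.5, 1.0]

scaled1 = min_max_normalize(attribute1)
scaled2 = min_max_normalize(attribute2)

print(scaled1)
print(scaled2)
```

After rescaling, both attributes span the same range, so neither dominates the distance calculation just because of its units.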

Muhammed_Fatih_ · May 2020

Hi @mantanz,

thank you for your response. I tried the "Normalize" operator, but it didn't help. I got the same result as before: the X-Means operator again picked the given k-min parameter. I don't know if this is "normal" behaviour for X-Means.

Does anyone have any other opinions?

Best regards!

JEdward · May 2020

@Muhammed_Fatih_ It really comes down to your dataset. Try different datasets (for example the Iris dataset from the Samples folder in your RM Studio) and you'll see some of them will get different values for X.
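
As a rough illustration of why the data matters, here is a simplified Python sketch of a BIC-based split test on one-dimensional data. It assumes a spherical Gaussian model with one variance shared across clusters and a BIC where higher is better; it is not the RapidMiner operator's implementation, and `bic_score`, `prefers_split` and the sample values are hypothetical.

```python
import math


def bic_score(clusters):
    """BIC (higher is better) for 1-D clusters sharing one variance."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    # Sum of squared distances of every point to its own cluster mean.
    sse = sum((x - sum(c) / len(c)) ** 2 for c in clusters for x in c)
    variance = max(sse / (n - k), 1e-12)  # guard against zero variance
    # Log-likelihood of hard-assigned points under the shared-variance model.
    log_lik = sum(len(c) * math.log(len(c) / n) for c in clusters)
    log_lik -= n / 2 * math.log(2 * math.pi * variance)
    log_lik -= (n - k) / 2
    # Free parameters: (k - 1) cluster weights, k means, 1 variance.
    num_params = (k - 1) + k + 1
    return log_lik - num_params / 2 * math.log(n)


def prefers_split(left, right):
    """True if two clusters score better than one merged cluster."""
    return bic_score([left, right]) > bic_score([left + right])


# Two clearly separated groups: the split pays for its extra parameters.
separated = prefers_split([0.0, 0.1, 0.2, 0.1, 0.0, 0.2],
                          [5.0, 5.1, 5.2, 5.1, 5.0, 5.2])

# Evenly spread points: the fit gain does not cover the penalty.
uniform = prefers_split([0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                        [6.0, 7.0, 8.0, 9.0, 10.0, 11.0])

print(separated, uniform)
```

In this sketch the separated data favours the split and the evenly spread data does not. If no candidate split on your data improves the score, a BIC-driven search of this kind has no reason to move above the starting k.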

Muhammed_Fatih_ · May 2020

@JEdward

Thank you for your answer.

Does this mean that X-means or rather AIC/BIC penalties that are implemented in the corresponding operator are only able to operate on specific datasets? What does "It really comes down to your dataset." mean in detail?

Best regards!
