Minimal k for x-Means?
Muhammed_Fatih_
Member Posts: 93 Maven
Dear community,
My question is: does X-Means always choose the minimum k it is given as the optimum?
I ran X-Means on my data with k-min=2 and k-max=60, and again with k-min=20 and k-max=60. Both times the X-Means model returned the minimum k (k=2 in the first run, k=20 in the second). Is it normal that X-Means always picks the minimum k?
Best regards!
Answers
The situation you describe can happen if you have too few examples for clustering, or if the examples are too similar to one another, so that X-Means always falls back to the simplest clustering scheme.
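To illustrate why a penalized score can prefer fewer clusters, here is a minimal sketch of a BIC score for a hard clustering. It assumes identical spherical Gaussian clusters; the function name `bic_score` and the toy data are made up for illustration, and this is not necessarily the formula the RapidMiner X-Means operator uses.

```python
import math

def bic_score(points, labels):
    """BIC of a hard clustering, assuming identical spherical Gaussians.

    Higher is better. Hypothetical helper for illustration only.
    """
    n, d = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    # Sum of squared distances of every point to its cluster centroid.
    sse = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    variance = sse / (d * (n - k))  # pooled variance estimate

    # Log-likelihood: mixing proportions plus the Gaussian density term.
    log_lik = sum(len(m) * math.log(len(m) / n) for m in clusters.values())
    log_lik += -0.5 * n * d * math.log(2 * math.pi * variance) - sse / (2 * variance)

    # Free parameters: (k - 1) mixing weights, k * d centroid coordinates, 1 variance.
    n_params = (k - 1) + k * d + 1
    return log_lik - 0.5 * n_params * math.log(n)

# Two well-separated groups: splitting should pay off.
separated = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
             (10.0, 10.1), (10.2, 9.9), (9.9, 10.0), (10.1, 10.2)]
# One tight blob: a split buys little and the penalty dominates.
blob = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
        (0.0, -0.2), (-0.2, 0.1), (0.15, 0.0), (-0.05, -0.1)]

one_cluster = [0] * 8
two_clusters = [0, 0, 0, 0, 1, 1, 1, 1]

bic_sep_one = bic_score(separated, one_cluster)
bic_sep_two = bic_score(separated, two_clusters)
bic_blob_one = bic_score(blob, one_cluster)
bic_blob_two = bic_score(blob, two_clusters)

print("separated:", bic_sep_one, "vs", bic_sep_two)
print("blob:     ", bic_blob_one, "vs", bic_blob_two)
```

On the separated data the two-cluster labelling scores higher. On the blob, this particular split scores lower than a single cluster, because the small gain in fit does not cover the extra parameters. If every candidate split looks like the blob case, a search that starts at k-min has no reason to go beyond it.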
In such a case it is better to normalize the data beforehand. This ensures that all attributes are on the same scale before the algorithm is applied.
For example, suppose attribute1 has a range of 0-100 and attribute2 has a range of 0-1. In this case attribute1 carries more weight than attribute2 in the distance calculation. If you apply normalization, both attributes are converted to a 0-1 scale.
RapidMiner operator to use: "Normalize"
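Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is a minimal min-max (range) normalization; the function name `min_max_normalize` and the sample values are assumptions for illustration, and it is not claimed to match the internals of the "Normalize" operator.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly to the 0-1 range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]

# attribute1 spans 0-100, attribute2 spans 0-1 (made-up sample values).
attribute1 = [0.0, 25.0, 50.0, 100.0]
attribute2 = [0.0, 0.25, 0.5, 1.0]

scaled1 = min_max_normalize(attribute1)
scaled2 = min_max_normalize(attribute2)

print(scaled1)
print(scaled2)
```

After rescaling, both attributes lie in the 0-1 range, so neither dominates a Euclidean distance simply because of its units.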
Thank you for your response. I tried the "Normalize" operator, but it doesn't help. I got the same result as before: the X-Means operator again picked the given k-min. I don't know whether this is normal behaviour for X-Means.
Does anyone have any other opinions?
Best regards!
Thank you for your answer.
Does this mean that X-Means, or rather the AIC/BIC penalties implemented in the corresponding operator, only work on specific datasets? What does "It really comes down to your dataset." mean in detail?
Best regards!