Clustering k-means

3erthe3er · February 5

Hello everyone,

I am looking for a way to cluster data. With the tools I am using, I cannot directly find the right value of k, so the data is simply put into however many clusters I have set k to.

Is there a way or tool to find the right number of clusters without knowing it beforehand?

And what kind of function should I use to check the result and its robustness?

I have read that the X-means cluster attribute should help to find the right number of clusters.

I see a display on the right-hand side that makes an "assumption", but in my case this is incorrect and does not match the data set.

Surely there must be an iterative/mathematical function that solves this problem?

To clarify once again: after the analysis, the number of clusters my data set ends up in is always kmin. I am looking for an automatic method to find the right value of k.

Maybe my selection of attributes is wrong?

Image: https://us.v-cdn.net/6030995/uploads/editor/oa/79uig1qmohyd.png

Thanks to everyone for the help. I appreciate it very much!

P.S. Perhaps k-means is also not the right choice?

Any help is very much appreciated!! 😊

MartinLiebig · February 5

Hi there,

Finding the number of clusters is arguably the toughest part of clustering.

XMeans is already a way to get a good estimate for k. There are some heuristics out there, most prominently the elbow method. But there is even a paper arguing you shouldn't use it: https://arxiv.org/pdf/2212.12189.pdf
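As a rough illustration of such a heuristic outside RapidMiner, here is a minimal Python sketch that tries several values of k and keeps the one with the highest mean silhouette score. This is a swapped-in technique, not what XMeans does internally (XMeans is usually described as using an information criterion such as BIC). The toy data, the function names, and the deterministic farthest-first initialization are all assumptions made for the example.

```python
import math
import random

def farthest_first(points, k):
    """Deterministic init (an assumption, not k-means++): start at points[0],
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient, between -1 (poor) and 1 (well separated)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # mean distance within own cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # mean distance to nearest other cluster
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three tight, well-separated blobs of 20 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real data the scores are rarely this clear-cut, so it is worth looking at the whole curve of scores rather than only the maximum.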

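Also be careful with the normalization of your data. k-means works on distances, so attributes with a larger numeric range can dominate the result. I see you do not use a Normalize operator, so it might create results you don't want. The same goes for the one-hot encoding you use.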
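To make the normalization point concrete, here is a small Python sketch of z-score standardization and its effect on Euclidean distances. The (age, income) rows are made up for illustration, and `z_score` is a hypothetical helper, not a RapidMiner function.

```python
import math
import statistics

def z_score(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

# Hypothetical rows: (age in years, yearly income)
rows = [(25, 30000), (26, 90000), (60, 31000)]

# Raw distances are driven almost entirely by income.
raw_01 = math.dist(rows[0], rows[1])  # similar age, very different income
raw_02 = math.dist(rows[0], rows[2])  # very different age, similar income

scaled = z_score(rows)
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])

print(round(raw_01), round(raw_02))
print(round(scaled_01, 2), round(scaled_02, 2))
```

Before scaling, the age difference between the first and third row barely registers next to the income values. After scaling, both attributes contribute on a comparable scale.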

BR,

Martin
