"k-means clustering: which data belongs to which cluster?"

Carlo Member Posts: 12 Learner I
edited May 2019 in Help
Hi Community,

I would like to cluster countries based on several factors, such as purchasing power, competition, turnover, ease of doing business, tariffs, political stability, etc.
I am creating an input list with the aim of having a numerical value for each factor (which makes clustering easier).
As output I would like to get, for example, 3 clusters and see which country belongs to which cluster.
I am currently working with the k-Means operator, which works quite well, but I cannot see which country belongs to which cluster.
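One way to make the membership visible is to keep the country name next to each cluster label rather than feeding it to the algorithm. The following Python sketch (plain Python, not RapidMiner) runs Lloyd's k-means on hypothetical, already normalized factor scores for placeholder countries and prints each cluster with its members. The function names `kmeans` and `assign_clusters`, the data, and the deterministic "first k points" initialization are all assumptions for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    # Deterministic start: the first k points are the initial centroids
    # (a simplification; most tools use random or k-means++ starts).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def assign_clusters(countries, k):
    """Return {country name: cluster index}; names are carried, not clustered."""
    names = list(countries)
    labels = kmeans([countries[n] for n in names], k)
    return dict(zip(names, labels))


# Hypothetical, already normalized scores: (purchasing power, stability).
countries = {
    "Country A": (0.90, 0.80),
    "Country C": (0.10, 0.20),
    "Country E": (0.50, 0.50),
    "Country B": (0.85, 0.90),
    "Country D": (0.20, 0.10),
    "Country F": (0.55, 0.45),
}

membership = assign_clusters(countries, k=3)
for cluster in sorted(set(membership.values())):
    members = [name for name, c in membership.items() if c == cluster]
    print(f"cluster_{cluster}: {', '.join(members)}")
```

The design choice worth noting is that the identifier column (the country name) is excluded from the distance computation and only re-attached to the labels afterwards.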

Does anybody have a suggestion?

Thanks in advance.

Best regards,


Best Answers

    yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Solution Accepted
    Hi @Carlo,
    We can convert the region codes from nominal to dummy coding (Nominal to Numerical operator) and then multiply the region dummy columns by a constant such as 3 or 5; multiplying by 5 changes the range of the numerical region attributes to [0,5]. You would also need to normalize the other columns (purchasing power, competition, turnover, ease of doing business, tariffs, political stability) so that these attributes have a smaller range, say [0,1]. A distance-based model such as k-Means with Chebyshev distance will then treat region as the most important factor, because distance-based clustering models are always sensitive to normalization. This kind of manual intervention increases the weight of the region factor. You would need some testing to find a good multiplication factor for region. To get guaranteed results, fitting a separate clustering model on the subset for each region would be ideal.
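The effect of this weighting can be checked directly. The Python sketch below (hypothetical column names, placeholder countries and made-up values) min-max normalizes the numeric factors to [0,1], dummy-codes the region with one column per region set to 5 instead of 1, and computes the Chebyshev distance, i.e. the largest per-attribute difference. Countries in different regions always end up exactly 5 apart, while countries in the same region are at most 1 apart, which is why the region dominates.

```python
from itertools import combinations


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column becomes all 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def encode(rows, numeric_cols, region_col, region_weight=5.0):
    """Normalized numeric factors followed by weighted region dummy columns."""
    regions = sorted({r[region_col] for r in rows})
    scaled = {col: min_max([r[col] for r in rows]) for col in numeric_cols}
    vectors = []
    for i, r in enumerate(rows):
        vec = [scaled[col][i] for col in numeric_cols]
        # One dummy column per region, set to region_weight instead of 1.
        vec += [region_weight if r[region_col] == reg else 0.0 for reg in regions]
        vectors.append(vec)
    return vectors


def chebyshev(a, b):
    """Chebyshev distance: the largest absolute per-attribute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical input rows; the values are illustrative only.
rows = [
    {"country": "Country A", "region": "EMEA", "purchasing_power": 42.0, "stability": 0.8},
    {"country": "Country B", "region": "EMEA", "purchasing_power": 18.0, "stability": 0.3},
    {"country": "Country C", "region": "APAC", "purchasing_power": 30.0, "stability": 0.6},
    {"country": "Country D", "region": "AMERICAS", "purchasing_power": 55.0, "stability": 0.9},
]

vectors = encode(rows, ["purchasing_power", "stability"], "region", region_weight=5.0)
for i, j in combinations(range(len(rows)), 2):
    d = chebyshev(vectors[i], vectors[j])
    print(rows[i]["country"], rows[j]["country"], round(d, 3))
```

Because Chebyshev takes the maximum rather than a sum, a region mismatch alone fixes the distance at the weight, no matter how similar the other factors are.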


    Carlo Member Posts: 12 Learner I
    edited March 2019
    That is great! It works perfectly. Thanks for your hint.

    I have one very last question regarding this topic.
    My input data contains countries from all over the world, but I should only cluster within certain regions, e.g. Americas, APAC and EMEA. So my output should be 2 clusters per region.

    My solution was to split my input data beforehand, before bringing it into RapidMiner as a repository. So I have three repositories, and I then performed the clustering on each of them.

    Is it possible to tell RapidMiner to cluster together only those countries that belong to the same region (the region is named in column B)?
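The per-region approach can also be done in a single pass instead of three separate repositories: group the rows by the region column and run k-means with k = 2 inside each group, so countries from different regions can never share a cluster. The sketch below is plain Python, not a RapidMiner process; the row layout, the placeholder countries, their values, the `region_cluster_n` label format and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: centroid = mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_per_region(rows, feature_cols, k=2):
    """Cluster each region separately; return {country: 'REGION_cluster_n'}."""
    groups = defaultdict(list)
    for row in rows:  # input order is preserved within each region
        groups[row["region"]].append(row)
    result = {}
    for region, members in groups.items():
        points = [[m[c] for c in feature_cols] for m in members]
        # Guard against regions with fewer than k countries.
        labels = kmeans(points, min(k, len(points)))
        for m, lab in zip(members, labels):
            result[m["country"]] = f"{region}_cluster_{lab}"
    return result


# Hypothetical, already normalized data; column "region" plays the role of column B.
rows = [
    {"country": "Country A1", "region": "AMERICAS", "pp": 0.90, "stab": 0.90},
    {"country": "Country A3", "region": "AMERICAS", "pp": 0.10, "stab": 0.10},
    {"country": "Country E1", "region": "EMEA", "pp": 0.20, "stab": 0.80},
    {"country": "Country E2", "region": "EMEA", "pp": 0.80, "stab": 0.20},
    {"country": "Country A2", "region": "AMERICAS", "pp": 0.85, "stab": 0.95},
    {"country": "Country A4", "region": "AMERICAS", "pp": 0.15, "stab": 0.05},
    {"country": "Country E3", "region": "EMEA", "pp": 0.25, "stab": 0.75},
    {"country": "Country E4", "region": "EMEA", "pp": 0.75, "stab": 0.25},
]

labels_by_country = cluster_per_region(rows, ["pp", "stab"], k=2)
for country, label in labels_by_country.items():
    print(country, label)
```

Prefixing each label with its region keeps the cluster IDs unique across regions, so the combined output still shows which country belongs to which cluster.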

    Thanks and best regards,