How to extract distinct features of K-Means Cluster?

eldenoso Member Posts: 65 Contributor I
edited December 2018 in Help

Hello everyone,

I'm doing some cluster analysis and am mainly interested in the features of each cluster. How can each cluster be described by its attributes?

In a marketing case, for example, it's not enough to just cluster your customers. You also have to know how to treat each group, so you have to know what its main features are.

Is there a way to extract them from the K-Means algorithm, or is there an even better approach?

Thanks in advance :)



  • kershov Member Posts: 9 Contributor I



    I think the Extract Cluster Prototypes operator can help you.

  • eldenoso Member Posts: 65 Contributor I

    Thank you for your answer @kershov!

    But I think that's not exactly what I was looking for, since the prototypes don't really describe the clusters. For example, when you plot the clusters you see that the main group is in Germany, but the prototype says it is Norway, which seems contradictory.

    Is there another way to extract the features? In a decision tree, for example, it is easier to identify the important features.

    Thank you

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Hi there, you have a couple of options for this common question.

    You could turn your clusters into labels and then try to characterize them with predictive modeling, using simple classifiers such as Naive Bayes or Decision Trees. The attributes the classifier relies on to separate the clusters are the ones that distinguish them.
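
For readers working outside RapidMiner, the cluster-as-label idea might look roughly like the following in Python with scikit-learn. This is a swapped-in toolkit, not the operators discussed in this thread, and the data, the attribute names (`age`, `annual_spend`), and the choice of three clusters are all made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: two customer attributes, three synthetic groups.
# (Left unscaled so the tree thresholds stay readable; real data usually
# needs scaling before K-Means.)
rng = np.random.default_rng(0)
feature_names = ["age", "annual_spend"]
X = np.vstack([
    rng.normal(loc=[25, 200], scale=[3, 20], size=(50, 2)),
    rng.normal(loc=[45, 800], scale=[3, 20], size=(50, 2)),
    rng.normal(loc=[65, 300], scale=[3, 20], size=(50, 2)),
])

# Step 1: cluster, then treat the cluster id as if it were a label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Step 2: fit a shallow, interpretable classifier on that label.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, cluster_labels)

# Step 3: read off the rules and the attribute importances.
rules = export_text(tree, feature_names=feature_names)
print(rules)
importances = dict(zip(feature_names, tree.feature_importances_))
print(importances)
```

The printed rules give a threshold-style description of each cluster, and the importances indicate which attributes the tree needed to tell the clusters apart. A shallow tree is used on purpose: the goal is a readable description, not maximum accuracy.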

    If you already have labels (not the clusters themselves) then you could use "Map Clustering on Labels" and do something similar.  Or run a predictive model using only the cluster attribute against your existing labels.

    You can also use the centroid output from clusters to determine which attributes score highly for a given cluster but not for other clusters.  You could even use "Generate Attributes" to define a new metric of the difference in centroid values between one cluster and another.
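
The centroid comparison can also be sketched in Python with scikit-learn (again a stand-in for the RapidMiner operators, under assumed data). The attribute names and the synthetic data are hypothetical, and the "gap" metric here, a cluster's centroid minus the mean of the other clusters' centroids on standardized attributes, is just one possible way to define the difference.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the two groups differ only in visits_per_month.
rng = np.random.default_rng(1)
feature_names = ["age", "annual_spend", "visits_per_month"]
X = np.vstack([
    rng.normal(loc=[30, 500, 2], scale=[5, 50, 1], size=(60, 3)),
    rng.normal(loc=[30, 500, 12], scale=[5, 50, 1], size=(60, 3)),
])

# Standardize so centroid values are comparable across attributes.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
centroids = kmeans.cluster_centers_  # shape: (n_clusters, n_features)

# For each cluster, rank attributes by how far its centroid sits from
# the average centroid of all other clusters.
distinct = {}
for k in range(centroids.shape[0]):
    others = np.delete(centroids, k, axis=0).mean(axis=0)
    gap = centroids[k] - others
    order = np.argsort(-np.abs(gap))
    distinct[k] = [(feature_names[i], float(gap[i])) for i in order]
    print(f"cluster {k}: {distinct[k]}")
```

Attributes at the top of each list are the ones where the cluster scores noticeably higher or lower than the rest; the sign of the gap shows the direction.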

    You might also want to search through the forum on this topic since there are many existing threads that are related, and they might give you even more ideas.  Here's one, for example: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cluster-Performance-DBScan-and-agglomerative-Clustering/m-p/40754#M27689

    I hope this helps!






    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts