RapidMiner

Contributor II eldenoso

How to extract distinct features of K-Means Cluster?

Hello everyone,

Since I'm doing some cluster analysis, I am mainly interested in the features of each cluster. How can each cluster be described by its attributes?

When I think about a marketing case, it's not enough to just cluster your customers. You also have to know how to treat each group, so you have to know what its main features are.

Is there a way to extract them from the K-Means algorithm, or is there a better approach to this?

Thanks in advance :)

3 REPLIES
Learner III kershov

Re: How to extract distinct features of K-Means Cluster?

Hello!

I think the Extract Cluster Prototypes operator can help you.

Contributor II eldenoso

Re: How to extract distinct features of K-Means Cluster?

Thank you for your answer @kershov!

But I think that's not exactly what I was looking for, since the prototypes don't really describe the clusters. E.g. when you plot the clusters you see that the main group is in Germany, but the prototype says it is Norway, which seems contradictory.

Is there another way to extract the features? In a decision tree, for example, it is easier to identify the important features.

Thank you

RM Certified Expert

Re: How to extract distinct features of K-Means Cluster?

Hi there, you have a couple of options for this common question.

You could turn your clusters into labels and then try to diagnose them with predictive modeling, using simple classifiers such as Naive Bayes or Decision Trees.
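To make the "clusters as labels" idea concrete outside of RapidMiner, here is a minimal Python sketch. It is not a full decision tree: as a simplified stand-in it fits a one-level split (a decision stump) per attribute and ranks the attributes by how well a single threshold separates one cluster from all the others. The function names, the attribute names `age` and `yearly_spend`, and the customer data are all made up for illustration.

```python
def best_stump(values, is_target):
    """Return (accuracy, threshold) of the best single-threshold split.

    The rule is "value <= threshold"; the flipped rule is also considered,
    so the accuracy is never below 0.5.
    """
    n = len(values)
    best_acc, best_thr = 0.0, None
    for thr in sorted(set(values)):
        hits = sum((v <= thr) == t for v, t in zip(values, is_target))
        acc = max(hits, n - hits) / n
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


def rank_attributes(rows, clusters, target, names):
    """Rank attributes by how well one split isolates the target cluster."""
    is_target = [c == target for c in clusters]
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        acc, thr = best_stump(column, is_target)
        scores.append((name, acc, thr))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical customers: (age, yearly_spend) plus the cluster id from K-Means.
rows = [(25, 200), (27, 220), (24, 210), (52, 205), (55, 900), (50, 950)]
clusters = [0, 0, 0, 1, 1, 1]
names = ["age", "yearly_spend"]

ranking = rank_attributes(rows, clusters, target=0, names=names)
for name, acc, thr in ranking:
    print(f"{name}: accuracy={acc:.2f} at threshold {thr}")
```

On this toy data `age` separates cluster 0 perfectly, while `yearly_spend` misclassifies one customer, so `age` would be the first attribute to use when describing that cluster. A real decision tree does the same thing recursively and with a proper impurity measure.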

If you already have labels (not the clusters themselves) then you could use "Map Clustering on Labels" and do something similar.  Or run a predictive model using only the cluster attribute against your existing labels.
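If it helps to see the cluster-versus-label comparison spelled out, the following Python sketch cross-tabulates cluster ids against existing labels and picks the majority label per cluster. It only illustrates the general idea and is not a reproduction of what the "Map Clustering on Labels" operator does internally; the function names, the labels `churned` and `loyal`, and the data are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab(clusters, labels):
    """Count how often each existing label occurs inside each cluster."""
    table = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        table[cluster_id][label] += 1
    return table


def majority_label(table):
    """Map every cluster id to its most frequent label."""
    return {c: counts.most_common(1)[0][0] for c, counts in table.items()}


# Hypothetical example: cluster ids from K-Means and a known customer status.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["churned", "churned", "loyal", "loyal", "loyal", "loyal"]

table = crosstab(clusters, labels)
mapping = majority_label(table)
for cluster_id in sorted(table):
    print(cluster_id, dict(table[cluster_id]), "->", mapping[cluster_id])
```

A cluster whose counts are dominated by one label can be described by that label; a cluster with an even mix tells you the clustering and the labels capture different structure.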

You can also use the centroid output from clusters to determine which attributes score highly for a given cluster but not for other clusters.  You could even use "Generate Attributes" to define a new metric of the difference in centroid values between one cluster and another.
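As a rough illustration of the centroid comparison, here is a small Python sketch. It assumes numeric attributes only, computes each cluster's centroid, and expresses it as a distance from the overall mean in units of the overall standard deviation, so attributes on different scales can be compared. That standardization is my own choice for the example, not something the K-Means output gives you directly, and the names and data are hypothetical.

```python
from statistics import mean, pstdev


def centroids(rows, clusters):
    """Return {cluster id: list of per-attribute means}."""
    groups = {}
    for row, cluster_id in zip(rows, clusters):
        groups.setdefault(cluster_id, []).append(row)
    return {c: [mean(col) for col in zip(*members)]
            for c, members in groups.items()}


def distinctive_attributes(rows, clusters, names):
    """Per cluster, sort attributes by |centroid - overall mean| / overall std."""
    overall_mean = [mean(col) for col in zip(*rows)]
    # Fall back to 1.0 for constant attributes to avoid division by zero.
    overall_std = [pstdev(col) or 1.0 for col in zip(*rows)]
    result = {}
    for c, centre in centroids(rows, clusters).items():
        z = {name: (centre[j] - overall_mean[j]) / overall_std[j]
             for j, name in enumerate(names)}
        result[c] = sorted(z.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return result


# Hypothetical customers: (age, yearly_spend) plus the cluster id from K-Means.
rows = [(25, 200), (27, 220), (24, 210), (52, 205), (55, 900), (50, 950)]
clusters = [0, 0, 0, 1, 1, 1]
names = ["age", "yearly_spend"]

profile = distinctive_attributes(rows, clusters, names)
for cluster_id in sorted(profile):
    print(cluster_id, [(n, round(z, 2)) for n, z in profile[cluster_id]])
```

Attributes at the top of a cluster's list are the ones where its centroid sits furthest from the overall average, and the sign shows the direction (here cluster 0 is younger than average, cluster 1 older). Nominal attributes such as a country would need a different summary, for example the share of each value per cluster.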

You might also want to search through the forum on this topic since there are many existing threads that are related, and they might give you even more ideas.  Here's one, for example: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cluster-Performance-DBScan-and-agglomerat...

I hope this helps!

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts