Since I'm doing some cluster analysis, I am mainly interested in the features of each cluster. How can each cluster be described by its attributes?
When I think about a marketing case, it's not enough to just cluster your customers. You also have to know how to treat each group, so you have to know what its main features are.
Is there a way to extract them from the K-Means algorithm or is there even a better approach to this?
Thanks in advance
Thank you for your answer @kershov!
But I think that's not exactly what I was looking for, since the prototypes don't really describe the clusters. For example, when you plot the cluster you can see that the main group is in Germany, but the prototype says it is Norway, which seems contradictory.
Is there another way to extract the features? In a decision tree, for example, it is easier to identify the important features.
Hi there, you have a couple of options for this common question.
You could turn your clusters into labels and then try to diagnose them with predictive modeling algorithms, for example simple classifiers such as Naive Bayes or Decision Trees.
If you already have labels (other than the clusters themselves), you could use "Map Clustering on Labels" and do something similar. Or you could run a predictive model using only the cluster attribute against your existing labels.
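To illustrate the "clusters as labels" idea outside RapidMiner, here is a minimal plain-Python sketch of a depth-1 decision tree (a decision stump) fitted one-vs-rest for each cluster. The toy data, the attribute names and the `best_stump` helper are hypothetical, and accuracy is used as the split criterion only to keep it short; a real decision tree learner would typically use Gini impurity or information gain and grow deeper trees.

```python
from typing import List, Tuple

def best_stump(rows: List[List[float]], is_target: List[bool],
               names: List[str]) -> Tuple[str, float, str, float]:
    """Find the single attribute/threshold split that best separates the
    target cluster from all other rows (a depth-1 decision tree)."""
    best = ("", 0.0, "", -1.0)  # (attribute, threshold, direction, accuracy)
    n = len(rows)
    for j, name in enumerate(names):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for direction in ("<=", ">"):
                correct = 0
                for r, y in zip(rows, is_target):
                    pred = r[j] <= t if direction == "<=" else r[j] > t
                    correct += pred == y
                acc = correct / n
                if acc > best[3]:  # strict: keep the first best split found
                    best = (name, t, direction, acc)
    return best

# Hypothetical customer data (age, yearly spend) with cluster ids from k-means.
names = ["age", "yearly_spend"]
rows = [[22, 300], [25, 350], [24, 320], [51, 2100], [48, 1900], [55, 2300]]
clusters = [0, 0, 0, 1, 1, 1]
for c in sorted(set(clusters)):
    name, t, d, acc = best_stump(rows, [k == c for k in clusters], names)
    print(f"cluster_{c}: {name} {d} {t} (accuracy {acc:.2f})")
```

Each printed rule (here `age <= 36.5` for cluster_0) reads as a short description of that cluster. With very imbalanced clusters, raw accuracy can look high even for a weak rule, so it is worth checking the rule against the cluster size.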
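As a rough plain-Python analogue of mapping clusters onto existing labels, the sketch below assigns each cluster the label that occurs most often inside it and reports how often that mapping agrees with the real labels. This majority-vote rule is an assumption for illustration; it is not necessarily the exact rule RapidMiner's "Map Clustering on Labels" operator uses, and the data and helper name are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List

def map_clusters_to_labels(clusters: List[Hashable],
                           labels: List[Hashable]) -> Dict[Hashable, Hashable]:
    """Map each cluster id to the label that occurs most often inside it."""
    by_cluster: Dict[Hashable, Counter] = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}

# Hypothetical cluster ids and an existing label (e.g. churn) for 8 customers.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["churn", "churn", "stay", "stay", "stay", "churn", "stay", "stay"]
mapping = map_clusters_to_labels(clusters, labels)
# Share of rows whose cluster's mapped label matches their actual label.
agreement = sum(mapping[c] == y for c, y in zip(clusters, labels)) / len(labels)
print(mapping, f"agreement={agreement:.2f}")
```

A low agreement suggests the clusters cut across the existing labels rather than reproducing them.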
You can also use the centroid output from clusters to determine which attributes score highly for a given cluster but not for other clusters. You could even use "Generate Attributes" to define a new metric of the difference in centroid values between one cluster and another.
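A minimal plain-Python sketch of the centroid comparison, assuming purely numeric attributes: it standardizes each attribute (z-score) so differences are comparable across scales, then subtracts the centroid of all other rows from each cluster's centroid and ranks attributes by the size of that difference. The data and helper names are hypothetical. To compare one cluster against one specific other cluster instead, replace `outside` with that cluster's rows. Nominal attributes such as a country would need a different summary, for example per-cluster value counts.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each attribute so differences are comparable across scales."""
    stats = []
    for col in zip(*rows):
        sd = pstdev(col) or 1.0  # constant column: avoid division by zero
        stats.append((mean(col), sd))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def centroid(rows: List[List[float]]) -> List[float]:
    """Attribute-wise mean of a group of rows."""
    return [mean(col) for col in zip(*rows)]

def rank_attributes(rows: List[List[float]], clusters: List[int],
                    names: List[str]) -> Dict[int, List[Tuple[str, float]]]:
    """For each cluster, rank attributes by |own centroid - centroid of the rest|.
    Needs at least two clusters so that `outside` is never empty."""
    z = zscore_columns(rows)
    result: Dict[int, List[Tuple[str, float]]] = {}
    for c in sorted(set(clusters)):
        inside = [r for r, k in zip(z, clusters) if k == c]
        outside = [r for r, k in zip(z, clusters) if k != c]
        diffs = [a - b for a, b in zip(centroid(inside), centroid(outside))]
        ranked = sorted(zip(names, diffs), key=lambda p: abs(p[1]), reverse=True)
        result[c] = [(n, round(d, 2)) for n, d in ranked]
    return result

# Hypothetical customer data: age, yearly spend, store visits per month.
names = ["age", "yearly_spend", "visits"]
rows = [[22, 900, 5], [25, 1100, 6], [24, 1000, 5],
        [51, 1200, 5], [48, 1000, 6], [55, 1100, 5]]
clusters = [0, 0, 0, 1, 1, 1]
for c, ranked in rank_attributes(rows, clusters, names).items():
    print(f"cluster_{c}:", ranked)
```

The sign tells you the direction: in this toy data, cluster_0 is mainly "younger than the rest", with spend a weaker second feature and visits not distinguishing the clusters at all.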
You might also want to search the forum on this topic, since there are many related threads that might give you even more ideas. Here's one, for example: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cluster-Performance-DBScan-and-agglomerat...
I hope this helps!