How to assess the "pureness" (purity) of clusters (e.g. k-means) with labeled data?
I want to test clustering. I can assess the performance of the clusters with the operator "map clustering to labels", but this only works if my number of clusters is equal to the number of labels...
If I try different values of k with k-means, is there a way to assess the goodness of the clusters with some validity measure, such as the purity of each cluster (or the sum of all cluster purities), based on the label distribution within a cluster?
I could of course look at the distribution of labels in every cluster, but is there some overview that summarizes the purity across all clusters?
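For illustration, here is a minimal sketch of the measure described above, written in plain Python rather than as a RapidMiner process. It assumes you can export each example's cluster assignment and its true label as two parallel lists; the function name `cluster_purities` and the toy data are hypothetical. Per-cluster purity is the share of the cluster's majority label. Overall purity sums the majority-label counts of all clusters and divides by the total number of examples, so it works for any number of clusters, regardless of the number of labels.

```python
from collections import Counter, defaultdict


def cluster_purities(cluster_ids, labels):
    """Hypothetical helper: return (per-cluster purity dict, overall purity).

    Per-cluster purity = count of the majority label in the cluster / cluster size.
    Overall purity     = sum of majority-label counts over all clusters / N.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")

    # Build the label distribution of every cluster.
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1

    per_cluster = {}
    majority_total = 0
    for cluster, label_counts in counts.items():
        majority = max(label_counts.values())  # size of the dominant label
        majority_total += majority
        per_cluster[cluster] = majority / sum(label_counts.values())

    overall = majority_total / len(labels)
    return per_cluster, overall


# Toy example (made-up data): 8 examples, 3 clusters, 3 labels.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["a", "a", "b", "b", "b", "a", "c", "c"]
per_cluster, overall = cluster_purities(clusters, labels)
print(overall)  # 0.75, i.e. (2 + 2 + 2) / 8
```

One caveat when comparing different k: purity can only stay equal or rise as clusters are split, and it reaches 1.0 when every example is its own cluster. A high purity at a large k therefore does not by itself mean a better clustering, so it is usually read together with the number of clusters or alongside measures such as normalized mutual information or the adjusted Rand index.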