Clustering Performance (Example Distribution)
Hi to all,
I'm running a clustering model in RapidMiner using k-Medoids. One of my performance indicators is the "Item Performance Distribution" operator, which uses the Sum of Squares measure. As a result, I get: Example distribution: 0.272
What does this number indicate? I cannot tell whether higher or lower values are better for my model.
Best Answer

jacobcybulski (Member, University Professor)

Sum of squares (also called "within sum of squares", or WSS) measures how cluster members are distributed around their centroid (in your case, the medoid). It is calculated from the squared distances between each cluster member and its cluster centroid, summed (or, in some tools, averaged) over all members. It is similar to variance. The value returned by the performance indicator is not useful on its own; it is used primarily to compare one clustering against another, e.g. during cluster optimisation.

Intuition tells us that the smaller the WSS, the denser the clusters, i.e. the distances between cluster members are small and, presumably, the distances between centroids are large. This intuition clearly fails when the number of clusters equals the number of data points (N): the WSS is then zero, but the clustering is not good at all.

So, in general, we calculate WSS(k) for cluster numbers k from 1 to N, hoping to find the k beyond which the further reduction in WSS is no longer significant compared with the added complexity of the clustering (as represented by k). If you chart k against WSS(k), the chart looks like a bent arm, and we look for the elbow, where we usually find the "best" k. Do not get hooked on the idea that the optimum must be a sharp elbow: it is open to your interpretation of what counts as a significant improvement in WSS, and of when k becomes too large for you to deal with.
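To make the WSS idea and the elbow procedure concrete, here is a minimal Python sketch. It is not RapidMiner's implementation and does not reproduce RapidMiner's exact normalisation. It assumes plain Euclidean distance, a simple alternating (Voronoi-iteration) variant of k-medoids, and deterministic farthest-first seeding. The helper names (`k_medoids`, `wss`, `elbow_curve`, etc.) and the toy data are hypothetical, for illustration only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Assumed seeding: start at point 0, then keep adding the point farthest from the chosen medoids."""
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        nxt = max(candidates,
                  key=lambda i: min(math.dist(points[i], points[m]) for m in medoids))
        medoids.append(nxt)
    return medoids


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def best_medoid(points, members):
    """Member with the smallest total distance to all other members of its cluster."""
    return min(members, key=lambda m: sum(math.dist(points[m], points[j]) for j in members))


def k_medoids(points, k, max_iter=100):
    """Alternate: assign points to nearest medoid, then move each medoid within its cluster."""
    medoids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Keep the old medoid if a cluster ends up empty.
            new_medoids.append(best_medoid(points, members) if members else medoids[c])
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def wss(points, medoids, labels):
    """Within sum of squares: squared distance of every point to its own medoid, summed."""
    return sum(sq_dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))


def elbow_curve(points, k_max):
    """WSS(k) for k = 1..k_max, to be charted and inspected for an elbow."""
    curve = {}
    for k in range(1, k_max + 1):
        medoids, labels = k_medoids(points, k)
        curve[k] = wss(points, medoids, labels)
    return curve


# Hypothetical toy data: three well-separated 2x2 blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

curve = elbow_curve(points, len(points))
for k, value in curve.items():
    print(k, value)
```

On this toy data WSS(3) is 12 (each blob contributes 0 + 1 + 1 + 2 around a corner medoid), while any k below 3 leaves a whole blob far from its medoid, so the curve bends sharply at k = 3. WSS(12) is 0, which is the degenerate k = N case described above. If your tool reports an average rather than a sum, dividing by `len(points)` changes the scale but not the shape of the curve.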