Clustering Performance (Example Distribution)

User36964 Member, University Professor Posts: 15
Hi to all,
I'm running a clustering model in RapidMiner, using K-Medoids. One of my performance indicators is the "Item Performance Distribution" operator, which uses the Sum of Squares measure. As a result, I get: Example distribution: 0.272

What does this number indicate? I cannot tell whether higher or lower values are better for my model.

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    Sum of squares (also called "within sum of squares", or WSS) measures how tightly cluster members are distributed around their centroid (in your case, the medoid). It is calculated as the average of the squared distances between cluster members and their cluster centroid, so it is similar to a variance. The value returned by the performance indicator is not useful on its own. It is used primarily to compare one clustering against another, e.g. during cluster optimisation.

    Intuition tells us that the smaller the WSS, the denser the clusters, i.e. the distance between cluster members is small and presumably the distance between centroids is large. This intuition clearly fails when the number of clusters equals the number of data points (N): the WSS is zero, but the clustering is not useful at all, because every point is its own cluster.

    So in general, we calculate WSS(k) for cluster counts k from 1 to N, hoping to find a k beyond which the reduction in WSS is no longer significant compared with the added complexity of the clustering (as represented by k). If you chart WSS(k) against k, the chart looks like a bent arm, and we look for the elbow, which is usually where we find the "best" k. Do not get hooked on the idea that the optimum must be a sharp elbow: the optimum is open to your interpretation of what counts as a significant improvement in WSS and of when k becomes too large for you to deal with.
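To make the elbow idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: the helper names (`sq_dist`, `assign`, `k_medoids`, `wss`, `make_demo_points`, `elbow_curve`) are made up for illustration, the data is synthetic, and the clustering is a very simple alternating k-medoids rather than whatever RapidMiner uses internally. It assumes WSS is the average squared distance from each point to its cluster's medoid, as described above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, medoids):
    """Index of the nearest medoid for every point."""
    return [min(range(len(medoids)), key=lambda j: sq_dist(p, medoids[j]))
            for p in points]


def k_medoids(points, k, iters=50, seed=0):
    """Simple alternating k-medoids (illustrative only, not PAM)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # k distinct starting positions
    for _ in range(iters):
        labels = assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New medoid: the member with the smallest total squared
                # distance to the other members of its cluster.
                best = min(members,
                           key=lambda c: sum(sq_dist(c, m) for m in members))
            else:
                best = medoids[j]  # keep the old medoid for an empty cluster
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, assign(points, medoids)


def wss(points, medoids, labels):
    """Average squared distance from each point to its cluster's medoid."""
    total = sum(sq_dist(p, medoids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def make_demo_points(seed=42):
    """Synthetic data: three loose 2-D blobs of 10 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(10)]


def elbow_curve(points):
    """WSS(k) for every k from 1 to N."""
    curve = []
    for k in range(1, len(points) + 1):
        medoids, labels = k_medoids(points, k)
        curve.append((k, wss(points, medoids, labels)))
    return curve


for k, value in elbow_curve(make_demo_points()):
    print(f"k={k:2d}  WSS={value:.4f}")
```

Reading the printed table from top to bottom is the same as reading the chart from left to right: the WSS values should fall quickly at first and then flatten, reaching exactly zero at k = N, where every point is its own medoid. Because this simple procedure can get stuck in a local optimum, the curve is not guaranteed to decrease at every single step, which is one more reason not to insist on a sharp elbow.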