# Computations for Cluster Distance Performance operator

I am having trouble replicating the computations of the "**avg. within cluster distance**" metrics produced by the **Performance (Cluster Distance Performance)** operator.

The operator documentation states - "avg._within_centroid_distance: The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster." The term "avg_within_centroid_distance" seems confusing to me because the definition is actually stating that it is "avg_within_cluster_distance" which are two different concepts altogether. Also, it is not clear how the overall "avg._within_centroid_distance" is computed in addition to the metric computed for each cluster.

I have attached the sample calculations for the Iris dataset along with the RapidMiner process. I was able to replicate the Davies Bouldin index but not the "avg._within_centroid_distance". Any help would be much appreciated.

On a related note, it is also not clear to me what the **Performance (Cluster Density Performance)** operator is computing and how. I did read the operator documentation but it did not make sense to me.

## Answers

It turns out the operator uses the **squared Euclidean distance** between each observation and the corresponding centroid, not the Euclidean distance.

If someone could clarify what the **Performance (Cluster Density Performance)** operator is computing, that would help. Thanks.
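
For anyone trying to reproduce this by hand, here is a minimal sketch of the calculation described above, written in plain Python rather than taken from RapidMiner. The function names and the toy data are hypothetical, and the overall value is computed as an example-weighted average across clusters, which is an assumption on my part and not something the operator documentation confirms. The sign and scaling of the values RapidMiner reports may also differ.

```python
from collections import defaultdict


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def avg_within_centroid_distance(points, labels):
    """Return (per-cluster averages, overall average) of squared distances.

    Hypothetical helper for illustration; not RapidMiner's implementation.
    """
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    per_cluster = {}
    total = 0.0
    for lab, members in clusters.items():
        c = centroid(members)
        dists = [squared_euclidean(p, c) for p in members]
        per_cluster[lab] = sum(dists) / len(dists)
        total += sum(dists)

    # Assumption: the overall value weights every example equally,
    # i.e. sum of all squared distances divided by the number of examples.
    overall = total / len(points)
    return per_cluster, overall


# Toy data, not the Iris values from the attachment.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0), (10.0, 8.0)]
labels = ["a", "a", "b", "b", "b"]
per_cluster, overall = avg_within_centroid_distance(points, labels)
print(per_cluster, overall)
```

Replacing `squared_euclidean` with a version that takes the square root gives the plain Euclidean variant, which makes it easy to check which of the two matches the operator output on a given dataset.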