Computations for Cluster Distance Performance operator

avdavd Member, University Professor Posts: 44  Maven
edited November 21 in Help
I am having trouble replicating the computations of the "avg. within cluster distance" metrics produced by the Performance (Cluster Distance Performance) operator.

The operator documentation states: "avg._within_centroid_distance: The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster." The name "avg._within_centroid_distance" is confusing to me, because the definition actually describes an average within-cluster distance, and the two are different concepts. It is also not clear how the overall "avg._within_centroid_distance" is computed, as opposed to the value reported for each cluster.

I have attached the sample calculations for the Iris dataset along with the RapidMiner process. I was able to replicate the Davies Bouldin index but not the "avg._within_centroid_distance". Any help would be much appreciated.

On a related note, it is also not clear to me what the Performance (Cluster Density Performance) operator is computing and how. I did read the operator documentation but it did not make sense to me.


Answers

  • avdavd Member, University Professor Posts: 44  Maven
    I figured out that "avg._within_centroid_distance" is the average of the squared Euclidean distances between each observation and its cluster centroid, not the average of the plain Euclidean distances.

    If someone could clarify what the Performance (Cluster Density Performance) operator is computing, that would help. Thanks.
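
For anyone trying to reproduce the numbers by hand, the finding above can be sketched in plain Python. This is only an illustration, not RapidMiner's actual implementation: the function name and the toy data are made up, each centroid is assumed to be the mean of its cluster members, and the overall value is assumed to be the average over all examples (which equals the size-weighted average of the per-cluster values). Sign conventions in the operator's output are not covered.

```python
from collections import defaultdict


def avg_within_centroid_distance(points, labels):
    """Return (per_cluster, overall) mean SQUARED Euclidean distance to the centroid.

    Hypothetical helper for illustration; not part of any RapidMiner API.
    """
    # Group the examples by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    per_cluster = {}
    total_sq_dist = 0.0
    for label, members in clusters.items():
        dim = len(members[0])
        # Assumption: the centroid is the attribute-wise mean of the members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Squared Euclidean distance of each member to its centroid.
        sq_dists = [
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        ]
        per_cluster[label] = sum(sq_dists) / len(members)
        total_sq_dist += sum(sq_dists)

    # Assumption: the overall value averages over all examples,
    # i.e. a size-weighted average of the per-cluster values.
    overall = total_sq_dist / len(points)
    return per_cluster, overall


# Toy data (made up): two clusters in two dimensions.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0), (10.0, 8.0)]
labels = [0, 0, 1, 1, 1]
per_cluster, overall = avg_within_centroid_distance(points, labels)
print(per_cluster, overall)
```

To compare against the non-squared variant, take the square root of each squared distance before averaging; on the Iris data the two versions give clearly different values, which makes it easy to see which one the operator reports.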