"Average Distance within Cluster"

Pinguicula Member Posts: 12 Contributor II
edited May 2019 in Help

There was a premature comment here, which dissolved into mist after some further literature review. Unfortunately, I am unable to remove my message.

Best, Norbert



    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Norbert,

    I am no clustering expert myself, but as far as I can see from the source code, the calculation is roughly done as in the following pseudocode:

    count = 0;
    sum = 0.0;

    for each cluster C do {
        for each object O in C do {
            distance = getDistanceFromCentroid(C, O);
            sum = sum + v * v;
            count = count + 1;
        }
    }

    result = sum / count;

    double divisionFactor = 1.0;
    if (getParameterAsBoolean(PARAMETER_NORMALIZE))
      divisionFactor = es.getAttributes().size();

    result = result / divisionFactor;

    Hope that helps. Maybe you did not take the normalization by the number of attributes into account?
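
A minimal Python sketch of the calculation described in the pseudocode above, for anyone who wants to check numbers by hand. It assumes that `v` is the distance to the centroid, that the distance is Euclidean, and that the normalize option divides by the number of attributes; the function name and arguments are hypothetical and are not RapidMiner's API.

```python
import math

def avg_within_cluster_distance(clusters, centroids, normalize=False):
    """Mean squared distance of all objects to their own cluster centroid.

    clusters  : list of clusters, each a list of points (tuples of floats)
    centroids : one centroid (tuple of floats) per cluster
    normalize : assumption: divide the result by the number of attributes
    """
    count = 0
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for p in points:
            # Assumption: Euclidean distance; the real operator may use another measure.
            distance = math.dist(p, centroid)
            total += distance * distance
            count += 1
    result = total / count
    if normalize:
        result /= len(centroids[0])  # number of attributes
    return result

clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 14.0)]]
centroids = [(1.0, 0.0), (10.0, 12.0)]
print(avg_within_cluster_distance(clusters, centroids))
print(avg_within_cluster_distance(clusters, centroids, normalize=True))
```

In this toy example the squared distances are 1, 1, 4 and 4, so the unnormalized value is 10 / 4 = 2.5, and dividing by the two attributes gives 1.25.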

    Pinguicula Member Posts: 12 Contributor II
    Hi Ingo,

    Your answer largely resolves my problem.

    If my assumption is correct and v in your pseudocode is equivalent to distance, then the feature labelled "average distance within cluster" is actually the variance of the data points within the cluster. It has little in common (exaggerating ;) ) with the average distance within cluster used, e.g., in the calculation of the Silhouette coefficient (Kaufman & Rousseeuw, 1990).

    By the way, the Silhouette coefficient or the Hopkins statistic would be nice features in the next RM release.
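
To make the difference concrete, here is a small Python sketch comparing the two quantities on a single toy cluster. It assumes Euclidean distance and uses the usual Silhouette definition of a(i) as the mean distance from object i to all other objects in the same cluster; the function names are hypothetical.

```python
import math

def mean_squared_distance_to_centroid(points):
    """Variance-like measure: mean squared Euclidean distance to the cluster mean."""
    dim = len(points[0])
    centroid = tuple(sum(p[k] for p in points) / len(points) for k in range(dim))
    return sum(math.dist(p, centroid) ** 2 for p in points) / len(points)

def silhouette_a(points, i):
    """a(i): mean distance from point i to all other points of the same cluster."""
    others = [q for j, q in enumerate(points) if j != i]
    return sum(math.dist(points[i], q) for q in others) / len(others)

cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(mean_squared_distance_to_centroid(cluster))
print([silhouette_a(cluster, i) for i in range(len(cluster))])
```

For this cluster the mean squared distance to the centroid is 50/9 (about 5.56), while a(i) is 3.5, 4.5 and 4.0 for the three points. The first is a single squared quantity per clustering, the second a per-object average of unsquared pairwise distances.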

