"Average Distance within Cluster"

Pinguicula Member Posts: 12 Contributor II
edited May 2019 in Help
Sorry,

there was a premature comment here, which dissolved into mist after some further literature review. Unfortunately, I am unable to remove my message.

Best Norbert

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Norbert,

    I am no clustering expert myself, but as far as I can see from the source code, the calculation is roughly done as in the following pseudo code:

    count = 0;
    sum = 0.0;

    for each cluster C do {
        for each object O in C do {
            distance = getDistanceFromCentroid(C, O);
            sum = sum + v * v;
            count++;
        }
    }

    result = sum / count;

    double divisionFactor = 1.0;
    if (getParameterAsBoolean(PARAMETER_NORMALIZE))
        divisionFactor = es.getAttributes().size();

    result = result / divisionFactor;

    Hope that helps. Maybe you did not take the normalization by the number of attributes into account?

    Cheers,
    Ingo
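The pseudo code above can be sketched in Python roughly as follows. This is only an illustration, not the RapidMiner implementation: it assumes that `v` stands for `distance` and that the distance measure is Euclidean, and the function name, the `normalize` flag, and the sample data are made up for the example.

```python
import math

def avg_within_cluster_distance(points, labels, centroids, normalize=False):
    """Mean squared distance of every point to its assigned centroid.

    Assumption: `v` in the pseudo code means `distance`, and the
    distance is Euclidean. Neither is confirmed by the thread.
    """
    total = 0.0
    count = 0
    for point, label in zip(points, labels):
        distance = math.dist(point, centroids[label])
        total += distance * distance
        count += 1
    result = total / count
    if normalize:
        # Mirrors the divisionFactor step: divide by the number of attributes.
        result /= len(points[0])
    return result

# Hypothetical toy data: two clusters on a plane.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
centroids = {0: (1.0, 0.0), 1: (10.0, 2.0)}

print(avg_within_cluster_distance(points, labels, centroids))                  # 2.5
print(avg_within_cluster_distance(points, labels, centroids, normalize=True))  # 1.25
```

With two attributes, switching normalization on halves the result, which is the kind of difference Ingo's question about the normalization is hinting at.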
  • Pinguicula Member Posts: 12 Contributor II
    Hi Ingo,

    Your answer more or less resolves my problem.

    If my assumption is correct and v in your pseudo code is equivalent to distance, then the feature labelled "average distance within cluster" is actually the variance of the data points within the cluster. It has little in common (exaggerating ;)) with the average distance within a cluster as used, e.g., in the calculation of the Silhouette coefficient (Kaufman & Rousseeuw, 1990).

    By the way, the Silhouette coefficient or the Hopkins statistic would be nice features in the next RM release.

    Best

    Norbert
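To make the distinction concrete, here is a small Python sketch that computes both quantities on the same toy data: the mean squared distance to the cluster centroid (what the pseudo code appears to compute if `v` is `distance`) and the mean of a(i), the average distance from a point to the other members of its own cluster, as used in the Silhouette coefficient. The function names and data are hypothetical, the distance is assumed to be Euclidean, and singleton clusters are simply skipped for a(i), which is a simplification.

```python
import math
from collections import defaultdict

def mean_squared_centroid_distance(points, labels):
    """Mean squared Euclidean distance of each point to its cluster centroid."""
    members = defaultdict(list)
    for p, label in zip(points, labels):
        members[label].append(p)
    # Centroid = coordinate-wise mean of the cluster members.
    centroids = {label: tuple(sum(c) / len(ps) for c in zip(*ps))
                 for label, ps in members.items()}
    return sum(math.dist(p, centroids[l]) ** 2
               for p, l in zip(points, labels)) / len(points)

def mean_intra_cluster_distance(points, labels):
    """Mean of a(i): average distance from point i to the other points
    in its own cluster. Singleton clusters are skipped (a simplification)."""
    a_values = []
    for i, (p, label) in enumerate(zip(points, labels)):
        others = [q for j, (q, l) in enumerate(zip(points, labels))
                  if l == label and j != i]
        if others:
            a_values.append(sum(math.dist(p, q) for q in others) / len(others))
    return sum(a_values) / len(a_values)

# Hypothetical toy data: two clusters on a plane.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]

print(mean_squared_centroid_distance(points, labels))  # 2.5
print(mean_intra_cluster_distance(points, labels))     # 3.0
```

Even on four points the two numbers differ, and they react differently to cluster size and spread, so they should not be compared as if they were the same measure.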