"Average Distance within Cluster"

Pinguicula Member Posts: 12 Contributor II
edited May 2019 in Help
Sorry,

There was a premature comment here, which dissolved into mist after some further literature review. Unfortunately, I am unable to remove my message.

Best, Norbert

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Norbert,

    I am no clustering expert myself, but as far as I can see from the source code, the calculation is roughly done as in the following pseudo code:

    count = 0;
    sum = 0.0;

    for each cluster C do {
        for each object O in C do {
            distance = getDistanceFromCentroid(C, O);
            sum = sum + v * v;    // v is not defined here; presumably the distance computed above
            count++;
        }
    }

    result = sum / count;

    double divisionFactor = 1.0;
    if (getParameterAsBoolean(PARAMETER_NORMALIZE))
        divisionFactor = es.getAttributes().size();

    result = result / divisionFactor;

    Hope that helps. Maybe you did not take the normalization with the number of attributes into account?

    Cheers,
    Ingo
  • Pinguicula Member Posts: 12 Contributor II
    Hi Ingo,

    Your answer more or less resolves my problem.

    If my assumption is correct and v in your pseudo code is equivalent to distance, then the feature labelled "average distance within cluster" is actually the variance of the data points within the cluster. It has little in common (exaggerating ;) ) with the average distance within a cluster as used, e.g., in the calculation of the Silhouette coefficient (Kaufman & Rousseeuw, 1990).

    By the way, the Silhouette coefficient or the Hopkins statistic would be nice features in the next RM release.

    Best

    Norbert
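
To make the difference discussed in this thread concrete, here is a minimal Python sketch. It is not RapidMiner's actual code. It assumes Euclidean distance, takes the centroid to be the component-wise mean, and reads `v` in the pseudo code as the distance to the centroid. The function names (`mean_squared_centroid_distance`, `silhouette_a`) and the toy data are made up for illustration. The first function mirrors the pseudo code; the second computes the average distance from one object to the other members of its cluster, which is the quantity a(i) used in the Silhouette coefficient.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def mean_squared_centroid_distance(clusters, normalize=False):
    """Mirror of the pseudo code above (assumption: v == distance).

    Sums the squared Euclidean distance of every object to its own
    cluster centroid and divides by the total number of objects.
    With normalize=True the result is further divided by the number
    of attributes, as in the divisionFactor step.
    """
    total = 0.0
    count = 0
    for points in clusters:
        c = centroid(points)
        for p in points:
            d = math.dist(p, c)
            total += d * d
            count += 1
    result = total / count
    if normalize:
        result /= len(clusters[0][0])  # number of attributes
    return result


def silhouette_a(i, points):
    """Average distance from points[i] to the other members of its cluster."""
    others = [q for j, q in enumerate(points) if j != i]
    return sum(math.dist(points[i], q) for q in others) / len(others)


# Hypothetical toy data: two clusters in two dimensions.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 14.0)],
]

msd = mean_squared_centroid_distance(clusters)                        # 2.5
msd_norm = mean_squared_centroid_distance(clusters, normalize=True)   # 1.25
a0 = silhouette_a(0, clusters[0])                                     # 2.0
```

On this toy data the squared-distance measure gives 2.5 (1.25 when normalized by the two attributes), while a(i) for the first object is 2.0. The two quantities are on different scales (squared versus plain distance) and use different reference points (centroid versus other cluster members), which is the distinction Norbert raises.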