How to evaluate clustering

ahootanha Member Posts: 69 Contributor I
edited December 2018 in Help

Hello,
I want to compare clusterings and evaluate them. Which operators should I use?
Also, how do I find the optimal parameters for each clustering method?
Thanks

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    Hi,

     

    Finding optimal settings for clustering is indeed a bit tricky, but RapidMiner offers performance measures for clustering and segmentation tasks.

    In the Operator list under Validation -> Segmentation you'll find the corresponding Operators.
    If you have a subset of your data where you know exactly which cluster each example belongs to, you can also set the cluster Attribute as a prediction and optimize the classification performance instead.

    [Attached image: cluster_performance.png]
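
    Outside RapidMiner, the idea of scoring clusters against a labelled subset can be sketched in a few lines of Python. This is only an illustration, not what any RapidMiner Operator does internally: the function name `cluster_accuracy`, the toy data, and the majority-vote mapping from cluster to label are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def cluster_accuracy(true_labels, cluster_ids):
    """Score a clustering against known labels (hypothetical helper).

    Each cluster is mapped to the most frequent true label among its
    members, and the mapped label is then treated as a prediction.
    """
    members = defaultdict(list)
    for truth, cid in zip(true_labels, cluster_ids):
        members[cid].append(truth)
    # Majority vote: cluster id -> most common true label in that cluster.
    mapping = {cid: Counter(ts).most_common(1)[0][0] for cid, ts in members.items()}
    correct = sum(mapping[cid] == truth for truth, cid in zip(true_labels, cluster_ids))
    return correct / len(true_labels)

# Toy labelled subset (made up for illustration).
true_labels = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 1, 1, 1, 1]
print(cluster_accuracy(true_labels, cluster_ids))  # 5 of 6 examples match, about 0.83
```

    Note that a majority-vote mapping is optimistic: with very many small clusters the score approaches 1.0, so it is best used to compare settings that produce a similar number of clusters.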

    Best,
    David

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    what do these values mean?
    avg within centroid distance: -1.0876
    davies bouldin: -5.675

  • ahootanha Member Posts: 69 Contributor I

    I used Silhouette.
    What do these results show?
    Please advise.
    Thanks

    [Attached images: مهم.JPG, مهم۲.JPG]

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    Hi again,

     

    I guess the Silhouette performance measure comes from a third-party extension, so I can't say much about it. But Wikipedia has an entry about it:

    https://en.wikipedia.org/wiki/Silhouette_(clustering)

    In short, it measures how similar an Example is to its own cluster compared to the other clusters. The value is normalized between -1 and +1, and a higher value indicates that the Example fits its cluster better.
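
    To make the definition concrete, here is a minimal pure-Python sketch of the silhouette value as described in the Wikipedia entry. It is not the code of the extension mentioned above; the function names, the Euclidean distance, and the toy points are assumptions for illustration, and it expects at least two clusters.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_values(points, labels):
    """Return one silhouette value per point, each between -1 and +1."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

# Two well separated toy groups (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]
values = silhouette_values(points, good_labels)
print(sum(values) / len(values))  # average silhouette, close to +1 here
```

    Averaging the values per cluster gives a per-cluster silhouette, and averaging over all points gives a single score that can be compared across different parameter settings.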

     

    The Davies–Bouldin criterion is also explained quite well on Wikipedia:

    https://en.wikipedia.org/wiki/Davies%E2%80%93Bouldin_index

    The idea is to maximize the inter-cluster distance (the difference between the different clusters) and minimize the intra-cluster distances (the points within each cluster should be close together). Here a lower index is better.
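
    As a rough illustration of that trade-off, the index from the Wikipedia entry can be sketched in pure Python. This is not RapidMiner's implementation; the function names, the Euclidean distance, and the toy points are assumptions, and the sketch expects at least two clusters with distinct centroids.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(ps) for lab, ps in clusters.items()}
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(euclidean(p, centroids[lab]) for p in ps) / len(ps)
        for lab, ps in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

# Toy data (made up for illustration): one sensible and one poor labelling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = davies_bouldin(points, [0, 0, 0, 1, 1, 1])
poor = davies_bouldin(points, [0, 1, 0, 1, 0, 1])
print(good < poor)  # the sensible labelling gets the lower index
```

    The numerator grows when points lie far from their centroid, and the denominator grows when centroids are far apart, which is why a lower index is better.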

    Best,
    David

  • ahootanha Member Posts: 69 Contributor I

    Hello,
    many thanks.
    What does the criterion
    AVG within centroid distance: -1.043
    mean?
    And what does the Silhouette of each cluster show in the first photo?
