What is the relation between cluster size and centroid table? Which model makes more sense? Why?

NatalySimth Member Posts: 8 Contributor II
Hello folks,

I am comparing two clustering results, shown below as Case 1 and Case 2.

My question is: What is the relation between cluster size and the centroid table? Which model makes more sense, and why?
(Case 1):



(Case 2):

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    edited October 2019
    Hi @NatalySimth,

    Without any additional information, to get a general idea you can calculate the Average within centroid distance, which measures the "compacity" (compactness) of the clusters, and use it to compare the two models.
    For that, you have to put a Performance (Cluster Distance Performance) operator at the end of your process.

    Edit:
    I wanted to correct and complete the explanation above:
    Assuming that you are using the K-means algorithm, one method to find the best k (the number of clusters), and thus the best model, is to plot the "Average within centroid distance" against k. You will obtain a curve like this (or its mirror image, since the Average within centroid distance values are negative in RapidMiner):



    The best k, and thus the most relevant model, corresponds to the "elbow" of the curve: the point where the curve bends and adding more clusters no longer improves the distance noticeably.

    Hope this helps,


    Regards,

    Lionel 
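As a rough illustration of the elbow approach described above, here is a minimal sketch in plain Python rather than RapidMiner. It assumes Euclidean distance, a toy data set of three well-separated groups, and a deterministic farthest-first seeding; the function names (`farthest_first_init`, `kmeans`, `avg_within_centroid_distance`) are made up for this example, and RapidMiner's operator may differ in details such as sign or squaring.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def avg_within_centroid_distance(points, centroids, labels):
    """Mean Euclidean distance from each point to its own centroid."""
    return sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)

# Toy data: three 3x3 grids of points around (0, 0), (10, 0) and (0, 10).
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
]

scores = {}
for k in range(1, 7):
    centroids, labels = kmeans(points, k)
    scores[k] = avg_within_centroid_distance(points, centroids, labels)
    print(k, round(scores[k], 3))
```

Plotting `scores` against k for this toy data should show a steep drop up to k = 3 and only small gains afterwards, which is the bend to look for. Keep in mind that the values here are positive, so the curve is the mirror image of one built from RapidMiner's negative values.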
  • NatalySimth Member Posts: 8 Contributor II
    Hey lionelderkrikor, thanks for your explanation. If you allow me, what do you mean by the "compacity" of the clusters?

    How can I create the performance measure and the elbow plot? I am still new to all of these methods.
  • NatalySimth Member Posts: 8 Contributor II
    @lionelderkrikor Thanks a million! :) Such useful information.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @NatalySimth,

    You're welcome!

    Regards,

    Lionel
  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hi @lionelderkrikor

    Thank you for your inspiring answer above! In that case, it should also be possible to generate the elbow using the Davies-Bouldin index as the main criterion for comparison, right?

    Thank you in advance for your answer! 

    Regards! 
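For reference, the Davies-Bouldin index mentioned here can be computed from the same ingredients: the cluster members and their centroids. The sketch below is plain Python and uses the standard textbook definition with Euclidean distance. The helper names and toy data are assumptions for illustration, and the value RapidMiner reports may differ in sign or scaling. With this definition a lower index means more compact, better-separated clusters, so it is usually read by looking for the k with the lowest value rather than for a bend.

```python
import math

def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(x) / len(members) for x in zip(*members))

def davies_bouldin(clusters):
    """clusters: list of lists of points, one list per cluster (at least 2)."""
    centers = [centroid(m) for m in clusters]
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = [
        sum(math.dist(p, c) for p in m) / len(m)
        for m, c in zip(clusters, centers)
    ]
    k = len(clusters)
    worst = []
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        worst.append(
            max(
                (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                for j in range(k)
                if j != i
            )
        )
    return sum(worst) / k

# Toy data: three 3x3 grids of points around (0, 0), (10, 0) and (0, 10).
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
blobs = [[(cx + dx, cy + dy) for dx, dy in offsets]
         for cx, cy in [(0, 0), (10, 0), (0, 10)]]

db_three = davies_bouldin(blobs)                           # one cluster per group
db_two = davies_bouldin([blobs[0] + blobs[2], blobs[1]])   # two groups merged
print(round(db_three, 4), round(db_two, 4))
```

On this toy data the three-cluster partition gets a lower index than the partition that merges two groups, which matches the intuition that it is the better model.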
  • tonyboy9 Member Posts: 113 Contributor II
    Better still, just return the Davies-Bouldin index to the AutoModel results for k-means. Does anyone understand why this result was removed? I mean, interpreting k-means results was already a challenge.
  • prashant768 Member Posts: 6 Contributor I
    lionelderkrikor, thanks for the explanation.

    But can you please let me know how to get the inertia plot in RapidMiner, as the only options available there are avg. within centroid distance and DB?

    I want to plot it on the basis of the inertia criterion. Please help.
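On the inertia question: inertia, as the term is used for k-means in scikit-learn, is the sum of squared distances from each point to its assigned centroid. If the cluster assignments and the centroid table can be exported, it can be computed outside RapidMiner. The sketch below is a hypothetical plain-Python example with made-up points, centroids and labels; it is not a RapidMiner feature.

```python
def inertia_of(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its own centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
        for p, l in zip(points, labels)
    )

# Hypothetical exported data: four points, two clusters.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
centroids = [(1, 0), (11, 0)]
labels = [0, 0, 1, 1]

inertia = inertia_of(points, centroids, labels)
mean_squared_distance = inertia / len(points)
print(inertia, mean_squared_distance)
```

Repeating this for each k and plotting the result against k gives the usual inertia-based elbow curve. Dividing by the number of points only rescales the curve, so the bend stays at the same k.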