When k = 2, 3, 4?

tonyboy9 Member Posts: 113 Contributor II
Below is my customer segmentation data which I ran in AutoModel.

Below are screenshots for k = 2, 3 and 4. How can I tell which k is best?
I do not have access to the elbow method or silhouette analysis.
I looked at the three Davies-Bouldin indices, which measure 5.415, 3.666 and 4.121.
Wikipedia calls this an internal evaluation scheme, in which the quality of a clustering is validated using quantities and features inherent to the dataset. Because the index is defined as a function of the ratio of within-cluster scatter to between-cluster separation, a lower value means the clustering is better.
Should I assume the index 3.666 means k = 3 is better?
Thanks for your time.


Best Answer

    jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    AutoModel does not optimise the number of clusters for k-Means, so if you run several experiments yourself, the best-distributed cluster model is the one whose Davies-Bouldin measure is closest to zero. However, if you select X-Means clustering, it will return the optimum number of clusters within the range you specify between the minimum and maximum.
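
For anyone who wants to check the "closest to zero" rule outside AutoModel, here is a minimal sketch in plain Python. It is not what AutoModel does internally; the function names `davies_bouldin` and `best_k` are made up for illustration. It assumes the usual textbook definition of the index (for each cluster, the worst ratio of summed within-cluster scatter to centroid distance, averaged over all clusters) and uses Euclidean distance.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length numeric points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a clustering (hypothetical helper).

    Assumes Euclidean distance and at least two clusters with
    distinct centroids; lower values mean tighter, better separated clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    ids = sorted(clusters)
    centers = {c: centroid(clusters[c]) for c in ids}
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = {
        c: sum(math.dist(p, centers[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }

    total = 0.0
    for i in ids:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


def best_k(scores):
    """Pick the k whose index is closest to zero.

    abs() is used in case a tool reports the index with its sign flipped.
    """
    return min(scores, key=lambda k: abs(scores[k]))


# The three values reported in the question, keyed by k.
reported = {2: 5.415, 3: 3.666, 4: 4.121}
print(best_k(reported))  # 3
```

With the three values from the question, this picks k = 3, which matches the rule stated in the answer above.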


    tonyboy9 Member Posts: 113 Contributor II
    Thank you for that. A follow-up question, please: I need to interpret the k-means summary, but I have no idea what the jumble of facts under each segment means. How do I find which segment contains the problem attribute(s) I need to see? And once I have identified the applicable segment, what does that tell me in terms of problem solving?