
Searching for the best k in k-means: tried to follow help from Ingo

tonyboy9 Member Posts: 113 Contributor II

IngoRM Posts: 1,750 RM Founder

February 2013

Hi,

In the "Samples" repository delivered with RapidMiner you can find an example that creates the desired plot:

//Samples/processes/07_Clustering/09_KMeansWithPlot

It uses a parameter iteration over the number of clusters (k) and a Log operator that collects the Davies-Bouldin index (DB) and the average within-cluster distance (W) for each value of k. The process log can then be inspected as a table or plotted immediately. I recommend the plot type "Scatter Multiple" with "k" on the x-axis and both "DB" and "W" on the y-axis. In the settings at the bottom you can even activate lines between the points, which makes the elbow easier to spot.

I leave it to you to determine if 3, 4, or 5 clusters should be used in this case ;-)
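For readers who want to see the same idea outside RapidMiner, here is a minimal Python sketch of the loop described above: run k-means for several values of k and log W and DB for each. It is not the sample process itself. The toy data, the helper names (`farthest_first`, `kmeans`, `evaluate`, `make_blobs`) and the deterministic farthest-first initialization are all assumptions made for illustration. W and DB use common textbook definitions based on Euclidean distance, which may differ in sign or scaling from what RapidMiner reports.

```python
import math
import random

def farthest_first(points, k):
    """Deterministic init (an assumption): start at points[0], then keep adding
    the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def evaluate(points, centroids, labels):
    """Return (W, DB): average within-cluster distance and Davies-Bouldin index."""
    k = len(centroids)
    clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    w = sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)
    used = [c for c in range(k) if clusters[c]]  # skip empty clusters
    scatter = {c: sum(math.dist(p, centroids[c]) for p in clusters[c]) / len(clusters[c]) for c in used}
    db = sum(
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j]) for j in used if j != i)
        for i in used
    ) / len(used)
    return w, db

def make_blobs(centers, n_per=40, spread=0.5, seed=1):
    """Toy 2-D data: one Gaussian blob per center (purely illustrative)."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for cx, cy in centers for _ in range(n_per)]

points = make_blobs([(0, 0), (5, 5), (0, 5)])

# The "parameter iteration" plus "Log" step: one row (k, W, DB) per value of k.
results = []
for k in range(2, 7):
    w, db = evaluate(points, *kmeans(points, k))
    results.append((k, w, db))
    print(f"k={k}  W={w:.3f}  DB={db:.3f}")
```

Plotting W and DB against k from `results` gives the same kind of chart as the "Scatter Multiple" plot: look for the k where W stops dropping sharply and where DB is low.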


Screenshot 1: I tried to build the process suggested by Ingo.

Can anyone out there help me with Ingo's suggestion?

Or suggest another way to determine k?

Thank you for your time.

Tony



Screenshot 2: I tried to execute the process.

Best Answer

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi,

    The sample processes are meant to be used as templates, not as ready-to-use subprocesses that you treat as a black box.

    Open the sample process, save it into your repository with a new name, and use your data there. 

    Regards,
    Balázs