
How can I validate a DBSCAN clustering using only internal criteria?

agucaba123 Member Posts: 3 Contributor I
edited November 10 in Help

Hello, I'm trying to validate different clustering models using ONLY internal criteria. For centroid-based clustering, such as K-means and K-medoids, I used the Davies-Bouldin (DB) index and an extension that evaluates the silhouette index. My problem is that the DB and silhouette indexes are not available for DBSCAN, and the other operators in RapidMiner Studio, such as density or item distribution, make no sense to me in this case.

 

I saw this post, but I couldn't find an answer: https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cluster-Performance-DBScan-and-agglomerative-Clustering/m-p/40748#M27683

By the way, I read that previous versions of RapidMiner had an operator called "Cluster internal validation": https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cannot-find-the-cluster-internal-validation-operator-in-rapid/m-p/25745

Is this operator still available? 

 


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 530   Unicorn

    Hi @agucaba123,

     

    I'm not aware of an operator called "Cluster internal validation".

    However, you could calculate the Silhouette Coefficient using a Python script.

    If you are interested, can you share your dataset and your process so we can see whether that's possible?
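    As a rough illustration of the Python route, here is a minimal sketch that assumes scikit-learn is available to the script. The helper name `dbscan_silhouette` and the synthetic blob data are made up for this example and stand in for the real dataset. DBSCAN labels noise points as -1, so the sketch drops them before scoring; whether to exclude noise or treat it as its own group is a choice you would have to make for your data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def dbscan_silhouette(X, eps, min_samples):
    """Run DBSCAN once and return (silhouette, n_clusters, n_noise).

    Noise points (label -1) are excluded before scoring. The silhouette
    needs at least two clusters, so NaN is returned otherwise.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels != -1                      # mask of non-noise points
    n_clusters = len(set(labels[clustered]))
    n_noise = int((~clustered).sum())
    if n_clusters < 2:
        return float("nan"), n_clusters, n_noise
    score = silhouette_score(X[clustered], labels[clustered])
    return score, n_clusters, n_noise


# Synthetic stand-in data: three compact, well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

score, n_clusters, n_noise = dbscan_silhouette(X, eps=1.0, min_samples=5)
print(f"clusters={n_clusters} noise={n_noise} silhouette={score:.3f}")
```

    Reporting the number of noise points next to the score matters here, because a run that discards most of the data as noise can still score well on the few points that remain.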

     

    Regards,

     

    Lionel

  • agucaba123 Member Posts: 3 Contributor I

    Hi Lionel. I can't share the dataset, but I tried applying a silhouette coefficient and this was the result:

     

    DBSCAN.png

     

    I looped the epsilon parameter between 0.1 and 2, with min points set to 5, 10 and 20. What does the silhouette index mean in each case? Is it useful for validating this clustering method? I ask because as epsilon rises, the segmentation gets worse (the numbers under each epsilon value are the cluster sizes).

     

    Thanks for your time. 
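    For readers who want to reproduce this kind of sweep outside RapidMiner, the loop described above could be sketched roughly as follows with scikit-learn. The two-moons data, the reduced epsilon list, and the names `eps_values`, `min_pts_values` and `rows` are assumptions for illustration, not the original process. The sketch records cluster sizes and the noise count alongside the silhouette, since the silhouette tends to reward compact, roughly convex groups and can therefore be a weak guide for density-based clusters when read on its own.

```python
from itertools import product

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real (unshared) dataset.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

eps_values = [0.1, 0.2, 0.5, 1.0, 2.0]   # subset of the 0.1..2 range
min_pts_values = [5, 10, 20]

rows = []
for eps, min_pts in product(eps_values, min_pts_values):
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    clustered = labels != -1                       # drop noise (label -1)
    sizes = np.bincount(labels[clustered]).tolist()  # points per cluster
    n_noise = int((~clustered).sum())
    # The silhouette is undefined with fewer than two clusters.
    if len(sizes) >= 2:
        score = silhouette_score(X[clustered], labels[clustered])
    else:
        score = float("nan")
    rows.append((eps, min_pts, sizes, n_noise, score))

for eps, min_pts, sizes, n_noise, score in rows:
    print(f"eps={eps:<4} min_pts={min_pts:<3} sizes={sizes} "
          f"noise={n_noise} silhouette={score:.3f}")
```

    When epsilon grows large enough, everything merges into a single cluster and the score becomes undefined, which is one reason to read the sizes and the score together rather than pick the setting with the highest silhouette.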
