optimal number of clusters in fuzzy c-means

farzane Member Posts: 6 Learner I
edited August 2020 in Help
I'm using fuzzy c-means to cluster some text data. How can I find the optimal number of clusters? Is intra_cluster_distance a good measure?


    jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited August 2020
    I assume that you are talking about the Fuzzy C-Means operator from the Information Selection extension? The key to finding an optimum k is to create an optimisation loop, e.g. using Optimize Parameters (Grid), which varies the number of clusters and records a performance measure for each value.
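Outside RapidMiner, the same optimisation loop can be sketched in plain Python. This is only an illustration under stated assumptions: `fuzzy_c_means`, `within_cluster_ss` and `scan_k` are hypothetical helpers written for this example (they are not the Information Selection operator), the fuzzifier `m=2.0` and the toy `points` are arbitrary choices, and the performance measure is a within-cluster sum of squares computed after assigning each example to its highest-membership cluster.

```python
import math
import random


def fuzzy_c_means(points, k, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means sketch; returns (centroids, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by membership^m.
        centroids = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centroids.append(
                [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
            )
        # Membership update from the distances to every centroid.
        for i in range(n):
            dist = [max(math.dist(points[i], c), 1e-12) for c in centroids]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centroids, u


def within_cluster_ss(points, centroids, u):
    """Sum of squared distances to the highest-membership centroid."""
    total = 0.0
    for p, row in zip(points, u):
        j = max(range(len(row)), key=row.__getitem__)
        total += math.dist(p, centroids[j]) ** 2
    return total


def scan_k(points, k_values):
    """Grid loop: one clustering run and one score per candidate k."""
    return {k: within_cluster_ss(points, *fuzzy_c_means(points, k)) for k in k_values}


# Toy data (an assumption for the demo): three tight groups of 2-D points.
offsets = [(-0.2, 0.1), (0.1, -0.2), (0.2, 0.2), (-0.1, -0.1), (0.0, 0.15)]
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

scores = scan_k(points, range(1, 6))
for k, score in scores.items():
    print(k, round(score, 3))
```

The resulting `scores` dictionary plays the role of the log that Optimize Parameters (Grid) would produce: one performance value per tested cluster number.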

    If you are interested only in the final cluster allocation, there are several possible solutions. However, as Fuzzy C-Means does not return a centroid table (unlike k-Means), you will not be able to use the Davies-Bouldin measure from Cluster Distance Performance. Instead, you can rely on the commonly used Item Distribution Performance (e.g. the Sum of Squares measure) and plot it against k to use the "elbow method" of finding the "optimum" cluster number. Alternatively, you could use a combination of Data to Similarity and Cluster Density Performance to optimise the average cluster density.
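The elbow is usually read off the plot by eye, but one simple heuristic can be sketched in Python: pick the k where the curve bends most sharply, i.e. where the drop in the measure before k is largest compared with the drop after it. The function name `elbow_k` is hypothetical, the heuristic assumes evenly spaced k values and a lower-is-better measure, and the numbers in `illustrative` are made up purely to show the mechanics, not results from any real run.

```python
def elbow_k(scores):
    """Return the k with the largest bend (second difference) in the curve.

    `scores` maps each tested k to a lower-is-better measure such as a
    within-cluster sum of squares.
    """
    ks = sorted(scores)
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate an elbow")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # Improvement gained by reaching k, minus improvement gained after k.
        bend = (scores[prev_k] - scores[k]) - (scores[k] - scores[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up values for illustration only.
illustrative = {1: 1000.0, 2: 600.0, 3: 120.0, 4: 100.0, 5: 90.0}
print(elbow_k(illustrative))
```

A heuristic like this is only a starting point; when the curve has no clear bend, the plot and domain knowledge are better guides than any single number.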

    Note, however, that the whole idea of using Fuzzy C-Means is to utilise the fuzzy membership of examples in each cluster. If your aim is to consider all possible cluster memberships, then there are no obvious performance measures available in RapidMiner. You could, however, create your own measure by weighting different clustering performance indicators with cluster membership confidence factors.
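As a rough sketch of what such a membership-weighted measure could look like, the Python below computes two quantities from a membership matrix `u` (one row per example, one column per cluster). The first is a within-cluster sum of squares in which every squared distance is weighted by membership raised to the fuzzifier `m`, with centroids derived from the memberships themselves since no centroid table is available. The second is the partition coefficient, a standard fuzzy validity index that is not mentioned in this thread and is named here only as one possible choice. All function names, the value `m=2.0` and the tiny data set are assumptions for illustration.

```python
def weighted_centroids(points, u, m=2.0):
    """Centroids as means of the points weighted by membership^m."""
    n, k, dim = len(points), len(u[0]), len(points[0])
    centroids = []
    for j in range(k):
        w = [u[i][j] ** m for i in range(n)]
        sw = sum(w)
        centroids.append(
            [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
        )
    return centroids


def fuzzy_within_ss(points, u, m=2.0):
    """Squared distances to every centroid, weighted by membership^m."""
    centroids = weighted_centroids(points, u, m)
    total = 0.0
    for p, row in zip(points, u):
        for membership, c in zip(row, centroids):
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, c))
            total += (membership ** m) * sq_dist
    return total


def partition_coefficient(u):
    """Mean of squared memberships: 1/k for a fully fuzzy split, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)


# Tiny hand-made example: two pairs of points and two membership matrices.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uniform = [[0.5, 0.5] for _ in points]

print(fuzzy_within_ss(points, crisp))
print(partition_coefficient(crisp), partition_coefficient(uniform))
```

Either quantity could be recorded per k in an optimisation loop in place of a crisp measure, so that examples sitting between clusters influence the score.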

    The Information Selection extension also provides two performance operators worth investigating here. One calculates the within-cluster distance variance; unfortunately, it does not take the fuzzy cluster membership into consideration.

    farzane Member Posts: 6 Learner I
    Thank you so much. The problem has been solved :)
    endirizalf Member Posts: 1 Contributor I
    Hi, @farzane
    Which solution did you use? Can you explain it to me, please?

    You can mention me in this discussion or send it to my email, endirizal.f@gmail.com.

    Thank you for your help.

