# optimal number of clusters in fuzzy c-means

Hi

I'm using fuzzy c-means to cluster some text data. How can I find the optimal number of clusters? Is intra_cluster_distance a good measure?

## Answers

If you are interested only in the final cluster allocation, there are several possible solutions. Because Fuzzy C-Means does not return a centroid table (unlike k-Means), you will not be able to use the Davies-Bouldin measure from Cluster Distance Performance. However, you can rely on the commonly used Item Distribution Performance (e.g. the Sum of Squares measure) and plot it against k, then use the "elbow method" to find the "optimum" number of clusters. Alternatively, you could use a combination of Data to Similarity and Cluster Density Performance to optimise the average cluster density.
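To illustrate the elbow idea outside RapidMiner, here is a minimal Python sketch. It assumes you have already turned the fuzzy result into hard labels (for example, by taking the cluster with the highest membership for each example). The function names `within_cluster_ss` and `elbow_k` are hypothetical, the sum-of-squares values in the usage lines are made up purely for illustration, and the "largest second difference" rule is only one simple heuristic for locating the bend, not a definitive criterion.

```python
from collections import defaultdict


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Cluster mean, computed per dimension.
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return total


def elbow_k(ss_by_k):
    """Return the k where the curve bends most (largest second difference).

    Assumes ss_by_k maps consecutive values of k to their sum of squares.
    The smallest and largest k can never be chosen, since they have only
    one neighbour.
    """
    ks = sorted(ss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = ss_by_k[prev_k] - 2 * ss_by_k[k] + ss_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Illustrative values only (not real results): one sum of squares per k.
example_curve = {1: 100.0, 2: 90.0, 3: 20.0, 4: 18.0, 5: 17.0}
chosen_k = elbow_k(example_curve)
```

In practice you would run the clustering once per candidate k, compute the sum of squares for each run, and still look at the plot yourself, since an automatic rule can be misled by a noisy curve.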

Note, however, that the whole point of using Fuzzy C-Means is to exploit the fuzzy membership of examples in each cluster. If your aim is to take all possible cluster memberships into account, there are no obvious performance measures available in RapidMiner. You could create your own measure by weighting different clustering performance indicators with the cluster membership confidences.
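As one possible reading of "weighting an indicator by membership", the sketch below computes a membership-weighted within-cluster sum of squares in plain Python. This is the standard Fuzzy C-Means objective, with cluster centres recomputed from the memberships; it is not a RapidMiner operator. The function name `fuzzy_weighted_ss` and the layout of `memberships` (one row per example, one column per cluster, rows summing to 1) are assumptions for illustration, as is the default fuzzifier `m=2.0`.

```python
def fuzzy_weighted_ss(points, memberships, m=2.0):
    """Membership-weighted within-cluster sum of squares.

    points:      list of equal-length numeric tuples, one per example
    memberships: memberships[i][j] = degree of example i in cluster j
    m:           fuzzifier (assumed 2.0 here; use the value from your run)
    """
    n_clusters = len(memberships[0])
    dim = len(points[0])
    total = 0.0
    for j in range(n_clusters):
        # Weight of each example for cluster j.
        weights = [row[j] ** m for row in memberships]
        weight_sum = sum(weights)
        if weight_sum == 0:
            continue  # no example belongs to this cluster at all
        # Weighted centre of cluster j.
        centre = [
            sum(w * p[d] for w, p in zip(weights, points)) / weight_sum
            for d in range(dim)
        ]
        # Weighted squared distances to that centre.
        total += sum(
            w * sum((p[d] - centre[d]) ** 2 for d in range(dim))
            for w, p in zip(weights, points)
        )
    return total
```

With crisp memberships (each row containing a single 1), this reduces to the ordinary within-cluster sum of squares. Like that measure, it tends to shrink as k grows, so it would still need to be compared across k (for example with an elbow plot) rather than simply minimised.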

The Information Selection extension also provides two performance operators worth investigating here. One of them calculates the within-cluster distance variance; unfortunately, it does not take the fuzzy cluster membership into account.

Jacob

Thank you so much. The problem has been solved.

Which solution did you use? Could you explain it to me, please?

You can mention me in this discussion or send it to my email, endirizal.f@gmail.com.

Thank you for your help.

Endirizalf