"Miscellaneous Issues Related to Clustering"
Since I'm neither a mathematician nor a computer scientist, the answers to the following questions might be quite simple, but I'm still a little confused about the clustering algorithms in RM:
A) Is it normal for the Kmeans algorithm to need much more time (at least 10x) when the "add characterization" button is switched on?
B) Is DBscan the only density-based algorithm currently implemented in RM?
C) As far as I understand, the Kmeans algorithm should be capable of producing clusters of different cardinality. However, in my datasets the output clusters differ only slightly in their cardinality: the largest cluster is at most 5 or 6 times the size of the smallest one. Is this more likely to be a characteristic of the dataset or an artefact of the algorithm?
D) When I use the ClusterCentroidEvaluator, the output shows negative average distances. Is that possible, or should I just ignore the sign?
E) Is there a performance vector for evaluating the pairwise similarity / overlap between clusters produced by Kmeans? Can I manipulate the output of Kmeans so that the ClusterDensityEvaluator and the ItemDistributionEvaluator accept it as input?
F) Is there any particular reason why the Ward method is not implemented as a clustering algorithm for hierarchical cluster models? (It is still used quite often in publications in my discipline.)