Cluster performance operators: what do the different values mean?

Mona Member Posts: 2 Contributor I
edited February 14 in Help
I have to compare the performance of several clustering algorithms using different performance operators. To do that, I would like to know the following:

1. What does the cluster number index value, which is the output of the Cluster Count Performance operator, show?
2. What do small and large values of avg. within cluster distance and avg. within centroid distance mean in terms of good and bad clustering?
3. I also want to check other index values, such as the Dunn index, the Jaccard index, and the Fowlkes–Mallows index, for various clustering algorithms, but RapidMiner doesn't have an operator for these. What can I do? I don't have experience with R.



Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    As it happens, I am presenting a short paper at the RComm13 conference which has some overlap with your questions so I'm sure you will understand if I don't answer directly.


    The cluster number index is the count of clusters. Pointless, you might say, but when used with DBSCAN, where the number of clusters is not fixed in advance, it can be quite interesting: http://rapidminernotes.blogspot.co.uk/2010/12/counting-clusters.html

    The avg. within cluster and avg. within centroid distances are hard to interpret. One thing to search for in this context is the "elbow criterion". As the number of clusters varies, note how the validity measure changes and look for an "elbow": the point beyond which further change in the measure is just its natural progression as clusters are added, rather than a reflection of real structure in the data.
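To make the elbow idea concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. Everything in it is an assumption made for illustration: the toy data, the tiny k-means with a deterministic farthest-point start, and the rule of picking the k with the largest change in slope. It is one common heuristic for locating an elbow, not the way any RapidMiner operator works.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=50):
    """Very small k-means; farthest-point start is an illustrative choice."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def avg_within_centroid_distance(centers, clusters):
    """Mean distance from each point to the centroid of its own cluster."""
    total = sum(dist(p, centers[i]) for i, c in enumerate(clusters) for p in c)
    return total / sum(len(c) for c in clusters)

def elbow(scores):
    """Return the k where the drop in the measure slows down the most."""
    ks = sorted(scores)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (scores[prev_k] - scores[k]) - (scores[k] - scores[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Toy data (made up): three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]

scores = {k: avg_within_centroid_distance(*kmeans(points, k)) for k in range(1, 6)}
best_k = elbow(scores)
for k in sorted(scores):
    print(k, round(scores[k], 3))
print("elbow at k =", best_k)
```

The measure keeps shrinking as k grows, so a small value on its own says little; what matters is where the shrinking stops being dramatic. On this toy data that happens at the number of groups that were built into it.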

    R has many validity measures, and it's worth investing some time in it because you can always call the R process from RapidMiner, which makes it easier to work out what is going on.
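For readers who want to see what the three indices named in the question actually compute before reaching for a package, here is a hedged sketch in plain Python. It uses the textbook pair-counting definitions of the Jaccard and Fowlkes–Mallows indices (both compare two labelings of the same examples) and one common variant of the Dunn index (smallest distance between points in different clusters divided by the largest distance between points in the same cluster). The function names and the toy data are made up for illustration and do not correspond to any RapidMiner operator or R function.

```python
import math
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dunn_index(points, labels):
    """Min between-cluster distance / max within-cluster distance.

    Larger is better: clusters are compact and far apart.
    Assumes at least two clusters and one cluster with two or more points.
    """
    min_between, max_within = float("inf"), 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_within

def pair_counts(labels_a, labels_b):
    """Count pairs grouped together in both labelings, only in a, only in b."""
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    return both, only_a, only_b

def jaccard_index(labels_a, labels_b):
    """Agreeing pairs over all pairs grouped together in either labeling."""
    both, only_a, only_b = pair_counts(labels_a, labels_b)
    return both / (both + only_a + only_b)

def fowlkes_mallows_index(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall."""
    both, only_a, only_b = pair_counts(labels_a, labels_b)
    return both / math.sqrt((both + only_a) * (both + only_b))

# Toy data (made up): two tight groups and two alternative labelings.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
labels_a = [0, 0, 0, 1, 1, 1]   # e.g. the result of one algorithm
labels_b = [0, 0, 1, 1, 1, 1]   # e.g. the result of another, or a reference

print("Dunn:", dunn_index(points, labels_a))
print("Jaccard:", jaccard_index(labels_a, labels_b))
print("Fowlkes-Mallows:", fowlkes_mallows_index(labels_a, labels_b))
```

Note the difference in kind: the Dunn index needs only the data and one clustering, while the Jaccard and Fowlkes–Mallows indices need a second labeling to compare against, so they answer "how similar are these two results?" rather than "how good is this one?".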

    regards

    Andrew