"unsupervised cluster evaluation"

nguyenxuanhau Member Posts: 22  Maven
edited June 7 in Help
Hi!
Can I compare unsupervised cluster evaluations (cluster model evaluations) with each other on unlabeled data in RapidMiner?
What must I do to set up such a comparison?
Best regards

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    if you want to compare how much two cluster outcomes match each other, you can simply rename the first one and assign it the role "label" before actually performing the second evaluation. If you then set the role of the second cluster attribute to "prediction", you can use the standard accuracy measure to quantify how well the two clusterings agree.
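The label/prediction trick above can be sketched outside RapidMiner as well. A minimal pure-Python version (the function name `clustering_agreement` is my own): since cluster ids are arbitrary, it also tries every one-to-one relabelling of the second assignment and reports the best accuracy, assuming the second clustering has at most as many clusters as the first.

```python
from itertools import permutations

def clustering_agreement(c1, c2):
    """Accuracy between two cluster assignments of the same examples.
    Cluster ids are arbitrary names, so we try every one-to-one
    relabelling of c2's ids onto c1's and keep the best accuracy.
    Assumes c2 has at most as many distinct clusters as c1."""
    ids1, ids2 = sorted(set(c1)), sorted(set(c2))
    best = 0.0
    for perm in permutations(ids1, len(ids2)):
        mapping = dict(zip(ids2, perm))  # rename c2's clusters
        hits = sum(1 for a, b in zip(c1, c2) if a == mapping[b])
        best = max(best, hits / len(c1))
    return best
```

For identical partitions under different names, e.g. `[0,0,1,1,2,2]` versus `[2,2,0,0,1,1]`, this returns 1.0; without the relabelling step, raw accuracy would wrongly report 0.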

    Greetings,
    Sebastian
  • nguyenxuanhau Member Posts: 22  Maven
    Hi
    Please explain in more detail how to do that. How does this method choose the best clustering on my data (my data is large but unlabeled)?
    Best regards
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    it doesn't. How could you know which clustering is the best? Guessing?
    There are some cluster evaluation heuristics available, but as their name says: they are just heuristics.

    Greetings,
      Sebastian
  • nguyenxuanhau Member Posts: 22  Maven
    So how do I compare two clustering algorithms, to find out whether one is better than the other (on unlabeled data)?
    Best regards
  • dragoljub Member Posts: 241  Maven
    Clustering depends on the similarity measure you choose. If you know what it means for two samples to be similar, almost any clustering method will give you good results. It's more about the similarity metric than the algorithm; algorithms just differ in how they handle special cases, for example when you don't know the number or size of the clusters. Either way, there is no easy answer to comparing two clustering methods.  ;D
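The point that the metric matters more than the algorithm is easy to demonstrate: the same query point can have a different nearest neighbour under Euclidean distance than under cosine distance. A small illustrative sketch with made-up points:

```python
import math

def euclid(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)

def cosine_dist(a, b):
    """1 - cosine similarity: small when vectors point the same way,
    regardless of their length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

q = (1.0, 1.0)
a = (2.0, 2.0)   # same direction as q, but farther away
b = (1.5, 0.5)   # closer in space, different direction

# Euclidean distance ranks b closer to q; cosine distance ranks a closer.
assert euclid(q, b) < euclid(q, a)
assert cosine_dist(q, a) < cosine_dist(q, b)
```

Any clustering algorithm fed one metric or the other would group these points differently, which is exactly why the metric choice dominates.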
  • dan_agape Member Posts: 106  Guru
    Particular solutions may be found from case to case.

    For instance, if the data is numeric and tends to form centre-based clusters (data visualisation may give you an indication), then solutions based on the same number of clusters can obviously be compared using the so-called squared error, i.e. the sum of squared distances from the data instances to the corresponding cluster centre (which is computed by averaging the column values in each cluster). A smaller squared error means a better clustering. This method is used even when the same algorithm may lead to more than one solution (as with the K-Means algorithm). The method may be partly extended to mixed (numeric and non-numeric) data, in which case specific metrics replace the Euclidean distance, as for instance in the K-Medoids algorithm, which extends K-Means.
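The squared-error criterion described above can be sketched in a few lines of pure Python (the function name `sse` and the list-of-tuples data layout are my own choices, not anything RapidMiner-specific):

```python
def sse(data, assignment, k):
    """Sum of squared Euclidean distances from each point to its
    cluster centre, where the centre is the per-column mean of the
    cluster's members. Smaller means tighter, better clusters."""
    total = 0.0
    for c in range(k):
        members = [x for x, a in zip(data, assignment) if a == c]
        if not members:
            continue  # skip empty clusters
        centre = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
                     for x in members)
    return total
```

Given two clusterings of the same data with the same k, the one with the lower `sse` is preferred under this heuristic.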

    Another solution may be based on the idea of evaluating the result of an unsupervised clustering via supervised learning evaluation. You cluster your data, obtaining a new column - let us call it clusterNo. Then you learn a decision tree (or another model produced by supervised learning) using clusterNo as your label/output attribute, and evaluate this model/tree. The accuracy of the model may give an indication of the quality of the clustering. Obviously, no method based on heuristics is perfect, but it may be quite useful in practice.
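As a toy illustration of this idea, here is a depth-1 decision stump (a stand-in for the full decision tree Dan describes; the function name `stump_accuracy` is hypothetical) trained to predict the cluster column, reporting its training accuracy:

```python
def stump_accuracy(data, labels):
    """Fit the best depth-1 decision stump (one feature, one threshold,
    majority label on each side) to predict the cluster label, and
    return its training accuracy. A stand-in for a real decision tree:
    high accuracy suggests the clusters are easy to separate."""
    n = len(data)
    best = 0.0
    for f in range(len(data[0])):                 # try each feature
        for t in sorted(set(x[f] for x in data)):  # try each threshold
            left = [lab for x, lab in zip(data, labels) if x[f] <= t]
            right = [lab for x, lab in zip(data, labels) if x[f] > t]
            hits = 0
            for side in (left, right):
                if side:  # predict the majority label on this side
                    hits += max(side.count(c) for c in set(side))
            best = max(best, hits / n)
    return best
```

Well-separated clusters yield accuracy near 1.0, while clusters that overlap along every single feature score lower, matching the intuition in the post.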

    Dan
         
  • dan_agape Member Posts: 106  Guru
    Just to add that k Nearest Neighbours may often be a good choice here as the supervised learning technique. The accuracy of its model would intuitively indicate the likelihood that close instances were placed in the same cluster (which is expected in a good clustering).
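The kNN variant of this check is especially compact: a leave-one-out 1-nearest-neighbour pass over the cluster labels (the function name `knn_loo_accuracy` is my own) directly measures how often a point's nearest neighbour sits in the same cluster.

```python
def knn_loo_accuracy(data, labels):
    """Leave-one-out 1-nearest-neighbour accuracy on cluster labels:
    for each point, check whether its nearest other point carries the
    same cluster label. High accuracy means nearby points tend to
    share a cluster, as expected of a good clustering."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hits = 0
    for i, x in enumerate(data):
        nearest = min((j for j in range(len(data)) if j != i),
                      key=lambda j: dist2(x, data[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(data)
```

Two clusterings of the same data can then be ranked by this score, with the caveat from earlier in the thread that it remains a heuristic.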

    Dan