
# "unsupervised cluster evaluation"

nguyenxuanhau, Member, Posts: **22** Maven

Hi!

Can I compare unsupervised cluster evaluations (cluster model evaluations) with each other on unlabeled data in RM?

What must I do to compare them?

Best regards



## Answers

**2,531** Unicorn

If you want to compare how closely two clustering results match each other, you can simply rename the cluster attribute from the first run and assign it the role "label" before performing the second clustering. If you then set the role of the second cluster attribute to "prediction", you can use the standard accuracy measure to quantify the agreement.

Greetings,

Sebastian
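
For readers outside RapidMiner, the idea can be sketched in plain Python. This is only an illustration, not the RapidMiner operator chain: the function name `agreement` and the toy assignments are made up, and the majority-overlap mapping is an added assumption, included because cluster ids are arbitrary and the same grouping may be numbered differently in two runs.

```python
from collections import Counter, defaultdict


def agreement(first, second):
    """Accuracy-style agreement between two cluster assignments.

    `first` plays the role of the label, `second` the role of the prediction.
    Assumption (not from the thread): each cluster id in `second` is first
    mapped to the `first` cluster it overlaps with most, so that a pure
    renumbering of clusters still counts as full agreement.
    """
    if len(first) != len(second) or not first:
        raise ValueError("assignments must be non-empty and of equal length")

    # Count how often each id of the second run co-occurs with each id of the first.
    overlap = defaultdict(Counter)
    for label, prediction in zip(first, second):
        overlap[prediction][label] += 1

    # Map every second-run id to its most frequent first-run id.
    mapping = {pred: counts.most_common(1)[0][0] for pred, counts in overlap.items()}

    hits = sum(1 for label, prediction in zip(first, second) if mapping[prediction] == label)
    return hits / len(first)


# Hypothetical cluster assignments of the same eight examples from two runs.
run_a = [0, 0, 0, 1, 1, 1, 2, 2]
run_b = [2, 2, 2, 0, 0, 1, 1, 1]
print(agreement(run_a, run_b))  # 0.875: one example ends up in a different group
```

Note that this only measures how similar the two results are; it says nothing about which of them is better.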

**22** Maven

Could you explain that in more detail? How does the method choose the best clustering for my data (my data is large but unlabeled)?

Best regards

**2,531** Unicorn

It doesn't. How could you know which clustering is the best? By guessing?

There are some cluster evaluation heuristics available but, as the name says, they are just heuristics.

Greetings,

Sebastian

**22** Maven

Best regards

**106** Maven

For instance, if the data is numeric and tends to form centre-based clusters (data visualisation may give you an indication), then solutions with the same number of clusters can be compared using the so-called squared error, i.e. the sum of squared distances from the data instances to their corresponding cluster centre, where each centre is computed by averaging the column values within the cluster. A smaller squared error means a better clustering. This measure is also used to compare repeated runs of the same algorithm when it can produce more than one solution (as K-Means does). The method can be partly extended to mixed (numeric and non-numeric) data, in which case specific metrics replace the Euclidean distance, as for instance in the K-Medoids algorithm, which extends K-Means.
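
A minimal sketch of the squared error in plain Python, assuming purely numeric data and Euclidean distance. The function name `squared_error` and the four toy points are invented for illustration; in practice the points and assignments would come from your own clustering runs.

```python
def squared_error(points, assignment):
    """Sum of squared Euclidean distances from each point to its cluster centre.

    Each centre is the column-wise mean of the points assigned to that cluster.
    """
    if len(points) != len(assignment) or not points:
        raise ValueError("points and assignment must be non-empty and of equal length")

    # Group the points by cluster id.
    groups = {}
    for point, cluster in zip(points, assignment):
        groups.setdefault(cluster, []).append(point)

    total = 0.0
    for members in groups.values():
        dims = len(members[0])
        # Centre = average of every column within the cluster.
        centre = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dims))
    return total


# Hypothetical data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # splits each nearby pair across clusters

print(squared_error(points, good))  # 4.0
print(squared_error(points, bad))   # 100.0
```

Both solutions use two clusters, so their squared errors are comparable, and the smaller value points to the more compact solution.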

Another solution is to evaluate the result of an unsupervised clustering via supervised learning evaluation. You cluster your data, obtaining a new column; let us call it clusterNo. Then you learn a decision tree (or another model produced by supervised learning) using clusterNo as your label/output attribute, and you evaluate this model/tree. The accuracy of the model may give an indication of the quality of the clustering. Obviously, no method based on heuristics is perfect, but such methods can be quite useful in practice.

Dan
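
The supervised-evaluation idea can be sketched as follows. To keep the example short and free of dependencies, a 1-nearest-neighbour classifier with leave-one-out validation stands in for the decision tree, so this illustrates the general idea under that assumption and not the exact setup described above. The name `loo_accuracy` and the toy data are hypothetical.

```python
def loo_accuracy(points, cluster_no):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    predicts the clusterNo column from the numeric attributes.

    A 1-NN model is used here instead of a decision tree (a simplification).
    """
    n = len(points)
    if n < 2 or n != len(cluster_no):
        raise ValueError("need at least two points and one cluster id per point")

    hits = 0
    for i in range(n):
        # Find the closest other point by squared Euclidean distance.
        best_j, best_dist = None, float("inf")
        for j in range(n):
            if j == i:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if dist < best_dist:
                best_j, best_dist = j, dist
        # The held-out point is "predicted" to carry its neighbour's cluster id.
        if cluster_no[best_j] == cluster_no[i]:
            hits += 1
    return hits / n


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
compact = [0, 0, 0, 1, 1, 1]    # clusterNo follows the visible groups
scrambled = [0, 1, 0, 1, 0, 1]  # clusterNo ignores the visible groups

print(loo_accuracy(points, compact))    # 1.0
print(loo_accuracy(points, scrambled))  # noticeably lower
```

A clustering whose clusterNo column is easy to predict from the attributes scores higher, which is the indication of quality mentioned above; it remains a heuristic.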
