Measuring clustering quality for previously clustered data
singing_bird_1
Member Posts: 16 Contributor I
Hi all, I am new to RapidMiner. I have data that has already been clustered, and I want to load it into RapidMiner together with its cluster labels so that it can be evaluated with one of the clustering evaluation measures.
Note: I don't want to recluster my data; I want to evaluate it as it is, with its existing labels.
How can I do this?
Thanks in advance
Answers
Edit: I misread your question. Would you be able to post your data, or parts of it? Measuring the performance should be straightforward, as long as labels and relevant attributes are available.
Thanks for your reply.
Attached is part of the data together with its cluster assignments.
There are 3 clusters.
The problem is that the SSE performance measure requires the data (which is not the problem) and the centroids, which are unknown because the data is already labeled.
Silhouette requires the data, the model or the centroids, and the similarity measure.
How can I arrange the nodes in the process to get the quality of the given data, and which nodes should I use?
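On the missing centroids: if Euclidean distance is assumed, the centroid of each cluster is simply the mean of the points carrying that label, so it can be recomputed from the labeled data. The Python sketch below illustrates this outside RapidMiner. The function names and the toy points are hypothetical and are not taken from the attached data.

```python
from collections import defaultdict


def centroids_from_labels(points, labels):
    """Recompute each centroid as the mean of the points that share a label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its own centroid."""
    centroids = centroids_from_labels(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
    return total


# Toy example (made-up values): two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = ["a", "a", "b", "b"]
print(centroids_from_labels(points, labels))
print(sse(points, labels))
```

A lower SSE means tighter clusters, but the value is only comparable between labelings of the same data with the same number of clusters.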
Ok, I don't think you can make any meaningful performance evaluations like this, because the data is missing information (e.g. the cluster model). What would you like to achieve? I.e. what is the question about the clusters that you would like to have answered?
My question is how to measure the clustering quality despite the missing information, such as the cluster model and the distance measure needed for silhouette.
If the answer is that this is impossible in RapidMiner because of the missing information, then please suggest another program for measuring the clustering quality, or give me code for SSE and silhouette.
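As a possible route outside RapidMiner, the silhouette coefficient can be computed from the points and their labels alone, once a distance measure is chosen. The sketch below is a minimal plain-Python version that assumes Euclidean distance. The function name and the toy points are hypothetical, and the single-point-cluster convention (score 0) is one common choice rather than the only one.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance (an assumption)."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up values): the same points with a good and a bad labeling.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. The pairwise loop is quadratic in the number of points, so it suits small to medium data sets.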