technical question about the combined use of clustering and classification

Belle Member Posts: 3 Newbie
Hi there! I'm new to RapidMiner and have run into a problem combining clustering and classification.

Basically, I want to build k-means clusters from my initial dataset, then build classification models and evaluate their performance for EACH of the clusters. I know how to use the operators for cluster analysis and for classification separately, but I have no idea how to combine them. I tried several approaches, such as placing the k-means operator before or inside the Cross Validation operator, but the process either fails to run or doesn't give me the performance result for each cluster. Can anyone help?
Any response would be greatly appreciated :)

Thank you!

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Belle,

    Are you using one of the performance operators dedicated to clustering (presumably Cluster Distance Performance for k-Means)?

    Regards,

    Lionel
  • Belle Member Posts: 3 Newbie
    Hi @lionelderkrikor,

    Thank you for your reply :)
    And yes, I tried "Cluster Distance Performance" in my process, but found that it only evaluates the clustering itself (e.g. it reports the Davies-Bouldin index), while the result I want is the classification performance (say, accuracy) within each cluster. Am I misunderstanding those operators?
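
To make the distinction concrete, here is a minimal Python sketch (not a RapidMiner process) of what "accuracy in each cluster" means: examples are grouped by cluster, and accuracy is computed separately inside each group. The function name and the toy data are hypothetical and only for illustration.

```python
from collections import defaultdict

def accuracy_per_cluster(clusters, labels, predictions):
    """Return {cluster_id: accuracy}, computed separately inside each cluster."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for cluster, label, prediction in zip(clusters, labels, predictions):
        total[cluster] += 1
        if label == prediction:
            correct[cluster] += 1
    return {cluster: correct[cluster] / total[cluster] for cluster in total}

# Hypothetical toy data: cluster assignment, true label, model prediction.
clusters    = ["cluster_0", "cluster_0", "cluster_0", "cluster_1", "cluster_1"]
labels      = ["a", "a", "b", "b", "b"]
predictions = ["a", "b", "b", "b", "b"]

per_cluster = accuracy_per_cluster(clusters, labels, predictions)
print(per_cluster)
```

A cluster-quality measure such as the Davies-Bouldin index never looks at `labels` or `predictions`, which is why it cannot answer this question.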

    Thanks! 
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    edited May 2020
    @Belle

    I think you have to generate a "prediction" attribute from your clustering results to map
    the cluster results to the classes of your label.

    EDIT:
    I'm using the Iris dataset. To be more precise about the methodology, I'm clustering the examples and then labelling each cluster with the majority label of the labelled examples in that cluster.

    You can see what I mean by opening and running the process in the attached file.
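
For readers without the attachment, the majority-label idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the attached RapidMiner process: the function name and the six toy examples are hypothetical and are not real Iris clustering results.

```python
from collections import Counter, defaultdict

def majority_label_per_cluster(clusters, labels):
    """Map each cluster id to the most frequent label among its examples."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {cluster: counter.most_common(1)[0][0]
            for cluster, counter in counts.items()}

# Hypothetical toy data: cluster_0 mixes two classes.
clusters = ["cluster_0", "cluster_0", "cluster_0",
            "cluster_1", "cluster_1", "cluster_2"]
labels = ["Iris-versicolor", "Iris-versicolor", "Iris-virginica",
          "Iris-virginica", "Iris-virginica", "Iris-setosa"]

mapping = majority_label_per_cluster(clusters, labels)

# Every example receives the majority label of its cluster as its prediction.
predictions = [mapping[cluster] for cluster in clusters]
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(mapping, accuracy)
```

In this sketch a mixed cluster still gets a single label, so its minority examples simply count as misclassifications when accuracy is computed.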

    Hope this helps,

    Regards,

    Lionel 
  • Belle Member Posts: 3 Newbie
    Hi @lionelderkrikor,

    Big thanks for your explanation and example! :)

    But I have two questions about the process you provided:

    1. In the training section of the Cross Validation operator, only a clustering operator is used to train the model. Why don't we need a classification model (e.g. a decision tree or neural net)? The whole dataset contains a label attribute, so I would expect it to be used for supervised learning. (I imagined that to classify within each cluster, I would need both a clustering operator and a classification model.)

    2. In the testing section of the Cross Validation operator, you use Generate Attributes to assign a label to each cluster. Does that mean that instead of assigning the label using the classification model, we should assign the label manually? I also found an inconsistency: cluster 0 contains both Iris-versicolor and Iris-virginica, but you assign cluster 0 only to Iris-versicolor.

    Thank you so much!

    Belle
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Take a look at the Map Clustering on Labels operator. It will do what you are looking for (I think), but you need to have the same number of classes in your label as you have clusters.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
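
The requirement of equal numbers of classes and clusters suggests a one-to-one assignment between clusters and classes. The sketch below shows one way such an assignment could be computed in plain Python, by trying every pairing and keeping the one with the most agreeing examples. It is an assumption for illustration only, not the operator's actual implementation; the function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def best_one_to_one_mapping(clusters, labels):
    """Pair each cluster with a distinct class so that agreement is maximal."""
    cluster_ids = sorted(set(clusters))
    label_values = sorted(set(labels))
    if len(cluster_ids) != len(label_values):
        raise ValueError("need the same number of clusters and classes")
    # How often each (cluster, label) pair occurs in the data.
    pair_counts = Counter(zip(clusters, labels))
    best_mapping, best_hits = None, -1
    # Brute force over all pairings; fine for a handful of classes.
    for candidate in permutations(label_values):
        hits = sum(pair_counts[(c, l)] for c, l in zip(cluster_ids, candidate))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(cluster_ids, candidate)), hits
    return best_mapping

# Hypothetical toy data: label "a" is the majority in both clusters.
clusters = ["c0", "c0", "c0", "c0", "c1", "c1", "c1"]
labels   = ["a",  "a",  "a",  "b",  "a",  "a",  "b"]

mapping = best_one_to_one_mapping(clusters, labels)
print(mapping)
```

Unlike a per-cluster majority vote, a one-to-one assignment never gives two clusters the same class, which is why the counts have to match.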
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Belle,

    To answer your question:
    "Does that mean that instead of assigning the label using the classification model, we should assign the label manually"
    Yes, that is effectively what I tried to do by hand in the process I shared in my previous post. As @Telcontar120 said, the Map Clustering on Labels operator performs this step automatically, but I was not aware of it.
    In conclusion, I learn new things every day on RapidMiner... ;)
    Thanks for sharing this operator, Brian!

    Regards,

    Lionel
