Text mining an Excel file using clustering to obtain a confusion matrix

brunonbrasil Member Posts: 8 Contributor II
edited January 2020 in Help
I'm new to RapidMiner. I need to get a confusion matrix to validate the clusters obtained from a text. Do you know how to do this?

Best Answer

  • brunonbrasil Member Posts: 8 Contributor II
    Solution Accepted
    I think the solution is this:



  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Clustering is a form of unsupervised machine learning, so it is not possible to generate a confusion matrix directly from clusters. You would first need to turn the clusters into a label and then build a separate process that maps the clusters to that label so the two outputs can be compared. Alternatively, if you already have an existing label with the same number of categories as there are clusters, you can use the Map Clusters on Labels operator to do this automatically and then use a normal Performance operator to generate the confusion matrix.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • brunonbrasil Member Posts: 8 Contributor II
    I built this process to produce the confusion matrix. I managed to get a confusion matrix, but I don't know whether I did it the correct way. Does it make sense to you?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @brunonbrasil, based on that screenshot you are using a very old version of RapidMiner Studio. I would highly recommend updating to the most recent version (9.5.1).

  • brunonbrasil Member Posts: 8 Contributor II
    I treat the context attribute as the label: it holds the clusters I assigned manually, which I compare with the clusters I want the process to produce. The Receiver attribute holds the data as sentences.
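As a rough illustration of the approach Telcontar120 describes (map the clusters onto an existing label, then compare the two), here is a minimal Python sketch that runs outside RapidMiner. It assumes the number of clusters equals the number of label categories and picks, by brute force, the one-to-one cluster-to-label mapping with the most agreements. This is not necessarily how the Map Clusters on Labels operator works internally; the function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def map_clusters_to_labels(clusters, labels):
    """Return the one-to-one cluster -> label mapping with the most agreements.

    Hypothetical helper. Brute-forces every permutation, so it is only
    practical for a handful of clusters. Requires #clusters == #label categories.
    """
    cluster_ids = sorted(set(clusters))
    label_ids = sorted(set(labels))
    if len(cluster_ids) != len(label_ids):
        raise ValueError("need the same number of clusters and label categories")
    # How often each (cluster, label) pair occurs on the same example.
    pair_counts = Counter(zip(clusters, labels))
    best_mapping, best_score = None, -1
    for perm in permutations(label_ids):
        mapping = dict(zip(cluster_ids, perm))
        score = sum(pair_counts[(c, mapping[c])] for c in cluster_ids)
        if score > best_score:
            best_mapping, best_score = mapping, score
    return best_mapping


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes (hypothetical helper)."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical data: cluster ids from a clustering step and manually
# assigned context labels for the same six sentences.
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1", "cluster_1", "cluster_0"]
labels = ["sport", "sport", "politics", "politics", "sport", "sport"]

mapping = map_clusters_to_labels(clusters, labels)
predicted = [mapping[c] for c in clusters]  # clusters turned into predicted labels
classes = sorted(set(labels))
matrix = confusion_matrix(labels, predicted, classes)

print(mapping)  # {'cluster_0': 'sport', 'cluster_1': 'politics'}
print(classes)  # ['politics', 'sport']
for row in matrix:
    print(row)  # [2, 0] then [1, 3]
```

Here cluster_0 maps to sport and cluster_1 to politics; the single off-diagonal count is the one sport sentence that landed in the politics cluster. Brute force over permutations grows factorially with the number of clusters, so for more than a few clusters an assignment solver such as the Hungarian algorithm would be the usual choice.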
