Problem with hierarchical clustering

elena20elena20 Member Posts: 14 Contributor I
edited September 2019 in Help

Hello. I used the Process Documents from Data operator with TF-IDF,
then the Top Down Clustering and Agglomerative Clustering operators.
How do I optimize the number of clusters?
And how do I evaluate them?
Can I use the Cluster Distance Performance operator?
Please advise.
Thanks

Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi @elena20,

    Please have a look at the "Flatten Clustering" operator. It reduces the hierarchy to n leaves, i.e. a flat set of n clusters. Afterwards you can go forward with the usual cluster performance measures.
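
    To illustrate what flattening a hierarchy means, here is a minimal library-free Python sketch. It is not RapidMiner's implementation: the function name `agglomerate`, the choice of single linkage, and the toy points are all assumptions made for illustration. It merges the closest clusters bottom-up and stops once n clusters remain, which yields one flat label per point.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points, n_clusters):
    """Hypothetical helper: single-linkage agglomerative clustering,
    stopped as soon as n_clusters clusters remain (a flattened hierarchy)."""
    # Start with every point in its own cluster (the leaves of the hierarchy).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (an assumption): distance between the
                # closest pair of members of the two clusters.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters into one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the cluster membership lists into one flat label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0), (9.1, 0.3)]
labels = agglomerate(points, n_clusters=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

    Once each example has a flat cluster label like this, the result can be scored with the same measures used for k-means output.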

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • elena20elena20 Member Posts: 14 Contributor I

    Thank you very much
    But
    How can I evaluate hierarchical paraphernalia? Do you send a sample without wounding?
    Thank you

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I don't understand your last question at all, but you can use any standard clustering performance metric, such as the Davies-Bouldin (DB) index. However, since clustering is unsupervised, I would say your own use case should guide your evaluation at least as much as any formal metric. What are you clustering, and for what purpose? Based on that purpose, how many clusters are reasonable, and how many are too many?
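
    As a rough illustration of how a metric like the DB index can be used to compare candidate cluster counts, here is a small library-free Python sketch. The helper names, the toy points, and the two candidate labelings are assumptions for illustration, not output from RapidMiner. It uses the standard definition: for each cluster, take the worst ratio of combined within-cluster scatter to centroid distance, then average over clusters. Lower values indicate more compact, better separated clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(members):
    """Mean point of a list of equal-length coordinate tuples."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a flat clustering (needs at least 2 clusters
    with distinct centroids). Lower is better."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centers = {label: centroid(members) for label, members in groups.items()}
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = {label: sum(dist(p, centers[label]) for p in members) / len(members)
               for label, members in groups.items()}
    total = 0.0
    for i in groups:
        # Worst-case similarity between cluster i and any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in groups if j != i)
    return total / len(groups)


# Toy data and two hypothetical flattenings of the same hierarchy.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0), (9.1, 0.3)]
candidates = {
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

    On this toy data the three-cluster labeling scores lower than the two-cluster one, but a number like this should only be one input next to what the clusters are actually needed for.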

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • elena20elena20 Member Posts: 14 Contributor I

    Hello,
    Thank you so much.
    I want to do hierarchical clustering on Twitter data and then compare it with k-means clustering. Is that possible?
    Which operator should I use to evaluate the hierarchical results?
    The Cluster Distance Performance operator gives an error.
    Thanks