Hierarchical Clustering - Extract Distances/Clusters for each step

btibert Member, University Professor Posts: 79  Guru
edited October 2019 in Help
Is it possible to extract the height/distance added at each step of a hierarchical clustering run?  Ideally, I would get the number of clusters and the height/distance added at each step.  It would be helpful to extract this as an ExampleSet, to explore where the cut should be made by balancing distance against the number of clusters.

It is entirely possible that these tools exist, but they are not jumping out at me.  When combining the Agglomerative Clustering and Flatten Clustering operators, I am not sure where I can extract the information needed to make a data-driven decision on how many clusters to keep.
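To make the request concrete, here is a minimal sketch in plain Python (not a RapidMiner operator) of the table being asked for: one row per merge, with the number of clusters remaining and the height at which the merge happened. The function name `merge_history`, the `linkage` parameter, and the toy points are assumptions made for illustration; the naive search is only practical for small datasets.

```python
from itertools import combinations
from math import dist


def merge_history(points, linkage=min):
    """Naive agglomerative clustering that records every merge.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns one row per merge: (step, clusters_remaining, height).
    """
    # Start with every point in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]
    history = []
    step = 0
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        # Merge cluster b into cluster a; a < b, so index a stays valid.
        clusters[a] = clusters[a] + clusters.pop(b)
        step += 1
        history.append((step, len(clusters), height))
    return history


# Toy data, made up for illustration: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
for step, n_clusters, height in merge_history(points):
    print(step, n_clusters, round(height, 3))
```

Each printed row corresponds to one row of the hoped-for ExampleSet: the step number, the clusters left after that step, and the merge height.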

Best Answer


  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,926  Community Manager
  • btibert Member, University Professor Posts: 79  Guru
    Thanks everyone.  Short of fitting the model and associating the clusters in this context, are there any other suggestions on how to "review" or "validate" cluster assignments? Validating KMeans is straightforward, but I am wondering about hierarchical methods in a teaching context.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,462  RM Data Scientist
    https://towardsdatascience.com/understanding-clustering-cf0117148ef4#b7ae is what I like to do, and this is fairly method-independent.

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • btibert Member, University Professor Posts: 79  Guru
    Thanks, I love showing my students how to profile with Trees (something that we already covered), but I was hoping that there was something I could show them around how to select the cut in a more data-driven way.  I can totally make it work with smaller datasets to make the intuition easier to grasp, but I just wanted to make sure I wasn't missing any operators along the way.
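
One common heuristic for choosing the cut in a data-driven way is to look for the largest jump between consecutive merge heights and cut just below it. The sketch below assumes the merge heights are already available as a plain list in merge order; `suggest_cut` is a hypothetical helper written for illustration, not a RapidMiner operator, and the heuristic assumes heights that never decrease (true for single, complete, and average linkage).

```python
def suggest_cut(heights):
    """Suggest a number of clusters from merge heights given in merge order.

    Returns (n_clusters, gap), where gap is the largest jump between two
    consecutive merge heights and n_clusters is the number of clusters
    that remain if the dendrogram is cut inside that jump.
    """
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare heights")
    n_points = len(heights) + 1  # n points produce n - 1 merges
    best_k, best_gap = None, float("-inf")
    for i in range(len(heights) - 1):
        gap = heights[i + 1] - heights[i]
        # After merge i (0-based), n_points - (i + 1) clusters remain.
        k = n_points - (i + 1)
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap


# Merge heights for a toy run with five points (illustrative values).
heights = [1.0, 1.0, 5.0, 15.0]
k, gap = suggest_cut(heights)
print(f"suggested clusters: {k}, largest height jump: {gap}")
```

With small datasets, plotting the heights against the number of clusters remaining gives students the same intuition visually, much like an elbow plot for k-means.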