
"Top Down Clustering: Determining the Number of Items in Lower-Level Clusters"

tiramisusann (Member, Posts: 9, Contributor II)
edited June 2019 in Help

I'm using RapidMiner for my Master's thesis. For that, I have to cluster a huge amount of textual data, with the goal of identifying the document in the database that is most similar to an incoming document.
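For reference, the underlying goal (finding the single most similar document) can also be solved by brute force, comparing the incoming document with every stored one; a cluster hierarchy is one way to avoid doing all of those comparisons. A minimal sketch in Python, assuming raw term-frequency vectors from a naive whitespace tokenizer and cosine similarity. The helpers `tf_vector`, `cosine`, and `most_similar` are hypothetical illustrations, not RapidMiner operators:

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, documents: list[str]) -> int:
    """Index of the document with the highest cosine similarity (brute force)."""
    q = tf_vector(query)
    scores = [cosine(q, tf_vector(doc)) for doc in documents]
    return max(range(len(documents)), key=scores.__getitem__)


docs = [
    "clustering of text documents",
    "baking a tiramisu cake",
    "top down clustering with k means",
]
print(most_similar("k means clustering top down", docs))  # 2
```

This scan costs one comparison per stored document, which is the cost a top-down hierarchy tries to reduce.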

To do this, I need to define a top-down clustering. The lowest level should consist of clusters containing only ONE document each (otherwise it would not be possible to find the single most similar document). An incoming document should then follow the path it is most similar to, by comparing the centroid vectors of the clusters with the document vector at each level. With this algorithm, the search ends at the cluster containing the most similar document.
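A minimal sketch of the scheme described above, in plain Python. Assumptions not taken from the thread: documents are already dense numeric vectors, each split is made by k-means with k = 2 (so the hierarchy is built by bisecting k-means), and "most similar" means the smallest squared Euclidean distance. The names `two_means`, `build`, `descend`, and `brute_force` are hypothetical. Note that the greedy descent is an approximation: if the true nearest document was separated from the query by an early split, the path can end at a different leaf, so comparing against a brute-force search is a useful check.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


@dataclass
class Node:
    centroid: Vector
    doc_ids: List[int]
    children: List["Node"] = field(default_factory=list)


def two_means(ids: List[int], X: List[Vector], iters: int = 10):
    """Split ids into two groups with a few Lloyd iterations (k-means, k=2)."""
    # Deterministic seeding: first point, then the point farthest from it.
    c0 = X[ids[0]]
    c1 = X[max(ids, key=lambda i: sq_dist(X[i], c0))]
    for _ in range(iters):
        left = [i for i in ids if sq_dist(X[i], c0) <= sq_dist(X[i], c1)]
        right = [i for i in ids if sq_dist(X[i], c0) > sq_dist(X[i], c1)]
        if not left or not right:
            break
        c0, c1 = mean([X[i] for i in left]), mean([X[i] for i in right])
    if not left or not right:
        # Identical vectors cannot be separated: split by position instead.
        half = len(ids) // 2
        left, right = ids[:half], ids[half:]
    return left, right


def build(ids: List[int], X: List[Vector]) -> Node:
    """Split recursively until every leaf holds exactly one document."""
    node = Node(centroid=mean([X[i] for i in ids]), doc_ids=ids)
    if len(ids) > 1:
        left, right = two_means(ids, X)
        node.children = [build(left, X), build(right, X)]
    return node


def descend(root: Node, q: Vector) -> int:
    """Greedy path: always follow the child whose centroid is closest to q."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: sq_dist(child.centroid, q))
    return node.doc_ids[0]


def brute_force(X: List[Vector], q: Vector) -> int:
    """Exact nearest document, for checking the greedy result."""
    return min(range(len(X)), key=lambda i: sq_dist(X[i], q))


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2], [9.0, 0.0]]
tree = build(list(range(len(X))), X)
q = [5.1, 5.0]
print(descend(tree, q), brute_force(X, q))  # 2 2
```

The stopping rule `len(ids) > 1` is what forces single-document leaves. Recursion depth grows with tree depth, so very unbalanced splits on a large corpus may call for an iterative version.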

But how can I implement this idea in RapidMiner? I have no idea how to tell RapidMiner that the clusters on the last and lowest level should contain only a single document.

I would be very grateful if anyone could help.

Thanks so much,

tiramisusann

Answers

    MariusHelf (RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn)
    The Top Down Clustering operator, with k-Means in its subprocess, does this job for you. It probably also saves you from having to implement the solution to your previous question :)

    Happy Mining!
    ~Marius