"Top Down Clustering - determining Item Number of lower Level Clusters"

tiramisusann · October 2013

I'm using RapidMiner for my master's thesis. I need to cluster a large amount of textual data, with the goal of identifying the document in the database that is most similar to an incoming document.

For that I need to define a top-down clustering. At the lowest level it should contain clusters with only ONE document each (otherwise it would not be possible to find the most similar document). The incoming document should follow the path it is most similar to, determined by comparing its document vector with the centroid vectors of the clusters at each level. With that procedure, the search terminates at the cluster containing the most similar document.
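For illustration only, the procedure described above can be sketched in plain Python, outside RapidMiner. This is a minimal sketch under assumptions that are not from the thread: documents are already dense term vectors, each split is a simple 2-means step with cosine similarity, and the names `build_tree`, `split_in_two` and `find_leaf` are hypothetical helpers, not RapidMiner operators.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def split_in_two(ids, docs, rng, iterations=10):
    """Split at least two document ids into two non-empty groups with 2-means."""
    seeds = rng.sample(ids, 2)
    centers = [docs[seeds[0]], docs[seeds[1]]]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for i in ids:
            nearer = 0 if cosine(docs[i], centers[0]) >= cosine(docs[i], centers[1]) else 1
            groups[nearer].append(i)
        if not groups[0] or not groups[1]:
            # Degenerate split (e.g. identical documents): fall back to halving.
            half = len(ids) // 2
            return ids[:half], ids[half:]
        centers = [centroid([docs[i] for i in g]) for g in groups]
    return groups[0], groups[1]


def build_tree(ids, docs, rng):
    """Recursively split until every leaf cluster holds exactly one document."""
    node = {"ids": ids, "centroid": centroid([docs[i] for i in ids]), "children": []}
    if len(ids) > 1:
        left, right = split_in_two(ids, docs, rng)
        node["children"] = [build_tree(left, docs, rng), build_tree(right, docs, rng)]
    return node


def find_leaf(node, query):
    """Follow the child whose centroid is most similar to the query, down to a leaf."""
    while node["children"]:
        node = max(node["children"], key=lambda child: cosine(query, child["centroid"]))
    return node["ids"][0]


# Toy data (made up for this sketch): five documents as 3-dimensional term vectors.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
tree = build_tree(list(range(len(docs))), docs, random.Random(0))
print("index of the document reached:", find_leaf(tree, [0.1, 0.0, 0.9]))
```

Note that the descent is greedy: one wrong turn near the root cannot be undone further down, so the leaf that is reached is a good candidate but not necessarily the globally most similar document.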

But how can I implement that idea in RapidMiner? I have no clue how to tell RapidMiner that the clusters at the last, lowest level should contain only a single document.

I would be very grateful if anyone could help.

Thanks so much,

tiramisusann

MariusHelf · October 2013

The Top Down Clustering operator with k-Means in its subprocess does this job for you. This probably also saves you from implementing the solution to your previous question.

Happy Mining!
~Marius
