Finding the most similar document(s) in a collection to a test document

crcowan · July 2010

While I was using version 4 of RapidMiner, I built a chain to perform this function. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.

With the advent of RapidMiner 5, I was wondering whether there are new or better operators for this task.

The basic requirement is to compare a single (test) document to a set of documents and find the document in the set that best matches the test document, as measured by cosine similarity.
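To make the requirement concrete, here is a minimal sketch of the same idea outside RapidMiner, in plain Python. It is not a RapidMiner process and does not use any RapidMiner operator. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`), the simple regex tokenizer, and the smoothed TF-IDF weighting are all assumptions chosen for illustration; any other term weighting could be substituted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    # Build one sparse TF-IDF vector (a dict term -> weight) per document.
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed IDF (an assumption; other variants work as well).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return [{t: count * idf[t] for t, count in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(test_doc, collection, top_k=1):
    # Return up to top_k (index, score) pairs, best match first.
    # The test document is included when computing IDF so that
    # both sides share one vocabulary and one weighting.
    token_lists = [tokenize(d) for d in collection] + [tokenize(test_doc)]
    vectors = tfidf_vectors(token_lists)
    test_vec = vectors[-1]
    scored = [(cosine(test_vec, v), i) for i, v in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "a cat and a dog played on the mat",
    ]
    for index, score in most_similar("my cat is on the mat", docs, top_k=3):
        print(index, round(score, 3), docs[index])
```

Setting `top_k` above 1 returns the several closest documents instead of only the best one, which covers the "document(s)" case in the thread title.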

Any recommendations?

Thank you.

Charles

radone · July 2010

Hello Charles,

I have not dealt with a similar problem, but my idea is to use an entropy-based representation of the documents (available in the text mining extension) and then, for example, use k-NN to check the similarity of the documents.

Regards
radone
