Finding the most similar document(s) in a collection to a test document

crcowan Member Posts: 3 Contributor I
edited November 2018 in Help
While I was using version 4 of RapidMiner, I built a chain to perform this function. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.

With the advent of RapidMiner 5, I was wondering whether there are any new or better operators for this task.

The basic requirement is to compare a single (test) document to a set of documents and find the document in the set that best matches the test document, as measured by cosine similarity.

Any recommendations?

Thank you.



    radone RapidMiner Certified Expert, Member Posts: 74 Guru
    Hello Charles,

    I have not dealt with a similar problem myself, but my idea is to use the entropy-based representation of documents (available in the Text Mining extension) and then, for example, use k-NN to check the similarity of the documents.
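
For illustration only, here is a minimal sketch of the nearest-neighbour idea in plain Python rather than as a RapidMiner process. It assumes raw term counts as the document representation (not the entropy-based weighting mentioned above) and a simple lowercase tokenizer. The function names `tokenize`, `cosine_similarity` and `most_similar`, and the sample documents, are hypothetical and not taken from RapidMiner.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens are a good enough split.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # a and b are term -> count mappings (sparse term-frequency vectors).
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


def most_similar(test_doc, collection, k=1):
    # Score every document against the test document and keep the k best.
    test_vec = Counter(tokenize(test_doc))
    scored = [
        (cosine_similarity(test_vec, Counter(tokenize(doc))), idx)
        for idx, doc in enumerate(collection)
    ]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


# Hypothetical sample data
collection = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a cat and a dog sat together",
]
best = most_similar("where is the cat sitting", collection, k=1)
print(best)  # list of (index into collection, similarity score)
```

With k=1 this returns the single best match. A larger k gives the k nearest neighbours, and the count vectors could be replaced by any other weighting without changing the rest of the sketch.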
