"Match a single document to the closest one of a large number of documents"

martin_switzerl · May 2014

Dear all

I'm new to this forum and also rather new to RI and text mining. After learning the basics (text analytics, clustering, classification, sentiment analysis, ...), I would like to do the following project:

Compare a specific document with a list of hundreds of documents and give me the closest match (and tell me the differences).

Or, to be a bit more precise:
"Give me possible cooking recipes based on certain items in a 'shopping basket' and tell me what I am still missing to cook that recipe." The idea would be to have a document containing e.g. "chicken, tomatoes, onions", and RI would then give me possible recipes out of my recipe list that need these ingredients. Ideally, RI would also tell me which ingredients I need in addition.
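To make the intended behaviour concrete, here is a minimal sketch in plain Python (not RI). The recipe names and ingredient lists are invented for illustration, and scoring by the share of a recipe's ingredients already in the basket is just one possible assumption about what "closest" should mean:

```python
# Hypothetical recipe list; names and ingredients are invented for illustration.
RECIPES = {
    "chicken cacciatore": {"chicken", "tomatoes", "onions", "garlic", "olive oil"},
    "tomato soup": {"tomatoes", "onions", "vegetable stock"},
    "apple pie": {"apples", "flour", "butter", "sugar"},
}


def rank_recipes(basket, recipes):
    """Return (name, coverage, missing) tuples, best-covered recipe first."""
    ranked = []
    for name, ingredients in recipes.items():
        if not ingredients:
            continue  # skip recipes without any extracted ingredients
        have = basket & ingredients        # ingredients already in the basket
        missing = ingredients - basket     # ingredients still to buy
        coverage = len(have) / len(ingredients)
        ranked.append((name, coverage, sorted(missing)))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


basket = {"chicken", "tomatoes", "onions"}
for name, coverage, missing in rank_recipes(basket, RECIPES):
    print(f"{name}: {coverage:.0%} covered, still missing: {missing}")
```

The set difference gives the "what am I still missing" part directly; the ranking only decides which recipes are shown first.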

So much for the plan. I downloaded a couple of hundred recipes from a website, extracted "ingredients", "name" and "recipe" separately with the "extract content" function, and saved everything into an Excel file, which in the end worked pretty well. Now I am stuck on where to go from here, and I have already read through a lot of tutorials without really finding an answer. I first thought of X-Validation, but since I want to compare only a single document (the "shopping basket") with several hundred recipes, it does not seem to be the right approach. I also thought of doing something with "data to similarity", but have not succeeded yet.
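For reference, the kind of comparison I have in mind with a similarity measure is roughly the following: turn each ingredient list into a term-count vector and score the single basket document against every recipe, with no training or validation step. This is only a sketch in plain Python under my own assumptions (comma-separated ingredient strings, cosine similarity as the measure, hypothetical helper names); it is not how RI computes similarities internally.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a comma-separated ingredient string into lower-cased terms."""
    return [term.strip().lower() for term in text.split(",") if term.strip()]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def closest(query_text, documents):
    """Return (score, name) of the document most similar to the query.

    Assumes `documents` is a non-empty dict mapping name -> ingredient string.
    """
    query = Counter(tokenize(query_text))
    scored = [
        (cosine(query, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    return max(scored)


# Invented example documents.
documents = {
    "recipe a": "chicken, tomatoes, onions, garlic",
    "recipe b": "apples, flour",
}
score, name = closest("chicken, tomatoes, onions", documents)
print(name, round(score, 3))
```

Because only one query document is compared against the whole list, this is a nearest-neighbour lookup rather than a model that needs to be validated.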

Could you please give me some hints? Thanks a lot!

Best from Switzerland,
Martin
