RapidMiner

How to compare similarity of large number of documents

Community Manager

Re: How to compare similarity of large number of documents

Hello @roberto_r_herma - processing time varies a lot depending on many factors, including your machine and the size and scope of the documents. One thing I can definitely tell you is that RapidMiner loves RAM and multi-core processors. FWIW, I just upgraded to 64 GB of RAM alongside my 6-core Intel Xeon E5 to keep things humming along.

 

If I were you, I'd use the Sample operator to grab a small sample of your documents first. Benchmark the sample, then gradually increase its size so you can get a sense of whether the full 4100 docs are going to take 2 days or 2 years. :)
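To make the benchmarking idea concrete, here is a rough Python sketch done outside RapidMiner. It assumes the job is an all-pairs comparison (so 4100 docs means about 8.4 million pairs) and that the cost grows with the number of pairs; neither is confirmed for your process. The bag-of-words cosine measure and the names `time_all_pairs` and `extrapolate` are made up for illustration and are not RapidMiner operators.

```python
import math
import random
import time
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_count(n):
    """Number of distinct document pairs among n documents."""
    return n * (n - 1) // 2


def time_all_pairs(docs):
    """Seconds needed to compare every pair of documents once."""
    vectors = [Counter(d.lower().split()) for d in docs]
    start = time.perf_counter()
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            cosine(vectors[i], vectors[j])
    return time.perf_counter() - start


def extrapolate(sample_size, sample_seconds, full_size):
    """Scale a measured time by the ratio of pair counts.

    Assumption: run time is proportional to the number of pairs (O(n^2)).
    """
    return sample_seconds * pair_count(full_size) / pair_count(sample_size)


if __name__ == "__main__":
    # Synthetic stand-in documents; swap in a sample of the real ones.
    random.seed(0)
    vocab = [f"w{i}" for i in range(500)]
    docs = [" ".join(random.choices(vocab, k=200)) for _ in range(400)]

    for n in (50, 100, 200):
        secs = time_all_pairs(random.sample(docs, n))
        estimate = extrapolate(n, secs, 4100)
        print(f"{n} docs: {secs:.2f}s measured, ~{estimate:.0f}s estimated for 4100")
```

If the estimates for 4100 docs stay roughly stable as the sample grows, the quadratic assumption is probably holding. If they keep climbing, something else (memory pressure, for instance) is kicking in and the real run will likely be slower than the estimate.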

 

Scott

Contributor

Re: How to compare similarity of large number of documents

Thanks for the tip!