Re: How to compare similarity of large number of documents
Hello @roberto_r_herma - processing time varies a lot depending on many factors, including your machine and the size and scope of the documents. One thing I can definitely tell you is that RapidMiner loves RAM and multi-core processors. FWIW, I just upgraded to 64GB of RAM with my 6-core Intel Xeon E5 to keep things humming along.
If I were you, I'd use the Sample operator to grab a small sample of your documents first. Benchmark the sample, then gradually increase its size so you can get a sense of whether the full 4100 docs will take 2 days or 2 years.
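To make the "benchmark a sample, then scale up" idea concrete, here is a rough sketch in plain Python rather than RapidMiner. Everything in it is an assumption for illustration: the function names are made up, the Jaccard word overlap is only a stand-in for whatever similarity measure you actually use, and the extrapolation assumes every document is compared against every other, so that cost grows with the number of pairs, n(n-1)/2. Your real process may scale differently, so treat the estimate as a ballpark only.

```python
import time


def jaccard(a, b):
    """Stand-in similarity (an assumption): Jaccard overlap of word sets."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def time_pairwise(docs, similarity):
    """Return the seconds taken to compare every pair of documents once."""
    start = time.perf_counter()
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            similarity(docs[i], docs[j])
    return time.perf_counter() - start


def extrapolate_quadratic(sample_size, sample_seconds, full_size):
    """Scale a measured time up to full_size, assuming cost is proportional
    to the number of document pairs, n * (n - 1) / 2."""
    sample_pairs = sample_size * (sample_size - 1) / 2
    full_pairs = full_size * (full_size - 1) / 2
    return sample_seconds * full_pairs / sample_pairs


if __name__ == "__main__":
    # Synthetic placeholder documents; swap in your own texts.
    docs = [f"document number {i} about topic {i % 7}" for i in range(400)]
    for n in (100, 200, 400):
        seconds = time_pairwise(docs[:n], jaccard)
        estimate = extrapolate_quadratic(n, seconds, 4100)
        print(f"{n} docs: {seconds:.3f}s measured, ~{estimate:.1f}s estimated for 4100")
```

If the estimates for 4100 docs stay roughly stable as the sample grows, the quadratic assumption is probably reasonable. If they keep climbing, something else (memory pressure, for example) is likely dominating and the full run will take longer than the estimate suggests.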
Scott Genzer, Senior Community Manager, RapidMiner, Inc.