If you feed 1200 examples into the Data to Similarity operator, you get 1200*1199 pairs, roughly 1.4 million rows, so you are probably running into memory issues. My suggestion is to use the Similarity to Data operator to turn the similarity result back into an example set and see whether that displays more efficiently. If not, I would write the result to the repository, a database, or a file, and disconnect the result from the process output so that it is not displayed at all.
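To make the row count concrete, here is a small Python sketch of the arithmetic. It assumes, as described above, one row per ordered pair of distinct examples; the function name `ordered_pair_count` is just a label I made up for illustration and is not part of RapidMiner.

```python
def ordered_pair_count(n: int) -> int:
    """Rows produced if every example is paired with every other example.

    Assumption: one row per ordered pair of distinct examples, i.e. n * (n - 1).
    """
    return n * (n - 1)


# 1200 examples give 1200 * 1199 rows.
print(ordered_pair_count(1200))  # 1438800
```

The count grows with the square of the input size, so doubling the number of examples roughly quadruples the number of rows.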
You can then read the result later and use the filter or sample operators to extract the bits you're interested in.
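If you wrote the result to a CSV file, you could also pull out the interesting rows outside RapidMiner. The following is only a rough sketch in Python: the column name `SIMILARITY` and the helper `rows_above` are my assumptions for illustration, so adjust them to match whatever your file actually contains. It reads the file row by row, so the full table never has to fit in memory.

```python
import csv
from typing import Dict, Iterator


def rows_above(path: str, threshold: float,
               column: str = "SIMILARITY") -> Iterator[Dict[str, str]]:
    """Yield only the rows whose similarity is at least `threshold`.

    The file is streamed one row at a time instead of being loaded whole.
    `column` is an assumed header name; change it to match your export.
    """
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            if float(row[column]) >= threshold:
                yield row
```

For example, `list(rows_above("similarity.csv", 0.8))` would collect every pair with a similarity of 0.8 or higher, where `similarity.csv` is a placeholder file name.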
Hi, I found this thread because I am facing the same issue. Computing the cosine similarity for 4100 documents takes forever to produce any output. I followed some of the suggestions above, and my process is:
Read CSV --> Process Documents from Data --> Data to Similarity --> Similarity to Data --> Write Excel
After 24 hours it is still in the "Similarity to Data" process.
Does anyone have an idea how long this will take? My PC specifications are as follows:
Windows 10 Enterprise, version 1607, 64-bit
Processor Intel Core i5-4310U
CPU 2.60 GHz
Thanks for any tips.