Text Mining - Document Similarity (word position)
I'm looking for a way to measure the similarity between documents where the position of the words is relevant.
I've already implemented the sample using the "Data Similarity" operator (CosineSimilarity).
But I need to take into account the order/position of the words, not only their frequency or occurrence.
Example 1: A B C D E F G
Example 2: A X B D Y F G
Example 3: G F E A B C D
Examples 1 and 2 should be more similar than Examples 1 and 3: although Example 3 has exactly the same words as Example 1 (CosineSimilarity=1), they are in different positions. Example 2 has only two different words (X, Y) and one other word (B) in a different position, but close to its original position.
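To make the requirement concrete, here is a minimal Python sketch of the kind of order-aware score I mean. It is not a RapidMiner operator: it assumes Python's standard `difflib.SequenceMatcher` as one possible stand-in measure, and the function names `order_similarity` and `bag_of_words_cosine` are hypothetical, chosen only for this illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def bag_of_words_cosine(doc_a: str, doc_b: str) -> float:
    """Cosine similarity on word counts; ignores word order entirely."""
    counts_a = Counter(doc_a.split())
    counts_b = Counter(doc_b.split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_similarity(doc_a: str, doc_b: str) -> float:
    """Order-sensitive similarity: compares the two word sequences.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    words in matching in-order blocks and T is the total word count
    of both documents.
    """
    tokens_a = doc_a.split()
    tokens_b = doc_b.split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


example_1 = "A B C D E F G"
example_2 = "A X B D Y F G"
example_3 = "G F E A B C D"

for name, other in (("Example 2", example_2), ("Example 3", example_3)):
    print(
        f"Example 1 vs {name}: "
        f"cosine={bag_of_words_cosine(example_1, other):.2f}, "
        f"order-aware={order_similarity(example_1, other):.2f}"
    )
```

With this particular measure, Example 1 vs Example 2 scores about 0.71 and Example 1 vs Example 3 about 0.57, while the bag-of-words cosine ranks them the other way round (about 0.71 vs 1.00). Other sequence-based measures, such as an edit distance over words, would be equally valid choices.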
I think this is a difficult problem to explain, and I'm not sure whether RapidMiner can give me a solution.