Measuring how well text similarity predicts another column
How can I do the following using RapidMiner Studio?
I have a dataset with several columns (all text). Each row of the dataset must be compared to every other row of the same dataset, and I need the similarity between some of the text fields. One of the columns (column x) holds the information I'm trying to "predict" through text similarity. That is, if rows 1 and 2 of my dataset are very similar, they should have the same value in column x. And if rows 1 and 3 are not similar, they should have different values in column x. How can I, then:
- once I get the similarity score, relate this similarity to column x content?
- get accuracy (and other metrics) of this relation?
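
To make the goal concrete, here is a minimal sketch of the evaluation I have in mind, written in plain Python rather than as a RapidMiner process. Everything in it is an assumption for illustration: the word-level Jaccard similarity, the 0.5 threshold, the `text` and `x` field names, and the helper names `jaccard` and `pairwise_metrics` are all hypothetical. Each pair of rows is treated as one binary prediction ("similar" means "same value in column x"), and the usual metrics are computed over all pairs.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pairwise_metrics(rows, text_key="text", label_key="x", threshold=0.5):
    """Compare 'similar text' against 'same column x value' for every row pair."""
    tp = fp = tn = fn = 0
    for r1, r2 in combinations(rows, 2):
        predicted_same = jaccard(r1[text_key], r2[text_key]) >= threshold
        actually_same = r1[label_key] == r2[label_key]
        if predicted_same and actually_same:
            tp += 1
        elif predicted_same and not actually_same:
            fp += 1
        elif not predicted_same and actually_same:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example rows; 'text' stands in for the compared text fields.
rows = [
    {"text": "red apple fruit", "x": "A"},
    {"text": "red apple fresh fruit", "x": "A"},
    {"text": "blue car engine", "x": "B"},
]
metrics = pairwise_metrics(rows)
print(metrics)
```

With a setup like this, the threshold is the main tuning knob: lowering it marks more pairs as similar, which tends to raise recall and lower precision.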
Thank you very very much!