Similarity between data
I have a data set of about 250,000 products, with columns such as "artid", "title", "longtext" and so on.
Now, I want to find similar products to each product, where the result should look like:
artid; similar1-artid; similar2-artid; and so on.
For this, I'd like to select the columns that should be analyzed and set a similarity threshold that tells RapidMiner when to write the artid of a similar product into the results list (next to each product) and when to ignore it.
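To make the desired output concrete, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes TF-IDF weighting with cosine similarity as the measure, which is only one possible choice. The function names, the sample rows and the threshold of 0.3 are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # docs: dict artid -> text. Returns dict artid -> {term: weight},
    # with every vector scaled to unit length.
    tokens = {artid: tokenize(text) for artid, text in docs.items()}
    n = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    vectors = {}
    for artid, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed inverse document frequency.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[artid] = {t: w / norm for t, w in vec.items()} if norm else {}
    return vectors


def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def similar_products(rows, columns, threshold):
    # Concatenate the selected columns per product, then keep every other
    # product whose similarity reaches the threshold, most similar first.
    docs = {r["artid"]: " ".join(str(r[c]) for c in columns) for r in rows}
    vecs = tfidf_vectors(docs)
    result = {}
    for a in vecs:
        scored = [(cosine(vecs[a], vecs[b]), b) for b in vecs if b != a]
        result[a] = [b for s, b in sorted(scored, reverse=True) if s >= threshold]
    return result


# Hypothetical sample data, only to show the output shape.
rows = [
    {"artid": "A1", "title": "red cotton shirt", "longtext": "soft red shirt made of cotton"},
    {"artid": "A2", "title": "blue cotton shirt", "longtext": "soft blue shirt made of cotton"},
    {"artid": "A3", "title": "steel kitchen knife", "longtext": "sharp knife for the kitchen"},
]
for artid, similar in similar_products(rows, ["title", "longtext"], threshold=0.3).items():
    print("; ".join([artid] + similar))
```

The loop compares every product with every other one, so it only illustrates the result format. With about 250,000 products that would mean tens of billions of comparisons, so a real solution would need some way to limit the candidate pairs.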
I looked at many video tutorials on text classification, but none of them showed how to create such a data set (listing each product again together with the artids of the similar products).
I also tried "Data to Similarity", but it fails to display the results, even if I filter down to 1% of the data.
Does anyone have an idea how to do this?
Many thanks in advance!