Join / Append / Merge Multiple TF-IDF Example Sets or Recompute?
I'm trying to compare documents from two datasets with the Data to Similarity operator, but I'm not sure how to join, merge, or append the example sets, which contain the TF-IDF results for each word.
I can't join because there isn't a common ID.
I can't append because each dataset contains different tokens, although I expect some tokens to be common to both.
The attribute counts also differ between the datasets (more than 20,000 attributes in each example set).
The datasets required different pre-processing to get to TF-IDF, so can I really recompute TF-IDF if I figure out how to merge the original datasets into one before calculating it?
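To illustrate the recompute option outside RapidMiner, here is a minimal Python sketch. It assumes each dataset has already been through its own pre-processing and is available as lists of tokens; the names `tfidf`, `set_a`, and `set_b` are hypothetical, and the weighting used (relative term frequency times `log(N / df)`) is only one common TF-IDF variant, not necessarily the one RapidMiner applies. The point is that the document frequencies are counted over the combined corpus, so both sets end up with weights on the same scale.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF for a list of tokenized documents (lists of tokens)."""
    n = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Relative term frequency times log inverse document frequency.
        vectors.append({tok: (c / total) * math.log(n / df[tok])
                        for tok, c in counts.items()})
    return vectors

# Hypothetical token lists, each produced by its own pre-processing.
set_a = [["price", "market", "stock"], ["market", "trade"]]
set_b = [["stock", "dividend"], ["trade", "tariff", "market"]]

# Weight both sets against one shared corpus, then split them apart again.
combined = tfidf(set_a + set_b)
vectors_a = combined[:len(set_a)]
vectors_b = combined[len(set_a):]
```

Keeping the vectors sparse (one dictionary per document) avoids having to line up 20,000+ attribute columns by hand; a token that is absent from a document simply has no entry.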
Answers
Have you tried using Cross Distances instead of Data to Similarity?
~martin
Dortmund, Germany
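As a rough illustration of the idea behind comparing one set against another, the following Python sketch computes the cosine similarity between every document in a request set and every document in a reference set. It is a conceptual sketch only, not RapidMiner's implementation: the function names and sample weights are hypothetical, and treating a token that is missing from a document as weight 0 is an assumption made here so that the two sets do not need identical attributes.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {token: weight} dicts."""
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(w * v[tok] for tok, w in u.items() if tok in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cross_similarities(request_set, reference_set):
    """Return (request_index, reference_index, similarity) for every pair."""
    return [(i, j, cosine_similarity(u, v))
            for i, u in enumerate(request_set)
            for j, v in enumerate(reference_set)]

# Hypothetical TF-IDF weights; the two sets share only some tokens.
request_set = [{"market": 0.5, "stock": 0.2}, {"trade": 0.7}]
reference_set = [{"stock": 0.4, "dividend": 0.9}, {"trade": 0.3, "market": 0.1}]

pairs = cross_similarities(request_set, reference_set)
best = max(pairs, key=lambda p: p[2])  # most similar request/reference pair
```

No common ID is needed because each request document is compared against each reference document directly. The similarities are only meaningful if the weights in both sets were computed on a comparable scale.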
I'm thinking of something like this: