Help!!!!! Remove Duplicates
jmphillips
Member Posts: 18 Contributor II
Hello: After using the Remove Duplicates operator and writing the duplicates to an Excel spreadsheet, the output contains lines that are not complete duplicates: in some, only the first word matches (light blue mark), and others are not duplicates at all (yellow mark). Why could that be?
What I need is for lines to be considered duplicates only when all 8 words of each line match, not when just 1, 2, or 3 words match.
Answers
I expect that the duplicate port of the Remove Duplicates operator contains only the additional Examples (the duplicates), not the original ones. So I expect that the yellow and blue examples each occurred exactly two times in the input data set: one example went to the "exa" output port and the other to the "dup" output port. That's probably true for the green ones as well, so I expect that the green example occurred 29 times in the input data set.
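To make that behaviour concrete, here is a minimal Python sketch of what I assume the operator does. This is not RapidMiner's actual implementation; the function name `split_duplicates` and the sample rows are made up for illustration. The first occurrence of each full row is kept, and every later exact repeat goes to the duplicates list, so rows in the duplicates list can look unrelated to each other while each one still fully matches a kept row.

```python
from collections import Counter


def split_duplicates(rows):
    """Keep the first occurrence of each row; send later exact repeats to duplicates.

    Hypothetical stand-in for the "exa" and "dup" output ports.
    """
    seen = set()
    kept, duplicates = [], []
    for row in rows:
        if row in seen:
            duplicates.append(row)  # an exact repeat of a row already kept
        else:
            seen.add(row)
            kept.append(row)  # first time this full row is seen
    return kept, duplicates


# Made-up rows: the two distinct rows share only their first word.
rows = [
    ("alpha", "b", "c"),
    ("alpha", "x", "y"),
    ("alpha", "b", "c"),
    ("alpha", "x", "y"),
    ("alpha", "b", "c"),
]

kept, duplicates = split_duplicates(rows)
counts = Counter(rows)  # occurrences of each full row in the input

for row, n in counts.items():
    print(row, "occurs", n, "times,", n - 1, "of them in the duplicates output")
```

A row that occurs n times in the input ends up once in the kept list and n - 1 times in the duplicates list, which is why the duplicates output alone can seem to contain non-duplicates.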
To check for the number of occurrences for specific values, you can use this little trick: