How to remove non-duplicate values?

MarlaBot Administrator, Moderator, Employee, Member Posts: 57 Community Manager
edited March 2019 in Help
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,518 RM Data Scientist
    Can't you just join the duplicates onto the original data? Then you have only duplicates remaining.
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
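For readers working outside RapidMiner, the join idea can be sketched in plain Python. This is only an illustration under stated assumptions: the set of duplicates is assumed to hold just the keys that occur a second or later time, which is why it has to be joined back onto the original data to recover every copy. The function name `keep_duplicated_by_join`, the `key` parameter, and the sample records are hypothetical and not part of any RapidMiner API.

```python
def keep_duplicated_by_join(records, key):
    """Return every record whose key occurs more than once.

    Step 1 collects the keys that show up a second (or later) time.
    Step 2 joins that key set back onto the original records, so the
    first occurrence of each duplicated key is kept as well.
    """
    seen = set()
    duplicate_keys = set()
    for rec in records:
        k = key(rec)
        if k in seen:
            duplicate_keys.add(k)  # second or later occurrence
        else:
            seen.add(k)
    # Inner join: keep original records whose key is a duplicated key.
    return [rec for rec in records if key(rec) in duplicate_keys]


# Hypothetical sample data: the e-mail address is the duplicate key.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "c@example.com"},
    {"id": 5, "email": "a@example.com"},
]

result = keep_duplicated_by_join(records, key=lambda r: r["email"])
print([r["id"] for r in result])  # [1, 3, 5]
```

The unique records (ids 2 and 4) are dropped, and all three copies of the repeated address remain, including the first one.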
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @MarlaBot, the Remove Duplicates operator has both options:

    Does this help? :smile:

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    You have 42000 records.

    Some are duplicate.
    Some are unique.

    If you need the non-unique records, the dup output of the Remove Duplicates operator returns the records that aren't unique.

    Sorry, I got lost in translation and had to reorganize the question because I understood about three different things. Yes, @sgenzer's answer is fine. If what is required is an aggregation (for example, the count of duplicated events), what @mschmitz says helps, too.
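The aggregation variant mentioned above (counting how often each duplicated value occurs) can be sketched in plain Python with `collections.Counter`. This is a minimal illustration, not a RapidMiner process: the helper name `duplicate_counts` and the sample events are hypothetical.

```python
from collections import Counter


def duplicate_counts(values):
    """Map each value that occurs more than once to its number of occurrences."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample data: "purchase" is the only unique event.
events = ["login", "logout", "login", "purchase", "login", "logout"]

dup_counts = duplicate_counts(events)
print(dup_counts)  # {'login': 3, 'logout': 2}

# The same counts also act as a filter that drops the unique values.
kept = [e for e in events if e in dup_counts]
print(kept)  # ['login', 'logout', 'login', 'login', 'logout']
```

Counting first and filtering second gives both results from one pass over the data: the per-value totals and the cleaned list that contains only values with duplicates.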
  • novice_miner Member Posts: 3 Contributor I
    Thanks for all your help. It worked like magic. 

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I think this is the same question as in this thread, where I provided a similar answer: https://community.rapidminer.com/discussion/comment/57000#Comment_57000
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts