Keep samples based on preferred attribute value

aileenzhou Member Posts: 12 Contributor II
I have a dataset that contains some duplicated DOIs. For each duplicated DOI I must keep exactly one row, chosen by the 'Source' attribute with the preference B > C > A, and delete the rest.

For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.
Row     DOI             Source
18      10.1002/67      A
1261    10.1002/67      B
1400    10.1002/67      C
...
643     10.102/et.67    C
1428    10.102/et.67    A

Thank you in advance.

Best Answer


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Since Remove Duplicates always keeps the first occurrence, I think you can sort the data first and then use Remove Duplicates on the DOI.
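The sort-then-deduplicate idea can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not a RapidMiner process: the `PRIORITY` table and the `keep_preferred` helper are hypothetical names introduced here. One assumption worth stating: because B > C > A is not alphabetical order, the sketch sorts on a numeric rank derived from Source rather than on Source itself.

```python
# Sample rows from the question: (row id, DOI, source).
rows = [
    (18, "10.1002/67", "A"),
    (1261, "10.1002/67", "B"),
    (1400, "10.1002/67", "C"),
    (643, "10.102/et.67", "C"),
    (1428, "10.102/et.67", "A"),
]

# Hypothetical rank table encoding the preference B > C > A (lower is better).
PRIORITY = {"B": 0, "C": 1, "A": 2}


def keep_preferred(rows):
    """Sort by source preference, then keep the first row seen for each DOI."""
    # sorted() is stable, so rows with the same rank keep their input order.
    ordered = sorted(rows, key=lambda r: PRIORITY[r[2]])
    seen = set()
    kept = []
    for row in ordered:
        doi = row[1]
        if doi not in seen:  # first occurrence of this DOI is the preferred one
            seen.add(doi)
            kept.append(row)
    return kept


kept = keep_preferred(rows)
print(kept)
# [(1261, '10.1002/67', 'B'), (643, '10.102/et.67', 'C')]
```

In RapidMiner terms, the rank would presumably be a helper attribute generated from Source before sorting, with Remove Duplicates then restricted to the DOI attribute.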

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany