Keep samples based on preferred attribute value

aileenzhou Member Posts: 12 Contributor II
I have a dataset that contains some duplicated DOIs. For each duplicated DOI, I must keep only one row, chosen by the 'source' attribute with the preference B > C > A, and delete the rest.

For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.
Row     DOI             Source
18      10.1002/67      A
1261    10.1002/67      B
1400    10.1002/67      C
... ...
643     10.102/et.67    C
1428    10.102/et.67    A

Thank you in advance.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Since Remove Duplicates always keeps the first occurrence, I think you can sort first and then use Remove Duplicates on the DOI.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
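
The sort-then-deduplicate idea above can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions: the `PRIORITY` mapping and the `keep_preferred` helper are hypothetical names, and the sample rows are the ones from the question. Note that a plain alphabetical sort on Source would give A, B, C rather than B, C, A, so the sketch sorts on an explicit numeric rank instead; inside RapidMiner a similar rank attribute would presumably have to be generated before sorting.

```python
# Hypothetical preference order taken from the question: B over C over A.
PRIORITY = {"B": 0, "C": 1, "A": 2}


def keep_preferred(rows):
    """Sort rows by source preference, then keep the first row seen per DOI."""
    # sorted() is stable, so rows with equal rank keep their original order.
    ordered = sorted(rows, key=lambda r: PRIORITY[r["Source"]])
    seen = set()
    kept = []
    for row in ordered:
        if row["DOI"] not in seen:  # first occurrence wins, like a dedup step
            seen.add(row["DOI"])
            kept.append(row)
    return kept


# Sample rows copied from the question.
rows = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "C"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "A"},
]

kept = keep_preferred(rows)
kept_rows = sorted(r["Row"] for r in kept)
print(kept_rows)  # [643, 1261]
```

The result matches the rows the question asks to keep (1261 and 643). A Source value missing from `PRIORITY` would raise a `KeyError`, which may be preferable to silently ranking it.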