
Merge duplicate IDs

g__g_g__g_ Member Posts: 1 Contributor I
edited October 2019 in Help
Hey there,

I have not found anything on the internet yet that solves my problem, so I'm trying here.

I have a given dataset containing an ID attribute.
The problem is that some examples share the same ID because they represent the same real-world entity.
However, not all of these examples contain the same values. Some values are simply missing, and some are completely different.

It looks something like this:
ID  attr1  attr2  attr3  attr4
1   XX     ?      ?      A
1   ?      YY     ?      A
1   ?      ?      ZZ     C
2   XX     ?      ?      B
2   ?      YY     ?      B
2   ?      ?      ZZ     D

I now want to merge the examples with the same ID into one example, so the table looks like this:
ID  attr1  attr2  attr3  attr4  extr.attr4
1   XX     YY     ZZ     A      C
2   XX     YY     ZZ     B      D
An extra attribute should be generated for every additional value that occurs in the same attribute of a merged example.
The missing values should just be filled in with the given data.
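
To make the merge rule above concrete, here is a minimal Python sketch of it outside RapidMiner. It assumes that "?" marks a missing value, that the first non-missing value seen for an attribute stays in that attribute, and that further distinct values go into extra columns. The function name merge_by_id and the naming of a second or third extra column (extr2.attr4 and so on) are made up for illustration.

```python
MISSING = "?"  # assumption: "?" marks a missing value, as in the table above

def merge_by_id(rows, id_key="ID"):
    """Merge rows that share an ID into one row per ID (illustrative helper)."""
    attrs = [k for k in rows[0] if k != id_key]  # attribute order from the first row
    seen = {}                                    # ID -> {attribute: [distinct values]}
    for row in rows:
        per_attr = seen.setdefault(row[id_key], {a: [] for a in attrs})
        for a in attrs:
            value = row[a]
            if value != MISSING and value not in per_attr[a]:
                per_attr[a].append(value)

    merged = []
    for id_value, per_attr in seen.items():
        out = {id_key: id_value}
        for a in attrs:
            values = per_attr[a]
            out[a] = values[0] if values else MISSING  # fill the gap if any value exists
            for n, extra in enumerate(values[1:], start=1):
                suffix = "" if n == 1 else str(n)      # extr.attr4, extr2.attr4, ...
                out[f"extr{suffix}.{a}"] = extra
        merged.append(out)
    return merged

rows = [
    {"ID": "1", "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "A"},
    {"ID": "1", "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "A"},
    {"ID": "1", "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "C"},
    {"ID": "2", "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "B"},
    {"ID": "2", "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "B"},
    {"ID": "2", "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "D"},
]

for merged_row in merge_by_id(rows):
    print(merged_row)
```

The sketch reads each example once, so the number of examples mainly affects memory, not the number of passes.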


What I need to know now is the right approach to solve this problem.
Which Operators are suited to solve this (in which order)?

I am really thankful for any help, since none of my attempts have brought me any closer to a solution.


What I thought about is a Loop over each example that generates something like this, but that process would be huge and I have to check about 43k examples. Maybe there is an easy way to solve this that I don't know about.



Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,511 RM Data Scientist
    Hi,

    the step without the extra attribute is easy: that's the Aggregate operator.

    Do you only have nominal values? Then maybe an Aggregate with concatenation, followed by a Split, does the job?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
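
The two-step idea in the answer (concatenate per ID, then split) can be sketched in plain Python as follows. This is not RapidMiner code and does not reproduce the exact behaviour of its operators; it only imitates the idea under some assumptions: "?" marks a missing value, all values are nominal strings, "|" is a separator that never occurs inside a value, and only distinct values are concatenated. The function names and the attr_1, attr_2 column naming are made up for illustration.

```python
MISSING = "?"  # assumption: "?" marks a missing value
SEP = "|"      # assumption: a separator that never occurs inside a value

def aggregate_concat(rows, id_key="ID"):
    """Step 1: one row per ID, each cell holding its distinct values joined by SEP."""
    groups = {}
    for row in rows:
        cells = groups.setdefault(row[id_key], {})
        for attr, value in row.items():
            if attr == id_key:
                continue
            parts = cells.setdefault(attr, [])
            if value != MISSING and value not in parts:
                parts.append(value)
    return [
        {id_key: id_value, **{a: SEP.join(p) for a, p in cells.items()}}
        for id_value, cells in groups.items()
    ]

def split_columns(rows, id_key="ID"):
    """Step 2: split every concatenated cell into attr_1, attr_2, ... columns."""
    result = []
    for row in rows:
        out = {id_key: row[id_key]}
        for attr, cell in row.items():
            if attr == id_key:
                continue
            parts = cell.split(SEP) if cell else [MISSING]
            for n, part in enumerate(parts, start=1):
                out[f"{attr}_{n}"] = part
        result.append(out)
    return result

examples = [
    {"ID": "1", "attr4": "A"},
    {"ID": "1", "attr4": "A"},
    {"ID": "1", "attr4": "C"},
    {"ID": "2", "attr4": "B"},
    {"ID": "2", "attr4": "D"},
]

aggregated = aggregate_concat(examples)  # attr4 becomes "A|C" and "B|D"
print(split_columns(aggregated))
```

The question about nominal values matters here because concatenation treats every value as text, and the split only works if the separator cannot appear in the data.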