Merge duplicate IDs

g__g_ · April 2016

Hey there,

I have not found anything on the internet that solves my problem, so I'm trying here.

I have a given data set containing an ID attribute.
The problem is that some examples share the same ID because they represent the same entity in reality.
However, not all of these examples contain the same values: some values are simply missing, and some are completely different.

It looks something like this:

ID	attr1	attr2	attr3	attr4
1	XX	?	?	A
1	?	YY	?	A
1	?	?	ZZ	C
2	XX	?	?	B
2	?	YY	?	B
2	?	?	ZZ	D

I now want to merge the examples with the same ID into one example, so the table looks like this:

ID	attr1	attr2	attr3	attr4	extr.attr4
1	XX	YY	ZZ	A	C
2	XX	YY	ZZ	B	D

An extra attribute should be generated for every additional value that occurs in the same attribute among the examples sharing an ID.
The missing values should just be filled in with the given data.

What I need to know now is the right approach to solve this problem.
Which operators are suited to solve this, and in which order?

I am really thankful for any help, since none of my attempts have brought me any closer to a solution.

What I thought about is a loop over each example that generates something like this, but that process would be huge, and I have to check about 43k examples. Maybe there is an easy way to solve this that I don't know about.

MartinLiebig · April 2016

Hi,

The step without the extra attribute is easy - it's Aggregate.

Do you only have nominal values? Then maybe Aggregate with concatenation and a Split afterwards does the job?

~Martin
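
For readers who want to see the logic spelled out, here is a minimal sketch of the aggregate-with-concat-then-split idea in plain Python. It is not a RapidMiner process, only an illustration of the two steps on the example table from the question. The separator `|`, the helper names `aggregate_concat` and `split_extras`, and the naming of any second or later extra column (`extr2.attr4`, ...) are assumptions made for this example. It also assumes that duplicate values are dropped before concatenating and that all attributes are nominal.

```python
from collections import defaultdict

MISSING = "?"  # missing-value marker used in the example table
SEP = "|"      # assumed separator; must not occur in the data itself

ATTRS = ["attr1", "attr2", "attr3", "attr4"]
rows = [
    {"ID": "1", "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "A"},
    {"ID": "1", "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "A"},
    {"ID": "1", "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "C"},
    {"ID": "2", "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "B"},
    {"ID": "2", "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "B"},
    {"ID": "2", "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "D"},
]


def aggregate_concat(rows, attrs):
    """Step 1: per ID, concatenate the distinct non-missing values of each attribute."""
    seen = defaultdict(lambda: {a: [] for a in attrs})
    for row in rows:
        per_attr = seen[row["ID"]]
        for a in attrs:
            value = row[a]
            if value != MISSING and value not in per_attr[a]:
                per_attr[a].append(value)  # keep order of first appearance
    return {
        id_: {a: SEP.join(values) for a, values in per_attr.items()}
        for id_, per_attr in seen.items()
    }


def split_extras(aggregated, attrs):
    """Step 2: split each concatenated value; the first part stays in the
    original attribute, further parts go into extra attributes."""
    merged = []
    for id_, per_attr in aggregated.items():
        out = {"ID": id_}
        for a in attrs:
            parts = per_attr[a].split(SEP) if per_attr[a] else []
            out[a] = parts[0] if parts else MISSING
            for i, extra in enumerate(parts[1:], start=1):
                # Hypothetical naming scheme for the extra columns.
                name = f"extr.{a}" if i == 1 else f"extr{i}.{a}"
                out[name] = extra
        merged.append(out)
    return merged


merged = split_extras(aggregate_concat(rows, ATTRS), ATTRS)
for example in merged:
    print(example)
```

Each ID is visited once, so the 43k examples are handled in a single pass rather than with a nested loop over examples.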
