
prevent duplicate matching on non-unique reference

Joeyd, edited November 2018 in Help
All,

I'm new to RapidMiner and to the forum, so apologies if I'm asking the obvious in the wrong place. I've looked around on the forum and on the web and could not find what I'm looking for.

The issue I have is this.
We send files to one of our suppliers that contain non-unique references: a given reference may appear one to four times. For simplicity, say we send out a file like this:
ref1:somedata
ref1:somedata
ref2:somedata
ref2:somedata
Our supplier processes the file and sends back his reply:
ref1:someresult
ref1:someresult
ref2:someresult
ref2:someresult
Basically, what happens is that we send a transaction in twice, it gets processed twice by the supplier, and it gets reported back twice.

I would now like to link each response to its request. I cannot use a plain join on the reference: it pairs every ref1 request with every ref1 reply (2 × 2 = 4 rows per reference), giving 8 output records instead of 4. I also cannot simply remove duplicates from those 8, since some of the duplication is correct. What I want is a one-to-one link: each record in the request file paired with exactly one record in the reply file. As long as the two ref1 records going out are linked to the two ref1 records coming back, one each, I'm happy; it doesn't matter which links to which.
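A common way to get this kind of one-to-one link is to number the occurrences of each reference in both files (first ref1, second ref1, and so on) and join on the reference plus that occurrence number instead of the reference alone. Below is a minimal sketch of the idea in plain Python, not a RapidMiner process; the function names and the (ref, data) tuple layout are hypothetical, and the sample rows mirror the example above. Doing the same in RapidMiner would presumably need an equivalent per-reference counter attribute in both example sets before the join.

```python
from collections import defaultdict


def number_occurrences(rows):
    """Tag each (ref, data) row with its 0-based occurrence index within its ref group."""
    seen = defaultdict(int)
    tagged = []
    for ref, data in rows:
        tagged.append((ref, seen[ref], data))
        seen[ref] += 1
    return tagged


def naive_join(requests, replies):
    """Plain inner join on ref alone: every request pairs with every reply of that ref."""
    return [
        (rq_ref, rq_data, rp_data)
        for rq_ref, rq_data in requests
        for rp_ref, rp_data in replies
        if rq_ref == rp_ref
    ]


def one_to_one_join(requests, replies):
    """Join on (ref, occurrence index) so each request matches at most one reply."""
    reply_index = {(ref, n): data for ref, n, data in number_occurrences(replies)}
    matched = []
    for ref, n, data in number_occurrences(requests):
        if (ref, n) in reply_index:
            matched.append((ref, n, data, reply_index[(ref, n)]))
    return matched


# Hypothetical sample data shaped like the files in the question.
requests = [("ref1", "somedata"), ("ref1", "somedata"),
            ("ref2", "somedata"), ("ref2", "somedata")]
replies = [("ref1", "someresult"), ("ref1", "someresult"),
           ("ref2", "someresult"), ("ref2", "someresult")]

print(len(naive_join(requests, replies)))       # 8
print(len(one_to_one_join(requests, replies)))  # 4
```

Which request ends up paired with which reply depends only on row order within each reference, which matches the requirement that it doesn't matter which ref1 links to which. If the supplier returns fewer rows for a reference than were sent, the surplus requests simply stay unmatched instead of being multiplied.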

Any idea how I can set this up in RapidMiner?

Regards,

Joe