RapidMiner

Joining datasets on a non-distinct attribute

SOLVED
RM Certified Analyst
RM Certified Analyst

Joining datasets on a non-distinct attribute

Hi rapidminers, 

 

Let's say I need to JOIN 2 datasets, A and B, on an attribute 'key'. 

It is distinct for dataset A (each distinct value corresponds to only one example), but not for B (there are few examples with the same value of 'key' in B). 

 

What logic of joining will RapidMiner follow in this case? For each join 'A <--> key <--> B' what exactly example will be taken from B? First, last, random, or it totally depends on how dataset was stored in repo so it's not possible to know at all? 

 

Thank you. 

5 REPLIES
Community Manager Community Manager
Community Manager

Re: Joining datasets on a non-distinct attribute

So first it depends on which join you’re doing: inner, outer, left or right. You will most likely get different results depending on the data. Which one are you doing?
Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Certified Analyst
RM Certified Analyst

Re: Joining datasets on a non-distinct attribute

Hi @sgenzer

 

This one is actually an inner join (the most used one, though others might be occasionally present as well).

Community Manager Community Manager
Community Manager
Solution

Re: Joining datasets on a non-distinct attribute

So the short version is that it will REPEAT all of the id examples with the different B values:

 

Data Set 1             Data Set 2

ID    Last               ID     First  

1      JONES          2      Lynn

2      HOLMES       2      Lucile

3      SILVA            2       Leonardo

 

After inner join:

ID    Last                First

2      HOLMES       Lynn

2      HOLMES       Lucile

2      HOLMES       Leonardo

 

After outer join:

ID    Last                First

2      HOLMES       Lynn

2      HOLMES       Lucile

2      HOLMES       Leonardo

1      JONES          ?

3      SILVA             ?

 

 

left join is same as inner join in this case

 

After right join:

ID    Last                First

1      JONES          ?

2      HOLMES       Lynn

2      HOLMES       Lucile

2      HOLMES       Leonardo

3      SILVA             ?

 

Clear as mud?  Smiley Happy

 

Scott

 

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Certified Analyst
RM Certified Analyst

Re: Joining datasets on a non-distinct attribute

Clear, Captain.

Thanks Smiley Happy 

RM Certified Expert
RM Certified Expert

Re: Joining datasets on a non-distinct attribute

Captain Morgan!