Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
keep both join attributes problem
johannesjungblu
Member Posts: 7 Contributor II
Hello guys, i hope you can help me, i am really struggling with this problem
Follwing two simple csv files
File a:
id;label;cont1
0; ; a
1; 1b; b
2; 2c; c
File b:
id;label;cont2
0; 0x; x
1; 1b; y
2; ; z
the desired result:
id;label;cont1;cont2
0; 0x; a; x
1; 1b; b; y
2; 2c; c; z
Thought way to do: outer join with "keep both join attributes" and no "remove double attributes"
But unfortunately i am getting always the same result, no matter the option "remove double attributes" is on or off:
id;label;cont1;cont2
0; ?; a; x
1; 1b; b; y
2; 2c; c; z
Anyone has an idea what i am doing wrong? Tanks!
Tagged:
0
Answers
There may be a way to do this with a single join operator, but I'm not sure. In the meantime, it can be definitely done by treating the two labels as separate variables (you'll need to name them differently) and then creating an attribute that combines them after the join. See the attached process for an example (the two csv files I created from the simple examples you gave).
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for the working quick solution and example!
I am still hoping for a better solution since this only a small example. In my case I have 15 database tables, which i want to join with each 5 common attributes - which means a quite big subprocess for only doing an outer join with keeping both attributes - still thinks its an bug, cause in my opinion this is excact the case for which the setting was made...
My current solution is an outer join and creation of a new table in the database - but this can only work temporary.
Hi Johannes,
are you sure that all of the attributes you would like to "duplicate" have role regular? Since special roles need to be unique join has the feature to remove the 2nd attribute with the same role.
~Martin
Dortmund, Germany
Yes, i double checked it.
Dear Johannes,
i've built the process with the data you have written down above, and its working for me? Have a look at the attached process. Maybe there is something different in yours?
~Martin
Dortmund, Germany
Hey @mschmitz this is basically the same thing as the process I posted--it keeps the two "label" variables separate through the join and then created a combined label using "generate attributes" after that. But I think @johannesjungblu wants this to be done natively by the outer join operator without additional manipulation afterwards.
I don't think it is a bug per se, but I tend to agree with him that it is potentially confusing, because the operator help specifies that only the left attribute values are kept if there are double attributes, even if the join type is other than a left join. So perhaps this is an idea for a product enhancement that could be passed along to the development team to have it take the value of a double attribute based on the join type (left=left value, right=right value, outer=union value, etc.).
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi Brian,
to be honest, i do not think its a bug. I think its doing what it is supposed to do.
I think the desired behaviour for Johannes can be archived with a two operators. Loop Attributes + Generate Attributes. Then you can check for everything if it is missing and if so, take the right hand side.
~Martin
Dortmund, Germany
You are right!
I don´t get the option "keep both join attributes". The description implies my desired result:
keep both join attributes
Description: If checked, both columns of a join pair will be kept. Usually this is unneccessary since both attributes are identical. It may be useful to keep such a column if there are missing values on one side.
For what else is this option good?
Johannes,
it does what it supposed to do? If you check it, you get both attributes. ID and ID_from_left_ES. That's what its supposed to do, right?
~Martin
Dortmund, Germany
ahhhh sorry... I obviously missed that.
Is there a chance a "keep and combine attributes option" will be implemented in the near future?
can you elaborate on what you mean by keep and combine,
Not sure if you have seen the advanced options, click on show advanced options and it will allow you to select additonal column handling conditions
keep and combine should keep every pair of duplicate attributes in one atribute with a combined policy like: "take the not null entry".
If you mean the advanced options with remove and keep - yes.
hi...I would just rename the labels as label1 and label2 before the join. You're not joining on label anyway. Then you avoid this whole issue. Occam's Razor?
Scott
yes, i work with this solution, but it´s annoying rename and combine attriutes if you join 15 tables, if there is a possibility an option can make this...