The RapidMiner community is on read-only mode until further notice. Technical support via cases will continue to work as is. For any urgent licensing related requests from Students/Faculty members, please use the Altair academic forum here.
Combining Data Sets
Hi, I am new at rapidminer and i am wondering if there is a solution to the following question
Is there a possibility to match the following Data Set1 and Set2 with the logic, that the Needed Data Set can be obtained?
- "?" in Set2 should be replaced by the corresponding values of Set1
- if there is a unambiguous assignment such as in row 2, only the corresponding value should be added
- if there is a ambiguous assignment such as in row 1 and 3, all possible values should be added
Data Set1:
Attr_A Attr_B Attr_C
A_111 B_123 C_999
A_111 B_234 C_999
A_111 B_345 C_222
A_222 B_456 C_111
A_222 B_567 C_222
Data Set2:
Attr_1 Attr_A Attr_B Attr_C
... A_111 ? ?
... A_111 B_234 ?
... A_222 ? ?
Data Set Needed:
Attr_1 Attr_A Attr_B Attr_C
... A_111 B_123, B_234, B_345 C_999, C_222
... A_111 B_234 C_999
... A_222 B_456, B_567 C_111, C_222
Is there a possibility to match the following Data Set1 and Set2 with the logic, that the Needed Data Set can be obtained?
- "?" in Set2 should be replaced by the corresponding values of Set1
- if there is a unambiguous assignment such as in row 2, only the corresponding value should be added
- if there is a ambiguous assignment such as in row 1 and 3, all possible values should be added
Data Set1:
Attr_A Attr_B Attr_C
A_111 B_123 C_999
A_111 B_234 C_999
A_111 B_345 C_222
A_222 B_456 C_111
A_222 B_567 C_222
Data Set2:
Attr_1 Attr_A Attr_B Attr_C
... A_111 ? ?
... A_111 B_234 ?
... A_222 ? ?
Data Set Needed:
Attr_1 Attr_A Attr_B Attr_C
... A_111 B_123, B_234, B_345 C_999, C_222
... A_111 B_234 C_999
... A_222 B_456, B_567 C_111, C_222
0
Answers
Dortmund, Germany
Data Set1 is some kind of a "Relation Table", where the relations between article numbers are shown
(e.g. A_111 is related to B_123 and C_999, but there is also a relation between A_111 , B_234, C_999 and so on).
Therefore Data Set1 won't have the same size as Data Set2. The Append Operator won't work in this case.
Maybe I am wrong with the joins operator group in general for this task. But so far I have no solution how Data Set2 could "make use" of Data Set1 to fill the missing cells in respect of the relation on the article numbers as they are shown in Data Set1.
Thanks in advance for help!
Dortmund, Germany
Unfortunatelly your suggested solution does not work in my case.
I try again to explain my question again, maybe that helps:
Data Set2 should use Data Set1 (as a relation table) to fill the missing values in Data Set2/ or create the new Data Set Needed.
That means e.g. that Data Set2 should "look up" in Data Set1 and recognize that value A_111 is related to B_123, B_234, B_345. So the missing value in Attr. B of the Needed Data Set should be "B_123, B_234, B_345". (The same question with Att C for the Needed Data Set).
As a further aspect Data Set2 should also "understand" if it contains already the values A_111 and B_234, that those two values in combination have only a relation with C_999 (see relation table/ Data Set1).
I hope the explanation helps to understand my objective. Sorry for the long explanation.
Thanks again!
Of course, you may want to take a step back and ask whether you actually need this data structure, or whether another structure would be suitable. If you are doing anything with machine learning, this is going to be an awkward structure to use. join based on attribute A. You have a couple of alternatives:
Generally speaking having multiple distinct values in a single db cell is harder for computers to deal with.
Also, you said that Martin's approach would not work, but I am not sure why you said that, did you actually try it out?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts