Apply Join operator on each Attribute

SylvainMSylvainM Member Posts: 18 Maven
edited July 2019 in Help

Hello everyone,

 

As usual, I'm sorry if the question has already been posted before - although I carefully checked...

 

Here is my issue. I have an Example Set_1 containing an attribute called Date, and a variable number n of attributes called People_1 to People_n. I also have another Example Set_2 containing a list of names and biographic datas.

 

In order to get for each date the biographic datas of each people involved that day, I would like to apply the Join Operator to each attribute People_i (i=1 to n) using their names as Key attributes

 

For now, as n is still small (7), I can do it by hand, parametring 7 Join Operators for each People attribute, and joining my 7 resulting sets in the Append Operator. But, I would like to get the same result automatically, in case of n becomes too big (parametring 40 Join Operators and connecting all them to an Append seems quite heavy and inelegant to me).

 

I've tried during past week to find an elegant solution to that problem, and I haven't been able to figure it. I've tried Loops of all kind, Subprocesses, etc. It seems very simple though, but I've been unable to get my "hand" results automatically.

 

Here are some of my datas to help you.

 

Thanks a lot, as always :-)

Sylvain

 

 

Row Id Date People_1 People_2 People_3 People_4 People_5 People_6 People_7
12 12.0 Sat Dec 19 00:00:00 EST 1970 Loretta Lynn Owen Bradley ? ? ? ? ?
13 13.0 Sat Jun 07 00:00:00 EDT 1980 Sissy Spacek Loretta Lynn Owen Bradley ? ? ? ?
14 14.0 Sat Dec 04 00:00:00 EST 2010 Loretta Lynn Sheryl Crow Miranda Lambert ? ? ? ?
15 15.0 Sat Aug 09 00:00:00 EDT 1969 Charley Pride Dallas Frazier A.L. "Doodle" Chet Atkins Jack Clement Bob Ferguson Felton Jarvis

 

 

Row People Birth_date Birth_state Sex Ethnicity
2 Charles Kelley Fri Sep 11 00:00:00 EDT 1981 GA Male Caucasian
3 Hillary Scott Tue Apr 01 00:00:00 EST 1986 TN Female Caucasian
4 Reba McEntire Mon Mar 28 00:00:00 EST 1955 OK Female Caucasian
5 Wanda Jackson Wed Oct 20 00:00:00 EST 1937 OK Female Caucasian
6 Carrie Underwood Thu Mar 10 00:00:00 EST 1983 OK Female Caucasian
7 Toby Keith Sat Jul 08 00:00:00 EDT 1961 OK Male Caucasian
8 David Bellamy Sat Sep 16 00:00:00 EDT 1950 FL Male Caucasian
9 Howard Bellamy Sat Feb 02 00:00:00 EST 1946 FL Male Caucasian
10 Keith Urban Thu Oct 26 00:00:00 EDT 1967 Northland Male Caucasian
11 Miranda Lambert Thu Nov 10 00:00:00 EST 1983 TX Female Caucasian
12 Sam Hunt Sat Dec 08 00:00:00 EST 1984 GA Male Caucasian
13 Johnny Cash Fri Feb 26 00:00:00 EST 1932 AR Male Caucasian
14 June Carter Sun Jun 23 00:00:00 EDT 1929 VA Female Caucasian
15 Merle Haggard Tue Apr 06 00:00:00 EST 1937 CA Male Caucasian
16 Waylon Jennings Tue Jun 15 00:00:00 EDT 1937 TX Male Caucasian
17 Willie Nelson Sat Apr 29 00:00:00 EST 1933 TX Male Caucasian
18 Loretta Lynn Thu Apr 14 00:00:00 EST 1932 KY Female Caucasian
19 Sissy Spacek Sun Dec 25 00:00:00 EST 1949 TX Female Caucasian
20 Sheryl Crow Sun Feb 11 00:00:00 EST 1962 MO Female Caucasian
21 Charley Pride Sun Mar 18 00:00:00 EST 1934 MS Male African American
22 Rodney Clawon ? TX Male Caucasian
23 Nathan Chapman ? TN Male Caucasian

Best Answer

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Solution Accepted

    @SylvainM

    I'm sure I'm missing something important, but have you tried the De-Pivot operator before the join?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.0.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
    <parameter key="generator_type" value="comma_separated_text"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration"/>
    <list key="date_series_configuration"/>
    <list key="date_series_configuration (interval)"/>
    <parameter key="input_csv_text" value="Row &#9;Id &#9;Date &#9;People_1 &#9;People_2 &#9;People_3 &#9;People_4 &#9;People_5 &#9;People_6 &#9;People_7&#10;12 &#9;12.0 &#9;Sat Dec 19 00:00:00 EST 1970 &#9;Loretta Lynn &#9;Owen Bradley &#9;? &#9;? &#9;? &#9;? &#9;?&#10;13 &#9;13.0 &#9;Sat Jun 07 00:00:00 EDT 1980 &#9;Sissy Spacek &#9;Loretta Lynn &#9;Owen Bradley &#9;? &#9;? &#9;? &#9;?&#10;14 &#9;14.0 &#9;Sat Dec 04 00:00:00 EST 2010 &#9;Loretta Lynn &#9;Sheryl Crow &#9;Miranda Lambert &#9;? &#9;? &#9;? &#9;?&#10;15 &#9;15.0 &#9;Sat Aug 09 00:00:00 EDT 1969 &#9;Charley Pride &#9;Dallas Frazier &#9;A.L. &quot;Doodle&quot; &#9;Chet Atkins &#9;Jack Clement &#9;Bob Ferguson &#9;Felton Jarvis"/>
    <parameter key="column_separator" value=" &#9;"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="8.2.000" expanded="true" height="82" name="De-Pivot" width="90" x="179" y="34">
    <list key="attribute_name">
    <parameter key="People" value="People_.*"/>
    </list>
    <parameter key="index_attribute" value="NewID"/>
    </operator>
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.0.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="187">
    <parameter key="generator_type" value="comma_separated_text"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration"/>
    <list key="date_series_configuration"/>
    <list key="date_series_configuration (interval)"/>
    <parameter key="input_csv_text" value="Row &#9;People &#9;Birth_date &#9;Birth_state &#9;Sex &#9;Ethnicity&#10;2 &#9;Charles Kelley &#9;Fri Sep 11 00:00:00 EDT 1981 &#9;GA &#9;Male &#9;Caucasian&#10;3 &#9;Hillary Scott &#9;Tue Apr 01 00:00:00 EST 1986 &#9;TN &#9;Female &#9;Caucasian&#10;4 &#9;Reba McEntire &#9;Mon Mar 28 00:00:00 EST 1955 &#9;OK &#9;Female &#9;Caucasian&#10;5 &#9;Wanda Jackson &#9;Wed Oct 20 00:00:00 EST 1937 &#9;OK &#9;Female &#9;Caucasian&#10;6 &#9;Carrie Underwood &#9;Thu Mar 10 00:00:00 EST 1983 &#9;OK &#9;Female &#9;Caucasian&#10;7 &#9;Toby Keith &#9;Sat Jul 08 00:00:00 EDT 1961 &#9;OK &#9;Male &#9;Caucasian&#10;8 &#9;David Bellamy &#9;Sat Sep 16 00:00:00 EDT 1950 &#9;FL &#9;Male &#9;Caucasian&#10;9 &#9;Howard Bellamy &#9;Sat Feb 02 00:00:00 EST 1946 &#9;FL &#9;Male &#9;Caucasian&#10;10 &#9;Keith Urban &#9;Thu Oct 26 00:00:00 EDT 1967 &#9;Northland &#9;Male &#9;Caucasian&#10;11 &#9;Miranda Lambert &#9;Thu Nov 10 00:00:00 EST 1983 &#9;TX &#9;Female &#9;Caucasian&#10;12 &#9;Sam Hunt &#9;Sat Dec 08 00:00:00 EST 1984 &#9;GA &#9;Male &#9;Caucasian&#10;13 &#9;Johnny Cash &#9;Fri Feb 26 00:00:00 EST 1932 &#9;AR &#9;Male &#9;Caucasian&#10;14 &#9;June Carter &#9;Sun Jun 23 00:00:00 EDT 1929 &#9;VA &#9;Female &#9;Caucasian&#10;15 &#9;Merle Haggard &#9;Tue Apr 06 00:00:00 EST 1937 &#9;CA &#9;Male &#9;Caucasian&#10;16 &#9;Waylon Jennings &#9;Tue Jun 15 00:00:00 EDT 1937 &#9;TX &#9;Male &#9;Caucasian&#10;17 &#9;Willie Nelson &#9;Sat Apr 29 00:00:00 EST 1933 &#9;TX &#9;Male &#9;Caucasian&#10;18 &#9;Loretta Lynn &#9;Thu Apr 14 00:00:00 EST 1932 &#9;KY &#9;Female &#9;Caucasian&#10;19 &#9;Sissy Spacek &#9;Sun Dec 25 00:00:00 EST 1949 &#9;TX &#9;Female &#9;Caucasian&#10;20 &#9;Sheryl Crow &#9;Sun Feb 11 00:00:00 EST 1962 &#9;MO &#9;Female &#9;Caucasian&#10;21 &#9;Charley Pride &#9;Sun Mar 18 00:00:00 EST 1934 &#9;MS &#9;Male &#9;African American&#10;22 &#9;Rodney Clawon &#9;? &#9;TX &#9;Male &#9;Caucasian&#10;23 &#9;Nathan Chapman &#9;? &#9;TN &#9;Male &#9;Caucasian"/>
    <parameter key="column_separator" value=" &#9;"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="380" y="85">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People" value="People"/>
    </list>
    </operator>
    <connect from_op="Create ExampleSet" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

     

Answers

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @SylvainM,

     

    Based on the 2 examplesets of your post, could you post a resulting exampleset of what you want to obtain, to better understand.

     

    Regards,

     

    Lionel

  • SylvainMSylvainM Member Posts: 18 Maven

    Hello Lionel,

     

    Thank you for your help :-)

     

    This is what I get "by hand" (please, consider that my bio datas aren't complete yet, some names are missing, which explains why I don't have a row for each person indicated by my previous Example sets)

     

    Sat Dec 19 00:00:00 EST 1970 KY Female Caucasian
    Sat Jun 07 00:00:00 EDT 1980 TX Female Caucasian
    Sat Jun 07 00:00:00 EDT 1980 KY Female Caucasian
    Sat Dec 04 00:00:00 EST 2010 KY Female Caucasian
    Sat Dec 04 00:00:00 EST 2010 MO Female Caucasian
    Sat Dec 04 00:00:00 EST 2010 TX Female Caucasian
    Sat Aug 09 00:00:00 EDT 1969 MS Male African American

     

     

    Best,

    Sylvain

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @SylvainM

     

    Could you share your process ?

     

    Regards,

     

    Lionel

  • SylvainMSylvainM Member Posts: 18 Maven

    Hi Lionel,

     

    So I don't know if I did it correctly (I'm very bad with computers), but this is the "joining" part of the process. It takes as inputs the intermediary results I gave in my previous post.

     

    Thanks so much!

    Sylvain

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="Subprocess" width="90" x="246" y="34">
    <process expanded="true">
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="208" name="Multiply (5)" width="90" x="45" y="34"/>
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="208" name="Multiply (6)" width="90" x="45" y="340"/>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (14)" width="90" x="179" y="646">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_7" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (13)" width="90" x="179" y="544">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_6" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (12)" width="90" x="179" y="442">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_5" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (11)" width="90" x="179" y="340">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_4" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (10)" width="90" x="179" y="238">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_3" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (8)" width="90" x="179" y="34">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_1" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (9)" width="90" x="179" y="136">
    <parameter key="remove_double_attributes" value="true"/>
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="People_2" value="People"/>
    </list>
    <parameter key="keep_both_join_attributes" value="false"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="208" name="Append (2)" width="90" x="514" y="136">
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    <parameter key="merge_type" value="all"/>
    </operator>
    <connect from_port="in 1" to_op="Multiply (5)" to_port="input"/>
    <connect from_port="in 2" to_op="Multiply (6)" to_port="input"/>
    <connect from_op="Multiply (5)" from_port="output 1" to_op="Join (8)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 2" to_op="Join (9)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 3" to_op="Join (10)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 4" to_op="Join (11)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 5" to_op="Join (12)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 6" to_op="Join (13)" to_port="left"/>
    <connect from_op="Multiply (5)" from_port="output 7" to_op="Join (14)" to_port="left"/>
    <connect from_op="Multiply (6)" from_port="output 1" to_op="Join (8)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 2" to_op="Join (9)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 3" to_op="Join (10)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 4" to_op="Join (11)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 5" to_op="Join (12)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 6" to_op="Join (13)" to_port="right"/>
    <connect from_op="Multiply (6)" from_port="output 7" to_op="Join (14)" to_port="right"/>
    <connect from_op="Join (14)" from_port="join" to_op="Append (2)" to_port="example set 7"/>
    <connect from_op="Join (13)" from_port="join" to_op="Append (2)" to_port="example set 6"/>
    <connect from_op="Join (12)" from_port="join" to_op="Append (2)" to_port="example set 5"/>
    <connect from_op="Join (11)" from_port="join" to_op="Append (2)" to_port="example set 4"/>
    <connect from_op="Join (10)" from_port="join" to_op="Append (2)" to_port="example set 3"/>
    <connect from_op="Join (8)" from_port="join" to_op="Append (2)" to_port="example set 1"/>
    <connect from_op="Join (9)" from_port="join" to_op="Append (2)" to_port="example set 2"/>
    <connect from_op="Append (2)" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (7)" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="Birth State|ETHNICITY|Peak|SEX"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <connect from_op="Subprocess" from_port="out 1" to_op="Select Attributes (7)" to_port="example set input"/>
    <connect from_op="Select Attributes (7)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

  • SylvainMSylvainM Member Posts: 18 Maven

    And I know : it's super heavy and ugly :smileylol:

  • SylvainMSylvainM Member Posts: 18 Maven

    :smileysurprised: That's exactly that I was looking for! I had no idea that I could use De-pivot like that. Sorry for being such a beginner...  :smileyembarrassed:

     

    Thanks so much, JEdward! I deeply appreciate your help! 

     

    And the problem is solved, of course.

Sign In or Register to comment.