Options

"Join

MuehliManMuehliMan Member Posts: 85 Maven
edited June 2019 in Help
Dear RapidMiner community,

I have two tables (imported from Excel files), both containing one ID column and various properties. In addition to that for some examples there are names in table A and/or B. I joined both tables using the JOIN operator with an outer join to get a table containing all information. I did not check the remove duplicate attributes checkbox. So I am ending up with the columns names_from_A and names_from_B.

For some examples table A has a value for name, for some other table B has a value. In some cases there are names in table A and B. I want to join the tables in the way that there is one column called name which contains the name from A or B if only one is given and from A if both are given. In my opinion an operator doing somethig like: If name in table A is empty then take the name from table B should solve the problem. How can I do this?

And a second question:

How do I perform an operator that adds a prefix to all attribute names of a table? My guess again would be to use "loop attributes"

Thanks in advance,
Markus


Answers

  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Markus,
    of course both is possible and I will post a process below, that will show how to handle it:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="134" width="547">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
            <parameter key="replace_what" value="-"/>
            <parameter key="replace_by" value="_"/>
          </operator>
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
            <list key="function_descriptions">
              <parameter key="combination" value="if(col_adj != &quot;?&quot;,col_adj, if(pension != &quot;?&quot;, pension, 0/0))"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    The Rename by Replacing operator will rename a bunch of attributes, selectable by the attribute filter parameters. Each attribute will then be renamed by replacing all matching regions of its name to the replace by parameter. Please note, that you can put in capturing groups. This is important for your problem. The following process will show how to prepend a common prefix:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="134" width="547">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
            <parameter key="replace_what" value="(.*)"/>
            <parameter key="replace_by" value="common_prefix_$1"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
Sign In or Register to comment.