Outer join behaves differently if left and right inputs are swapped

awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
edited December 2019 in Help
Hello

I'm trying to join two example sets using the outer join option, but I've observed that the operator behaves differently depending on the order in which the left and right inputs are presented.

I've made an example that shows this; if the inputs are swapped the results change.

Am I right to think the order should not make a difference?

regards

Andrew
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="686" width="858">
      <operator activated="true" class="generate_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="1"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.1.001" expanded="true" height="76" name="Generate ID" width="90" x="45" y="210"/>
      <operator activated="true" class="generate_attributes" compatibility="5.1.001" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="210">
        <list key="function_descriptions">
          <parameter key="a1" value="10+id"/>
          <parameter key="a2" value="20+id"/>
        </list>
      </operator>
      <operator activated="true" class="declare_missing_value" compatibility="5.1.001" expanded="true" height="76" name="Declare Missing Value" width="90" x="179" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="a2"/>
        <parameter key="numeric_value" value="21.0"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="525">
        <parameter key="number_examples" value="2"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes (2)" width="90" x="45" y="435">
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.1.001" expanded="true" height="76" name="Generate ID (2)" width="90" x="45" y="345"/>
      <operator activated="true" class="generate_attributes" compatibility="5.1.001" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="179" y="345">
        <list key="function_descriptions">
          <parameter key="a2" value="20+id"/>
          <parameter key="a3" value="30+id"/>
        </list>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="380" y="75"/>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply (2)" width="90" x="380" y="435"/>
      <operator activated="true" class="join" compatibility="5.1.001" expanded="true" height="76" name="Join" width="90" x="514" y="255">
        <parameter key="join_type" value="outer"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Declare Missing Value" to_port="example set input"/>
      <connect from_op="Declare Missing Value" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
      <connect from_op="Generate ID (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Join" to_port="left"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_port="result 3"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="198"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    they are equal in the sense that the same number of examples with the same set of attributes (but in a different order) is returned. The values of the examples might differ, however, if the two example sets contradict each other.

    Greetings,
      Sebastian
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    I get the logic, but I was hoping for a loophole. I set one of the values explicitly to missing, and I observe that the missing value takes precedence over an actual value when the missing value is encountered first. Logically, since it is missing, the second value should take precedence over it.

    regards

    Andrew
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    well, I doubt that this behavior is desired in all situations. In a different situation you would likely ask us why the missing value was simply overwritten while all other values were kept.

    You will have to deal with this problem explicitly. The only thing one could do is to include a parameter for that. If you want this, please go ahead and make a feature request for that.

    Greetings,
      Sebastian
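
For readers who want to see the order dependence in isolation, here is a small plain-Python model of the behaviour discussed in this thread. It is only a sketch under assumptions, not RapidMiner's implementation: the `outer_join` function, its `coalesce` flag, and the use of `None` for a missing value are all made up for illustration. The two tables mirror the example process (one example with `a2` declared missing, two examples with real `a2` values), and the join key is the generated id.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]
Table = Dict[int, Row]  # example id -> attribute values; None models "missing"


def outer_join(left: Table, right: Table, coalesce: bool = False) -> Table:
    """Toy outer join on id (hypothetical helper, not RapidMiner code).

    For an attribute present on both sides the left value wins, even if it
    is missing. With coalesce=True a missing left value falls back to the
    right value instead.
    """
    # Collect attribute names in first-seen order, left side first.
    attrs: List[str] = []
    for table in (left, right):
        for row in table.values():
            for name in row:
                if name not in attrs:
                    attrs.append(name)

    result: Table = {}
    for key in sorted(set(left) | set(right)):
        l_row = left.get(key)
        r_row = right.get(key)
        merged: Row = {}
        for name in attrs:
            if l_row is not None and name in l_row:
                value = l_row[name]
                if coalesce and value is None and r_row is not None:
                    value = r_row.get(name)
            elif r_row is not None:
                value = r_row.get(name)
            else:
                value = None
            merged[name] = value
        result[key] = merged
    return result


# Data modelled on the process above: a1 = 10 + id, a2 = 20 + id, a3 = 30 + id.
set1: Table = {1: {"a1": 11.0, "a2": None}}  # a2 = 21 was declared missing
set2: Table = {1: {"a2": 21.0, "a3": 31.0}, 2: {"a2": 22.0, "a3": 32.0}}

first_left = outer_join(set1, set2)
second_left = outer_join(set2, set1)
patched = outer_join(set1, set2, coalesce=True)

print(first_left[1]["a2"], second_left[1]["a2"], patched[1]["a2"])
```

In this model both input orders return two examples with the attributes `a1`, `a2` and `a3`, but the value of `a2` for id 1 depends on which set is on the left. The `coalesce` variant corresponds to the optional parameter suggested above; until something like it exists, the missing values have to be handled explicitly before or after the join.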