Union vs. Join

ashxz Member Posts: 1 Contributor I
edited July 2019 in Help
Hello all,
I have some problems with the Union and Join operators.

RapidMiner 5.3.008.
For example, I have two samples: dataset1 and dataset2.
dataset1.aml:
<attribute name="attribute1" sourcecol="1" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute2" sourcecol="2" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute3" sourcecol="3" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>

dataset1.dat:
false false false

dataset2.aml:
<attribute name="attribute3" sourcecol="1" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute4" sourcecol="2" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>

dataset2.dat:
true true

When I apply Union to the two datasets, I get the following .aml file:

<attribute name="attribute1" sourcecol="1" valuetype="binominal">
   <value>false</value>
</attribute>
<attribute name="attribute2" sourcecol="2" valuetype="binominal">
   <value>false</value>
</attribute>
<attribute name="attribute3" sourcecol="3" valuetype="binominal">
   <value>true</value>
   <value>false</value>
</attribute>
<attribute name="attribute4" sourcecol="4" valuetype="binominal">
   <value>true</value>
</attribute>

But if I apply Join (outer) to the two datasets, I get the following .aml file:

<attribute name="attribute1" sourcecol="1" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute2" sourcecol="2" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute3" sourcecol="3" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>
<attribute name="attribute4" sourcecol="4" valuetype="binominal">
   <value>false</value>
   <value>true</value>
</attribute>

I have two questions:
1. Why does the .aml file produced by Union have a different format, while Join produces the correct format (the same as the input files)? Also, after Union I got the error "The number of nominal values is not the same for training and application" when I used the same dataset as both the training set (Decision Tree) and the test set (Apply Model).
2. I need to join two datasets the way the Join (outer) operator does, but Join removes the values of the 'label' attribute from the second dataset, even when those values differ. How can I keep the label values from the second dataset?
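The error in question 1 refers to nominal mappings: a model learned with one list of values for an attribute is applied to data whose list for that attribute has a different number of entries. The sketch below is a hypothetical, simplified model of such a check (the function name `nominal_mismatches` is made up, and this is not RapidMiner's actual code). It compares the value lists declared in dataset1.aml with those in the Union output, purely to show what kind of difference the message is about.

```python
# Hypothetical, simplified model of a nominal-mapping compatibility check.
# It does NOT reproduce RapidMiner's implementation; it only compares the
# number of declared values for attributes present in both schemas.

def nominal_mismatches(training, application):
    """Return names of attributes whose number of nominal values differs.

    Both arguments map an attribute name to its list of declared values,
    in the order they appear in the .aml file.
    """
    mismatched = []
    for name, train_values in training.items():
        apply_values = application.get(name)
        if apply_values is not None and len(apply_values) != len(train_values):
            mismatched.append(name)
    return mismatched


# Declared value lists copied from dataset1.aml and from the Union output.
dataset1_schema = {
    "attribute1": ["false", "true"],
    "attribute2": ["false", "true"],
    "attribute3": ["false", "true"],
}
union_schema = {
    "attribute1": ["false"],
    "attribute2": ["false"],
    "attribute3": ["true", "false"],
    "attribute4": ["true"],
}

print(nominal_mismatches(dataset1_schema, union_schema))
```

Note that attribute3 is not flagged by this count-only check even though its values are listed in a different order; whether order matters in RapidMiner is not covered by this sketch.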

Thanks!

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Union works like Append, except that it can also append datasets with different attributes. If attribute4 is missing from the first dataset, it is filled with missing values for the rows of that dataset.
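    A minimal Python sketch of the behavior described above, assuming each dataset is a list of rows stored as dicts and using None for a missing value. The function name `union_append` is hypothetical, and this only mimics the described semantics; it is not RapidMiner's implementation.

```python
def union_append(first, second):
    """Append the rows of `second` to `first`, aligning attributes by name.

    The result contains every attribute that occurs in either input, in
    order of first appearance; attributes absent from a row become None.
    """
    attributes = []
    for row in first + second:
        for name in row:
            if name not in attributes:
                attributes.append(name)
    return [{name: row.get(name) for name in attributes} for row in first + second]


# The two one-row datasets from the question.
dataset1 = [{"attribute1": "false", "attribute2": "false", "attribute3": "false"}]
dataset2 = [{"attribute3": "true", "attribute4": "true"}]

for row in union_append(dataset1, dataset2):
    print(row)
```

    If you collect the non-missing values of each column in this result, attribute1 and attribute2 only contain false and attribute4 only contains true, which is consistent with the shorter value lists in the Union .aml output from the question.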

    Join performs a relational join, as known from SQL databases, for example.
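    For comparison, here is a minimal Python sketch of an SQL-style full outer join on a single key attribute, again assuming rows stored as dicts with None for missing values. The key attributes are configured on the Join operator itself; using attribute3 as the key here is purely an illustrative assumption, and the function names are hypothetical.

```python
def attribute_names(rows):
    """All attribute names occurring in `rows`, in order of first appearance."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return names


def full_outer_join(left, right, key):
    """SQL-style FULL OUTER JOIN of two lists of dict rows on one key attribute.

    Rows with equal (non-None) key values are merged; unmatched rows from
    either side are kept and the other side's attributes are set to None.
    On a name clash other than the key, this sketch keeps the left value
    (an arbitrary choice for illustration).
    """
    columns = attribute_names(left + right)
    result = []
    matched_right = set()
    for lrow in left:
        found = False
        for i, rrow in enumerate(right):
            if lrow.get(key) is not None and lrow.get(key) == rrow.get(key):
                merged = dict.fromkeys(columns)
                merged.update(rrow)
                merged.update(lrow)  # left values win on name clashes
                result.append(merged)
                matched_right.add(i)
                found = True
        if not found:
            row = dict.fromkeys(columns)
            row.update(lrow)
            result.append(row)
    # Keep right rows that never matched any left row.
    for i, rrow in enumerate(right):
        if i not in matched_right:
            row = dict.fromkeys(columns)
            row.update(rrow)
            result.append(row)
    return result


dataset1 = [{"attribute1": "false", "attribute2": "false", "attribute3": "false"}]
dataset2 = [{"attribute3": "true", "attribute4": "true"}]

# Key values differ, so no rows are merged.
for row in full_outer_join(dataset1, dataset2, "attribute3"):
    print(row)

# A hypothetical second dataset whose key value matches dataset1.
matching = [{"attribute3": "false", "attribute4": "true"}]
for row in full_outer_join(dataset1, matching, "attribute3"):
    print(row)
```

    With the question's data the key values differ, so each row is kept on its own and padded with None; with a matching key value the two rows are merged into one.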

    If you have further questions and want to supply more examples, please do so as a RapidMiner process, so that we can easily reproduce them. To generate one-line data sets, Generate Data by User Specification comes in handy.

    Best regards,
    Marius