Relating datasets

artdijk · March 2010

I am new to datamining and RM,
I want to relate two data tables. Each table has a unique key of date and number. Thus the tables should be related via this key. How can this be done in RM ?
Thanks
Arthur

land · March 2010

Hi Arthur,
what exactly do you mean by relate? I guess it's the same you do, when you join some relational data base tables? We have an operator called "Join" which will do this.

Greetings,
Sebastian

artdijk · March 2010

Thanks Sebastian,
The problem is before the join is executed.
Table A: has as field names: Flight Date Delay Pax
Table B: has as field names: Flight Date Report
In the result of the Join I want to have a table C which has as rows the flights that appears in both tables and as fieldnames the fieldnames that appear in both tables. So I can see how many pax had delay.
Table A and Table B have no Id. They are linked by Flight and Date.
How is this done in Rapid Miner,
Thanks
Arthur

land · March 2010

Hi Arthur,
so neither flight and date are unique, but flight + date is unique? Then I would suggest building a new id column from the values of flight and date column. This column can be used for the join. Let make a short sample process:

After we generated some data and transformed it for having an comparable situation, we build a new id like attribute by concatenating the values of this two attribute:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="296" width="1018">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="150"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
        <parameter key="old_name" value="att1"/>
        <parameter key="new_name" value="Flight"/>
      </operator>
      <operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="30">
        <parameter key="old_name" value="att2"/>
        <parameter key="new_name" value="Date"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="447" y="30"/>
      <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
        <list key="function_descriptions">
          <parameter key="FlightDate" value="Flight + &quot; - &quot; + Date"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="715" y="30">
        <parameter key="name" value="FlightDate"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="rename" expanded="true" height="76" name="Rename (3)" width="90" x="175" y="210">
        <parameter key="old_name" value="att1"/>
        <parameter key="new_name" value="Flight"/>
      </operator>
      <operator activated="true" class="rename" expanded="true" height="76" name="Rename (4)" width="90" x="310" y="210">
        <parameter key="old_name" value="att2"/>
        <parameter key="new_name" value="Date"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal (2)" width="90" x="445" y="210"/>
      <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="580" y="210">
        <list key="function_descriptions">
          <parameter key="FlightDate" value="Flight + &quot; - &quot; + Date"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="715" y="210">
        <parameter key="name" value="FlightDate"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="916" y="75"/>
      <connect from_op="Generate Data" from_port="output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
      <connect from_op="Rename (2)" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Rename (3)" to_port="example set input"/>
      <connect from_op="Rename (3)" from_port="example set output" to_op="Rename (4)" to_port="example set input"/>
      <connect from_op="Rename (4)" from_port="example set output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

I hope, it will help you.

Greetings,
Sebastian

artdijk · April 2010

Thanks for your process. It is not yet the solution but we are getting closer.
I can never get the same Id in both file for the same FlightNr and Data combination because the number of records is different in both files.
I try to explain the problem in different words:

File A has fields: FlightNr, Date, Pax, Fuel
File B has fields: FlightNr, Date, Event
File B has less records. Only the flight that have an event are in file B.
I want to get
File C with fields: FlightNr, Date, Pax, Fuel, Event
On most flights no events are recorded so many event fields will be empty in file C.
Resulting file example:
Flt112,1-1-2010,34,2300,?
Flt116,1-1,2010,45,2900,turbulence
Flt112,2-1-2010,23,2100,?
Flt116,2-1-2010,41,2800,?

Thanks
Arthur

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Relating datasets

Answers