Options

Relating datasets

artdijkartdijk Member Posts: 5 Contributor II
edited November 2018 in Help
I am new to datamining and RM,
I want to relate two data tables. Each table has a unique key of date and number. Thus the tables should be related via this key. How can this be done in RM ?
Thanks
Arthur

Answers

  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Arthur,
    what exactly do you mean by relate? I guess it's the same you do, when you join some relational data base tables? We have an operator called "Join" which will do this.

    Greetings,
      Sebastian
  • Options
    artdijkartdijk Member Posts: 5 Contributor II
    Thanks Sebastian,
    The problem is before the join is executed.
    Table A: has as field names: Flight Date Delay Pax
    Table B:  has as field names: Flight Date Report
    In the result of the Join I want to have a table C which has as rows the flights that appears in both tables and as fieldnames the fieldnames that appear in both tables. So I can see how many pax had delay.
    Table A and Table B have no Id. They are linked by Flight and Date.
    How is this done in Rapid Miner,
    Thanks
    Arthur
  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Arthur,
    so neither flight and date are unique, but flight + date is unique? Then I would suggest building a new id column from the values of flight and date column. This column can be used for the join. Let make a short sample process:

    After we generated some data and transformed it for having an comparable situation, we build a new id like attribute by concatenating the values of this two attribute:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="296" width="1018">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="150"/>
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
            <parameter key="old_name" value="att1"/>
            <parameter key="new_name" value="Flight"/>
          </operator>
          <operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="30">
            <parameter key="old_name" value="att2"/>
            <parameter key="new_name" value="Date"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="447" y="30"/>
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="FlightDate" value="Flight + &quot; - &quot; + Date"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="715" y="30">
            <parameter key="name" value="FlightDate"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <operator activated="true" class="rename" expanded="true" height="76" name="Rename (3)" width="90" x="175" y="210">
            <parameter key="old_name" value="att1"/>
            <parameter key="new_name" value="Flight"/>
          </operator>
          <operator activated="true" class="rename" expanded="true" height="76" name="Rename (4)" width="90" x="310" y="210">
            <parameter key="old_name" value="att2"/>
            <parameter key="new_name" value="Date"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal (2)" width="90" x="445" y="210"/>
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="580" y="210">
            <list key="function_descriptions">
              <parameter key="FlightDate" value="Flight + &quot; - &quot; + Date"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="715" y="210">
            <parameter key="name" value="FlightDate"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="916" y="75"/>
          <connect from_op="Generate Data" from_port="output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Rename (3)" to_port="example set input"/>
          <connect from_op="Rename (3)" from_port="example set output" to_op="Rename (4)" to_port="example set input"/>
          <connect from_op="Rename (4)" from_port="example set output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    I hope, it will help you.

    Greetings,
      Sebastian
  • Options
    artdijkartdijk Member Posts: 5 Contributor II
    Thanks for your process. It is not yet the solution but we are getting closer.
    I can never get the same Id in both file for the same FlightNr and Data combination because the number of records is different in both files.
    I try to explain the problem in different words:

    File A has fields: FlightNr, Date, Pax, Fuel
    File B has fields: FlightNr, Date, Event
    File B has less records. Only the flight that have an event are in file B.
    I want to get
    File C with fields: FlightNr, Date, Pax, Fuel, Event
    On most flights no events are recorded so many event fields will be empty in file C.
    Resulting file example:
    Flt112,1-1-2010,34,2300,?
    Flt116,1-1,2010,45,2900,turbulence
    Flt112,2-1-2010,23,2100,?
    Flt116,2-1-2010,41,2800,?

    Thanks
    Arthur
Sign In or Register to comment.