[Solved] Range Join: Joining two tables --

uenge-sanuenge-san Member Posts: 12 Contributor II
edited April 2020 in Help

Hello Rapidminer Community,

 

I'm trying to join two tables: one contains a start- and an end-date_time, the other only a timestamp.

The goal is, to classify, between which timestamp the event in table 2 has been

 

Example:

Table 1

ID # shift #  start  # end

1 ; "A" ; 2017-01-01 00:00:00 ; 2017-01-03 23:59:59

2 ; "B" ; 2017-01-04 00:00:00 ; 2017-01-05 23:59:59

3 ; "C" ; 2017-01-06 00:00:00 ; 2017-02-01 23:59:59

 

Table 2

ID # attr1 # timestamp

1; "error 1" ; 2017-01-01 07:00

2; "error 2" ; 2017-01-02 09:00

3; "error 4" ; 2017-01-06 07:00

 

Expected joined table

ID # attr1 # timestamp # shift (Table1)

1; "error 1" ; 2017-01-01 07:00 ; "A"

2; "error 2" ; 2017-01-02 09:00 ; "A"

3; "error 4" ; 2017-01-06 07:00 ; "C"

 

Any hint, how to get to the expected joined table??

 

Thanks in advance

 

 

 

Tagged:

Best Answers

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted

    Hi,

     

    you could try using my generic join solution in a Groovy script:

    https://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

     

    There's also a template building block: 

    http://community.rapidminer.com/t5/RapidMiner-Building-Blocks/Generic-join-script/m-p/33908#U33908

     

    You would extend the script to compare for the timestamp between the start and the end times.

     

    Regards,

     

    Balázs

  • uenge-sanuenge-san Member Posts: 12 Contributor II
    Solution Accepted

    Amazing,

     

    never heard about Groovy Script, but got it working in less than 30 minutes (and most of the time was consumed regarding a bad typo during copying your script into notepad++ ...)

     

    Many, many thanks for your advice! 

     

     

     

    I copied the adjusted script below and highlighted the changes with bold fonts

     

     

     

     

    /**
    * Configuration
    * es1AttName: join attribute name in ExampleSet 1
    * es2AttName: join attribute name in ExampleSet 2
    * joinFunc: Closure (function) to call on both arguments
    */

    es1AttName1 = "start";
    es1AttName2 = "end";

    es2AttName = "TIMESTAMP";

    def joinFunc = { e11, e12, e2 ->
    (e11 <= e2 && e12 >= e2)
    || e2 == null
    }

    /**
    * End of configuration
    */

    //RapidMiner data structures
    import com.rapidminer.tools.Ontology;

    //Determines if an attribute is nominal
    def isNominal = { att ->
    type = att.getValueType();
    type == Ontology.NOMINAL || type == Ontology.BINOMINAL || type == Ontology.POLYNOMINAL;
    }

    //Fetches the correct value type from the example, depending on the attribute type
    def getExampleValue = { ex, att ->
    if (isNominal(att)) {
    ret = ex.getNominalValue(att);
    } else {
    ret = ex.getValue(att);
    }
    ret;
    }

    // First input example set
    ExampleSet es1 = input[0];
    Attributes es1Attributes = es1.getAttributes();
    es1att1 = es1Attributes.get(es1AttName1);
    es1att2 = es1Attributes.get(es1AttName2);

    // Second input example set
    ExampleSet es2 = input[1];
    Attributes es2Attributes = es2.getAttributes();
    es2att = es2Attributes.get(es2AttName);

    int fields = es1Attributes.size() + es2Attributes.size();

    //Arrays for attributes and field metadata
    Attribute[] attributes= new Attribute[fields];

    //Field counter
    int fld = 0;

    //Copy each attribute from ExampleSets 1 and 2 to the result attribute list
    es1Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }
    es2Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }

    MemoryExampleTable table = new MemoryExampleTable(attributes);
    DataRowFactory rowFactory = new DataRowFactory(0);

    // Loop ExampleSet 1
    for ( Example e1 : es1 ) {

    //Attribute value from ExampleSet 1, current example
    e1val1 = getExampleValue(e1, es1att1);
    e1val2 = getExampleValue(e1, es1att2);


    for (Example e2: es2) {

    e2val = getExampleValue(e2, es2att);

    //Join condition - function configured at the top of the script
    if (joinFunc(e1val1, e1val2, e2val)) {

    data = new Object[fields];
    fld = 0;

    // Copy values from both example sets into the new record
    es1Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e1.getNominalValue(f);
    } else {
    data[fld] = e1.getValue(f);
    }
    fld++;
    }
    es2Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e2.getNominalValue(f);
    } else {
    data[fld] = e2.getValue(f);
    }
    fld++;
    }

    DataRow row = rowFactory.create(data, attributes);
    table.addDataRow(row);
    }
    }
    }

    ExampleSet exampleSet = table.createExampleSet();

    return(exampleSet);

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Make sure that the date-times are recongized as a date-time in RapidMiner. Then you can select Start Time in Table 1 to join with Timestamp in Table 2 in the Join operator. Just toggle of "Use attribute ID as key" option. 

  • uenge-sanuenge-san Member Posts: 12 Contributor II

    Hello T-Bone,

     

    thanks for your reply.

     

    Trying to solve the problem with your suggested solution I found out, that there was an import problem with the date_time in table 1 --> solved

     

    so trying to use the Join Operator after this I only get one result out of the joining, which is the event, where the timestamp and the start date matches exactly. 

    But unfortunately didn't come to my expected table...

     

    Any other hints??

     

    BR

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Are you using the right join type? Right, Left, Inner, or Outer?

  • uenge-sanuenge-san Member Posts: 12 Contributor II

    Hi T-bone

     

    I think so, the join type INNER should be the correct one, isn't it?

     

    Thank you very much

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi uenge-san,

     

    glad it's working for you!

     

    This will be a good reference for everyone trying to do range joins in RapidMiner.

     

    Regards,

     

    Balázs

Sign In or Register to comment.