Options

identify consecutive dates/values

lghansselghansse Member Posts: 18 Contributor II
edited December 2018 in Help

Hi everyone, 

 

I'm trying to use rapidminer to filter examples in an exampleset with at least 3 consecutive dates (see below). So more specifically, I want to filter those ID's that have dates where there is no more than -+ 1 month between two dates. 

 

id date
1 20/01/2018
1 21/02/2018
2 19/01/2018
2 19/02/2018
2 21/04/2018
2 21/05/2018
3 22/03/2018
3 22/04/2018
3 23/05/2018
3 22/07/2018
3 23/08/2018
3 22/09/2018
4 20/01/2018
4 21/02/2018
4 21/03/2018
5 19/01/2018
5 20/02/2018
5 19/05/2018
5 19/06/2018
5 20/08/2018
5 20/09/2018
6 22/01/2018
6 21/02/2018
6 20/04/2018
6 20/05/2018
6 20/07/2018
6 21/08/2018
6 20/09/2018

 

In order to do that I created a script in where I generated a date_id that was made up of the year + month, and lagged on date_id in order to calculate the difference between two dates (which could not be more than 1). Working with days was not really an option since the transaction was/is not always exactly one month apart. So my data looked more or less like this: 

 

id date date_id date_id-1 difference
1 20/01/2018 20180 ? ?
1 21/02/2018 20181 20180 1
2 19/01/2018 20180 ? ?
2 19/02/2018 20181 20180 1
2 21/04/2018 20183 20181 2
2 21/05/2018 20184 20183 1
3 22/03/2018 20182 ? ?
3 22/04/2018 20183 20182 1
3 23/05/2018 20184 20183 1
3 22/07/2018 20186 20184 2
3 23/08/2018 20187 20186 1
3 22/09/2018 20188 20187 1
4 20/01/2018 20180 ? ?
4 21/02/2018 20181 20180 1
4 21/03/2018 20182 20181 1
5 19/01/2018 20180 ? ?
5 20/02/2018 20181 20180 1
5 19/05/2018 20184 20181 3
5 19/06/2018 20185 20184 1
5 20/08/2018 20187 20185 2
5 20/09/2018 20188 20187 1
6 22/01/2018 20180 ? ?
6 21/02/2018 20181 20180 1
6 20/04/2018 20183 20181 2
6 20/05/2018 20184 20183 1
6 20/07/2018 20186 20184 2
6 21/08/2018 20187 20186 1
6 20/09/2018 20188 20187 1

 

So in my example ID's 4 en 6 should be included. My plan was to use an aggregate function on 'difference' and compare them to the number of examples in each loop on ID, to idenify consecutive dates (because the aggregate would be equal to the number of examples -1, since every first row is empty). However, this method is only good for identifying cases that have only exactly three dates that are consecutive dates, or cases where every date is consecutive. So, I'm a bit at loss of what I could do to make this work (maybe I'm making things needlessly complicated, but I have no knowledge of a function in RM that allows to identify consecutive values or dates). 

 

Lise

Answers

  • Options
    Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I think you have overcomplicated this.  There is a datediff function in RapidMiner that you can use to do something very similar.  Just lag the original date (not your polynominal id version) and then use the datediff function, and you can then convert it to whatever unit you want  (it's calculated in milliseconds) and filter for whatever period gap you want (e.g., 1 month).

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.