🥳 RAPIDMINER 9.9 IS OUT!!! 🥳

The updates in 9.9 power advanced use cases and offer productivity enhancements for users who prefer to code.

CLICK HERE TO DOWNLOAD

Create a dataset using some loops and three macros

Himanshu_PantHimanshu_Pant Member Posts: 30 Contributor I
I have a dataset which contain a column dates. I have three variables a= 2, b= 3, c=1
for (columnName, columnData) in df['dates'].iteritems():
    for k in range(0,b-a+1,c):
        date1 = datetime.strptime(columnData,"%Y-%m-%dT%H:%M:%S.000Z") - timedelta(days=k)
        date2 = datetime.strptime(columnData,"%Y-%m-%dT%H:%M:%S.000Z") - timedelta(days=k+a)

So if first date is 2019-04-10T18:45:00.000Z, output should be
2019-04-10T18:45:00.000Z
2019-04-10T18:30:00.000Z....................
.
.
2019-04-09T12:30:00.000Z
..............
2019-04-08T18:45:00.000Z-> till first iteration
2019-04-09T18:45:00.000Z
2019-04-09T18:30:00.000Z...........
.
2019-04-08T03:30:00.000Z
.
2019-04-07T18:45:00.000Z-> till second iteration and till final result (there will be only iteration for this case).

Alternate explanation: if date is 2019-04-10T18:45:00.000Z, if b=3 that means start date will be 10 and end date will be 7, for 4 it would be from 10 to 6. But based on the value of 'a' it will get split. So if a=2 then split will (for b=3)
First: 10 to 8
Second: 9 to 7
if a=3 (for b=3), then
First: 10 to 7 (Thats it). 

One more part is there (can be ignored for now can see later) mentioned below:
one more factor that is c, which decide the jump of date, so if b=3, a=2, c= 1  
First: 10 to 8
Second: 9 to 7. As you can see jump from 10 to 9.

However if b=3, a=2, c= 2
First: 10 to 8. Thats it as based on b=3 it can go from 10 to 7. Based on c=2 ,next jump would be from 8 to 6. However it can't cross 7 so it gave one output.

Answers

  • mharmsmharms Administrator, Employee, Member Posts: 5   RM Team Member
    Hi,

    You should be able to to use Generate Attributes to create these intervals from the macros/variables and then use Generate Attributes again with date_add() to apply them. Let me know if that helps.

    Mike
  • Himanshu_PantHimanshu_Pant Member Posts: 30 Contributor I
    Ya I already use these operators. Still not able to come to the solution.
  • mharmsmharms Administrator, Employee, Member Posts: 5   RM Team Member
    Are you able to share a dummy process so that we can take a look?
  • Himanshu_PantHimanshu_Pant Member Posts: 30 Contributor I
    Please have a look.
  • mharmsmharms Administrator, Employee, Member Posts: 5   RM Team Member
    Can you also share the input dataset in CSV and the desired output in CSV? I am struggling to understand the explanation at present. Thanks
  • Himanshu_PantHimanshu_Pant Member Posts: 30 Contributor I
    Yellow highlighted one is for first date in output sheet.
    Green highlighted one is for Second  date in output sheet.
    In this case I have used a=2, b=3, c=1
  • mharmsmharms Administrator, Employee, Member Posts: 5   RM Team Member
    edited June 9
    I think this does what you want and without a loop. 

    Logic:
    - create 192 example with a step size of 15
    - cartesian product with the 2 input dates
    - adjust the original date in minutes based on the interval 

    For the variables
    - create an example size with a lot of examples based on the correct step size ( i couldn't use a macro in the parameter, i will raise this)
    - use the generate macro calculate the max value of the while/loop
    - filter example to get the number of iterations

    Altogether
    - cartesian product will create the correct number of rows
    - use generate attributes to apply the date minus logic

    I hope that helps. There were quite a few disparities between the original process and the data sent over but I hope this gives you a good idea. I'm not a fan of loops although quite often that seems the most logical way to think of the solution.

    Mike
    mschmitz
Sign In or Register to comment.