Options

correct use of the date_get in the expression buillder

MRWoodMRWood Member Posts: 11 Contributor I
edited August 2023 in Help
Hi, having looked at previous forum post on the topic i may have missed something.
I am trying to extract from a date-tim4 field(yyyy,MM,DD) , 3 new columns - Year, Month as a number , week number.
Previous posts show for instance the method as -  date_get(Date, DATE_UNIT_YEAR,"Year")  - I assume where Date is the column header.  and "Year is the new Column holding the year data".

If someone could point me at the correct syntax I would appreciate it

Thanks

Best Answer

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Solution Accepted
    Hi,
    its just convention that the year starts at the 0th month in date_get. You can just add 1?
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    with one of the latest versions we changed the date time functions. You often need to provide the timezone explicitly. Thats also visible in the documentation of this function:
     
    So you want to add your time zone. This makes a lot of sense, because depending on where you are the day may be different.

    maybe try
    date_get([Date], DATE_UNIT_YEAR, "CET")

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    MRWoodMRWood Member Posts: 11 Contributor I
    edited August 2023
    Hi Martin,  Thanks for the reply.  Not sure why you need to add a Time zone when the dates are specific calendar.  Even so using CET as the 3rd attribute the return is still incorrect

    The date format is yyyy-mm-dd   no time.   for instance, date is 2002-04-08, using expression builder adding:
    Year           date_get(Date,DATE_UNIT_Year,"CET")
    Month      date_get(Date,DATE_UNIT_MONTH,"CET")  
    Week        date_get(Date,DATE_UNIT_WEEK,"CET")

    The out put is Year = 2002, Month = 3 Week = 14  

    As we can see April should have returned as 4 and the week number should I believe have returned as 15.

    In addition I have calendar month 0 for January which i dont think i have ever seen before, is this because RapidMiner is using a different base calendar format ?

    Again any suggestions are welcome

    Mark

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    dates and date_time are in RapidMiner internally the same thing. Its always a UNIX timestamp or to be more specific I think Instant in the java universe. (Correct me here if I am wrong @Kevin_Majchrzak !). If I remember correctly we are always following the java "convention" when doing time operations.

    "Does the time zone matter"? It actually often does. In previous versions of RapidMiner we took implicitly the time zone of the executing system. This can become very confusing when you work on AIHubs, which are not in your local time zone. So we decided to always ask for the time zone, so we can be 100% sure what the result should be.

    On the 14: This can have two reasons. The first one is the question what really is the 1st week of the year. If the 1st of January is a Wednesday, is this then the first week, or the one which starts on Monday? The other one is if you start to count with 0 or 1. It would not surpirse me if different countries have different conventions here, similar to the week start day being Monday in Europe but Sunday in the US.

    I hope this helps,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    MRWoodMRWood Member Posts: 11 Contributor I
    Thank you for the reply Martin, unfortunately the reasons do not really help. 

    Its hard to describe to a user that their financial spreadsheet data could end up being incorrect because the system uses a different time stamp to the CV data.
  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,
    sorry - I cannot follow here. I do not see where its incorrect? You mean the 14?

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    I went one level deeper,
    what we use interally is this:

     
    And then the respective getDayOfMonth functions and so on. So what we use is the Java-Standard except for one thing: We define January as 0. That has historic RM code reasons.
    The other thing is that the Week of year is defined like this:
    The aligned week within a year.
    This represents concept of the count of weeks within the period of a year where the weeks are aligned to the start of the year. This field is typically used with ALIGNED_DAY_OF_WEEK_IN_YEAR.
    For example, in a calendar systems with a seven day week, the first aligned-week-of-year starts on day-of-year 1, the second aligned-week starts on day-of-year 8, and so on. Thus, day-of-year values 1 to 7 are in aligned-week 1, while day-of-year values 8 to 14 are in aligned-week 2, and so on.
    Calendar systems that do not have a seven day week should typically implement this field in the same way, but using the alternate week length.
    but this is also the java standard.

    I hope this helps!
    Best,
    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    MRWoodMRWood Member Posts: 11 Contributor I
    Hi Martin, 
    What I meant is that using the get_date expression as the means to split a Date into its respective elements would leave a Month that is 1 behind ie January = 0.   This has the potential to cause errors for users who are not Unix/Data Science professionals, especially where there are a number of team members supporting one workflow that might have multiple CSV inputs such as Finance/Procurement teams.  Perhaps you can recommend a different workflow node that would negate this?

    Regards
    Mark   
  • Options
    MRWoodMRWood Member Posts: 11 Contributor I
    Hi, I didnt have this in other applications so I would have to put this one down to a local training requirements
Sign In or Register to comment.