Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
date_diff in months
christos_karras
Member Posts: 50 Guru
in Help
I need to calculate the exact number of months between two dates. Using date_diff and dividing by the number of milliseconds in a month won't work because the number of days varies each month. Is there any way this can be done without using Python/R scripting?
Tagged:
1
Answers
It seems like you would be much better off using a unit that does have a single consistent length, such as weeks or days, rather than months.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
If it's feasible in one direction (add months to a date), there's no reason for it not to be feasible in the other direction (subtract two dates to get the number of months): something like date_diff([FirstDate], [SecondDate], DATE_UNIT_MONTH)
I need this as part of other date manipulations, and using exact months is the right level for the data I have.
18 Feb 2020 + one month = 18 Mar 2020 ---> EASY
18 Feb 2020 - one month = 18 Jan 2020 ---> EASY
# of months between 18 Feb 2020 and 18 Mar 2020 ---> one month I assume?
# of months between 18 Feb 2020 and 16 Mar 2020 ---> ???
# of months between 18 Feb 2020 and 1 Mar 2020 ---> ???
# of months between 18 Feb 2020 and 9 Aug 2019 ---> ???
Scott
You're right, but after re-reading my question I realized there was some ambiguity because I talked about an "exact number of months". What I meant by this was that I wanted to avoid dividing a number of days by the average number of days in a month (30.416667 if we ignore leap years) because I wanted to avoid incorrect approximations for large enough time scales. But after doing the calculation I found it would not matter for time differences below around 125 years.
For example: 2014-03-01 - 2014-06-30 = 45900 days, and 45900 / 30.416667 = 1509.04 months. But if I use dateutil.relativedelta in Python, the difference is 1508 months and 1 day: relativedelta.relativedelta(pd.to_datetime("2140-03-01"), pd.to_datetime("2014-06-30")) = 125 years, 8 months, 1 day = 1508.03 months.
We can simplify the specifications by instead having a function that returns a whole number of months (rounded)
For your examples:
18 Feb 2020 + one month = 18 Mar 2020 ---> Yes
18 Feb 2020 - one month = 18 Jan 2020 ---> Yes
# of months between 18 Feb 2020 and 18 Mar 2020 ---> 1 month
# of months between 18 Feb 2020 and 16 Mar 2020 ---> 27 days = rounded up to 1 month
# of months between 18 Feb 2020 and 1 Mar 2020 ---> 12 days = rounded down to 0 months
# of months between 18 Feb 2020 and 9 Aug 2019 ---> 5 months and 20 days, rounded up to 6 months
Other examples:
# of months between 26 Feb 2020 and 03 Mar 2020 ---> 6 days = rounded down to 0 months
# of months between 26 Feb 2020 and 03 Aug 2020 ---> 5 months and 6 days = rounded down to 5 months
That being said, I think such a feature would be low priority. I hoped to do as much as possible date manipulations using native RapidMiner features, but I think I will end having to do most date manipulations in Python because I have multiple time zone conversions to do (I need to combine data from different servers that use different time zones, running on RapidMiner Studio which is in another time zone too). And the upcoming new Python Operator Framework makes that option more interesting.
On the other hand, I agree 100% that this is a perfect candidate for the Python Operator Framework. This is exactly why we think it will be popular.
Scott
Scott
I tried your solution but it's not giving me the right results for every cases. Examples:
- Feb 20 2020 to Dec 29 2020 --> 10 months OK
- Feb 18 2020 to Oct 9 2022 --> 207 months, should be 32 months
- Feb 21 2020 to May 25 2021 --> 103 months, should be 15 months
I made an updated process that includes your solution (RapidMiner_DiffInMonths_v1), an alternative solution I tested using RapidMiner expressions (RapidMiner_DiffInMonths_v2) and a solution using relativedelta in Python.RapidMiner_DiffInMonths_v2 (Note: this doesn't handle fractional month, so this is like applying a floor() function to the delta):
Python_DiffInMonths: (this implementation does rounding of months but it is not perfect for rounding differences of 14-15-16 days but should be good enough for my needs)
Full process XML:
As for forum viewing, it should work in all browsers. Can you pls be more specific about what your setup is?
Scott