Options

Date time US to UK format conversion problems

jasonjason Member Posts: 5 Contributor I
edited December 2018 in Help

Hi all

 

I have an Excel file I'm importing which has a date/time in US format (month/day/year hour:min:sec AM/PM) which I need to convert to UK format (day/month/year hour:min:sec in 24 hour format).

 

I've done some reading and read that a 'Date to Nominal' then 'Nominal to Date' conversion back was the way forward.

I've managed to get the Date to Nominal conversion working, so my US date is output as a nominal in the correct UK format date (ie. my month and day are switched correctly, so Jan 8 becomes 1st August - which is what I need to happen) with the time in 24 hour format.

 

However, on the conversion from Nominal back to date RapidMiner is doing some odd things that stuff my results completely.

 

1. Regardless of the date format I specify in the conversion parameter, it shows my dates as AM/PM in 12 hour format not the 24 hour format in the Nominal date

2. It subtracts 1 hour from the time, so a date that is 11:35:12 AM is converted back as 10:35:12 AM

 

This is maddeningly frustrating!  Please can someone have a look at the attached data file and conversion flow and give me a hint as to where this is going wrong?  It should be so simple and I've read the help file for Nominal to Date over and over now and can't see the problem.

 

Many thanks

 

 

Best Answer

  • Options
    sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted

    OK I love stuff like this.  Could not resist tackling it.  This was NOT the most elegant way to do this; I mainly did it this way so you can see how all the date_ functions work, etc...

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="/Users/genzerconsulting/Desktop/Rapidminer date conversion issue 2.xlsx"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="date.true.polynominal.regular"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="136">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="date.contains./"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">contains &amp;quot;/&amp;quot;</description>
    </operator>
    <operator activated="true" class="nominal_to_date" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Date" width="90" x="447" y="34">
    <parameter key="attribute_name" value="date"/>
    <parameter key="date_type" value="date_time"/>
    <parameter key="date_format" value="MM/dd/yyyy hh:mm:ss a"/>
    <parameter key="keep_old_attribute" value="true"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="34">
    <parameter key="attribute_name" value="date_old"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename" width="90" x="715" y="34">
    <parameter key="old_name" value="date"/>
    <parameter key="new_name" value="dateCorrected"/>
    <list key="rename_additional_attributes">
    <parameter key="date_old" value="date"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="447" y="289">
    <list key="function_descriptions">
    <parameter key="dateExcelNumerical" value="parse(date)"/>
    <parameter key="dayExcelNumerical" value="floor(dateExcelNumerical)"/>
    <parameter key="timeExcelNumerical" value="dateExcelNumerical-dayExcelNumerical"/>
    <parameter key="dateExcelBeginning" value="date_parse(&quot;1/1/1900&quot;)"/>
    <parameter key="dayExcel" value="date_add(dateExcelBeginning,dayExcelNumerical-2,DATE_UNIT_DAY)"/>
    <parameter key="dayExcelNominal" value="date_str(dayExcel,DATE_SHORT,DATE_SHOW_DATE_ONLY)"/>
    <parameter key="dayExcelCorrected" value="date_parse_custom(dayExcelNominal,&quot;dd/MM/yy&quot;,&quot;gb&quot;)"/>
    <parameter key="hour" value="floor(24*timeExcelNumerical)"/>
    <parameter key="minute" value="floor(60*((24*timeExcelNumerical)-hour))"/>
    <parameter key="second" value="floor(60*((60*((24*timeExcelNumerical)-hour))-floor(60*((24*timeExcelNumerical)-hour))))"/>
    <parameter key="hourNominal" value="if(hour&lt;10,concat(&quot;0&quot;,str(hour)),str(hour))"/>
    <parameter key="minuteNominal" value="if(minute&lt;10,concat(&quot;0&quot;,str(minute)),str(minute))"/>
    <parameter key="secondNominal" value="if(second&lt;10,concat(&quot;0&quot;,str(second)),str(second))"/>
    <parameter key="timeConcat" value="concat(hourNominal,&quot;:&quot;,minuteNominal,&quot;:&quot;,secondNominal)"/>
    <parameter key="dateCorrectedWithHour" value="date_add(dayExcelCorrected,hour,DATE_UNIT_HOUR)"/>
    <parameter key="dateCorrectedWithHourAndMinute" value="date_add(dateCorrectedWithHour,minute,DATE_UNIT_MINUTE)"/>
    <parameter key="dateCorrected" value="date_add(dateCorrectedWithHourAndMinute,second,DATE_UNIT_SECOND)"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (2)" width="90" x="581" y="289">
    <parameter key="attribute_name" value="date"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="289">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="date|dateCorrected"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="103" name="Append" width="90" x="849" y="136"/>
    <operator activated="true" class="sort" compatibility="7.6.001" expanded="true" height="82" name="Sort" width="90" x="983" y="136">
    <parameter key="attribute_name" value="dateCorrected"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="unmatched example set" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
    <connect from_op="Rename" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Append" from_port="merged set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="105"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="50" resized="true" width="180" x="224" y="43">dates in the form MM/dd/yyyy</description>
    <description align="center" color="yellow" colored="false" height="53" resized="true" width="177" x="215" y="296">dates in the form 43xxx.xxxxxx</description>
    </process>
    </operator>
    </process>

    Have a nice weekend as well.


    Scott

Answers

  • Options
    jasonjason Member Posts: 5 Contributor I

    Just to add a further question - the attached process also breaks on import (I've now found) for date entries that are past 13 (because there's only 12 months in the year, so the date format is not recognised on import by RapidMiner). All date entries (e.g. 8/15/2017) are being rejected with "Unparseable date" as of course there aren't 15 months in the year, so the idea of just importing in US format doesn't seem to be working either.

     

    So a wider question - how can I take an input file of dates that appear in US format (MM/dd/yyyy) when they are actually UK format and use RapidMiner to properly read them, then convert them to the correct UK format (dd/MM/yyyy).

     

    Thanks!

  • Options
    jasonjason Member Posts: 5 Contributor I

    And here's the sample data file. I'm just trying to get this data imported (it's in US format) and converted to UK format with a 24 hour date/time.  Any help welcomed!

     

  • Options
    sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Good morning @jason - I feel your pain.  It is often tedious to get those date/time formats the way you want.  And as a "Yank" I am of course happy with the U.S. formatting.  :)

     

    So just to be clear, you want the DatePosted attribute to be displayed in U.K. format as a "Date time" attribute, not as a Nominal attribute.  Is that correct?  Because it makes a difference.  As you probably saw 100 times, you can convert the import to a Nominal attribute and convert it any way you like.

     

    To the best of my understanding, the way RM Studio displays the "Date time" attribute is governed by system settings.  For example, you can go to Preferences and change the timezone to GMT, and sure enough your data will show GMT instead of my default EST:

     

    Screen Shot 2017-09-01 at 8.25.36 AM.pngScreen Shot 2017-09-01 at 8.25.47 AM.png

    I have done NO processing of the data here - just one Read Excel operator connected to results with all default parameters.  Note that it never makes any difference how the Date time is displayed in this pane for data analysis; this is just for your viewing pleasure. 

     

    I will check more to see how to modify this display of the system prefs so that it can show other formats but of course an easy workaround is to just keep this attribute in its current form, and add a copy of the attribute that is Nominal and in the form you like:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="/Users/genzerconsulting/Desktop/Rapidminer date conversion issue 2.xlsx"/>
    <list key="annotations"/>
    <parameter key="date_format" value="MM/dd/yyyy"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="generate_copy" compatibility="7.6.001" expanded="true" height="82" name="Generate Copy" width="90" x="179" y="34">
    <parameter key="attribute_name" value="DatePosted"/>
    <parameter key="new_name" value="DatePostedUK"/>
    </operator>
    <operator activated="true" class="date_to_nominal" compatibility="7.6.001" expanded="true" height="82" name="Date to Nominal" width="90" x="313" y="34">
    <parameter key="attribute_name" value="DatePostedUK"/>
    <parameter key="date_format" value="dd/MM/yyyy HH:mm:ss"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Generate Copy" to_port="example set input"/>
    <connect from_op="Generate Copy" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/>
    <connect from_op="Date to Nominal" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott

     

     

     

     

     

     

     

     

  • Options
    jasonjason Member Posts: 5 Contributor I

    Hi Scott,

     

    Thanks for the reply (and the support group sympathy ;) ).  Yes, you're right, I need to reformat my data for UK format. Probably helpful to give an update and explain the issue behind it.  I'm using Tableau to analyse some client data which comes from a data source they control.  Their web form is writing an excel format file in US format dates. This disregards the local input settings for some reason and changed last month from UK format dates with a 24h time zone to be displaying data in US format with a 12 hour time zone.

     

    I went through the whole locale settings thing with Tableau, as I was surpised Tableau wasn't able to convert the date format internally. My locale settings and preferences are set to UK format and English (New Zealand) [which uses the UK format].

     

    It emerges with the data set I attached that what is happening is that the Excel output file is writing dates (1st August, 2nd August) as US format dates, right up until it hits the 13th August, when it runs out of US calendar months, so it then dumps the date out as text, switching it to a UK format string (how helpful).

     

    This means I have a data set with all the row entries as an Excel encoded date, which are displaying as US format date/time, followed by a load of string text that is the date in UK format.  No wonder Tableau is getting confused by this, as my dashboard now contains some dates in the correct display (for august), but of course interprets all the US dates as 8th Jan, 8th Feb, 8th March until I hit 8/13 when it starts throwing up errors because it's gone past December in the US.

     

    RapidMiner on import is having the same problem, as it tries to do a date convert on the field, recognises the date imports and then as soon as it hit the string fields, it gives me an unparseable date error and rejects the rest of the data set rows.

     

    After much colourful language ;) in this case, fortunately my data set is relatively small so I (in a crime against RapidMiner and the community) put together an Excel formula to convert the erroneous dates into a string in Excel.  I'm afraid as a newbie I can't figure out how to get RapidMiner to do this for some rows but not others, as I can't get the data set imported properly. If I import it as a string, all the Excel dates up to 8/13 come in as an excel encoded date number so I lose the date.

     

    What I've now managed to achieve is using Excel to convert the dates into a string and I can import a single column of text dates into RapidMiner, so now it can work with it.

     

    However, my RapidMiner problems still remain.  Using 'Nominal to Date' conversion, I can successfully convert from a date string back to a date, but ONLY if I split this out into a Date only field (not DateTime).  I can successfully convert from the time part of the string back to a Time but ONLY if I split this out into a Time field (not DateTime) - i.e. I have two new attributes, one for date, one for time and it works fine.

     

    When it try to assign a string to a DateTime field, it takes an hour off the time. WTF?

     

    Whilst I can hack around with Excel in this particular data set, I could do with mastering this kind of date conversion and issue in RapidMiner as I'm only lucky that the data set is small enough to be opened in Excel.  If I come across similar date conversion needs for bigger contact centre data sets, I'm a bit stuffed.

     

    All my international and locale settings are set up to UK format - the issue here is a client data set in an Excel extract (over which I have no control) provided to me which seems to have hard coded its dates into US format.  Some of the time. ;)

     

    Maybe I need some sort of easy to use data manipulation and conversion tool....

     

    :)

     

    JP

  • Options
    sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Wow.  OK I think (?) I get what is going on here.  Remember that Excel is really wonky with dates - internally it computes all dates as a number which is the number of days since midnight Jan 1, 1900 if I remember correctly.  Crazy but true.

     

    So the first "date" in your file, if you force the import to bring it in as a number (use Read Excel operator and set "data set meta data information" parameters as image below) is 42743.334872685184.  And if my math is working today, Jan 8, 2017 is exactly 42,743 days after Jan 1, 1900.  And 0.334872685184 * 24 = 8.036944444416 -> 8am; 036944444416 * 60 = 2.21666666496 -> 2 minutes; 0.21666666496 = 12.9999998976 -> 13 seconds.  So the date/time is Jan 8, 2017 08:02:13.

     

    Screen Shot 2017-09-01 at 2.54.41 PM.png

    To make matters more fun, there is a built-in function that should take care of this nicely for you: it's the "date_millis()" function in Generate Attributes.  It converts any date attribute to the number of milliseconds since an index date.  But of course it's not Excel's 1/1/1900 00:00:00, but 1/1/1970 00:00:00 GMT.

     

    So, maybe it's me being an old-school math teacher but I prefer to go back to the math and get it right.  Bring your dates into Excel AS NUMBERS and go from there.  There should be no need to run Excel macros, etc... 

     

    I'm running out of time today but if you still need help, let me know and I can build a process for you.


    Scott

  • Options
    jasonjason Member Posts: 5 Contributor I

    Hi Scott

     

    Thanks so much for the help. Yes, your analysis is correct and I'm in awe of your ability to work out how Excel date encoding works! Your maths is right and the first date is encoded as Jan 8, 2017 8:02:13.  Of course that's US format, as the *actual* date was 1st August 2017, 8:02:13 in UK or NZ format.  If you go down to about row 1495 in the excel data set you'll see that's where it gets changed to text in a UK format date in the same column.

     

    Thanks very much for the pointer to date_millis(), I'll take a look at that in the coming days and see if I can get the import working with a formula around that.  I have a month to fix it to the next client data file, but this'll happen every month now with the date format changing halfway down the row so that sounds like it may be an option for me.

     

    For now, it's also broken my Tableau dashboard (for some reason importing the corrected data fields which I fixed has re-set all my month filters across dashboards!), so I'm having a chat with the Tableau support teams about it.

     

    Really appreciate your swift response to this.  I'll keep you posted with how I get on with your suggestion over the coming month.

    I agree with you about working out the maths and trusting it rather than how Excel does things ;)

     

    Have a great weekend

     

    JP

Sign In or Register to comment.