Nominal To Date operator To produce date_time with milliseconds

marcomarco Member Posts: 2 Contributor I
edited November 2018 in Help
I'm reading in a CSV with a timestamp which has the format Year, Month,Day, hour:mins:seconds milliseconds

e.g.  '2015-12-31 01:01:01 111'

This gets converted to a nominal data type and I try to convert this later to a 'date_time' data type using the Nominal to Date operator with the following set:

Date Type: 'date_time'
Date Format: yyyy-MM-dd HH:mm:ss SSS
Time Zone: System
Locale: English United States

This gets converted to a date_time type however milliseconds is now missing.

My question is there a way to maintain the milliseconds or another way to do this?

Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    are you sure that they are mising? I guess they are just not shown with the standard date stuff.

    Try to calculate date_diff with this data - is all of them ending with 000?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • marcomarco Member Posts: 2 Contributor I
    Hi Martin,

    Yes you're right it is still there, I used the date_get(datetime,DATE_UNIT_MILLISECOND) with generate attributes and was able to retrieve it.

    I guess my question now is there a way to display the date_time with milliseconds otherwise it looks like I've the same date repeated?


    Here's the example test.csv and rapidminer xml below

    seconds,DateTime,att1,Longitude,Latitude
    33.979,2015-05-07 19:04:56 252,1.296,7.401099,51.482471

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="6.4.000" expanded="true" height="60" name="Read CSV" width="90" x="313" y="30">
            <parameter key="csv_file" value="/home/test/test.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="UTF-8"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="seconds.true.real.attribute"/>
              <parameter key="1" value="DateTime.true.polynominal.attribute"/>
              <parameter key="2" value="att1.true.real.attribute"/>
              <parameter key="3" value="Longitude.true.real.attribute"/>
              <parameter key="4" value="Latitude.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="true" breakpoints="before,after" class="nominal_to_date" compatibility="6.4.000" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="30">
            <parameter key="attribute_name" value="DateTime"/>
            <parameter key="date_type" value="date_time"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss SSS"/>
            <parameter key="time_zone" value="Europe/Berlin"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="30">
            <list key="function_descriptions">
              <parameter key="datetimemilliseconds" value="date_get(DateTime,DATE_UNIT_MILLISECOND)"/>
            </list>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • renewrenew Member Posts: 3 Contributor I

    Hello, I have the same "problem". Did anyone find a solution?

    I'm working with higher frequent sensor data (milliseceonds) and want to keep it visual in the date_time attribute (even if the information it is not lost). Looking forward to hearing from you.

  • FBTFBT Member Posts: 106 Unicorn

    I tried the old example file from the original post, but could not replicate the problem in RM 7.5, as milliseconds are displayed correctly after using the import wizard. Could you provide one line of your data with a timestamp?

     

    Alternatively, you could try the "Format Numbers" operator with format type "Pattern", but you may need to change the datatype of your timestamp first.

  • renewrenew Member Posts: 3 Contributor I

    Thanks for the fast reply.

    One example timestamp is "2017-07-22T09:10:11.1230000Z". With the Import Wizard / Nomial to Date (RM7.5) no suitable date format works. If I define "yyyy-MM-dd'T'HH:mm:ss.SSS" it looks like "Jul 22, 2017 09:10:11 AM CEST", but many of the new timestamps are calculated wrong.

    It is important to work with the date_time format, because I need it for filtering and labeling the data in an following step. 

  • FBTFBT Member Posts: 106 Unicorn

    Ok, I am starting to believe that this could be indeed a bug (or a feature). The format is just not displayed in the corresponding date_time pattern, once the value is converted to type date_time. However, all information is available and you can actually apply filters that take the micro information (e.g. ms) that is not displayed into account.

     

    Rapidminer apparently always shows the format you describe when data is of the date_time type. I also tried the "Generate Attribute" operator and made some re-formatting (text transformation and conversion functions) prior to type conversion, as I thought that the 7 digits for the milliseconds could cause the issue. However, they don't. If you need it solely for the "visibility", one not very elegant way would be to duplicate the attribute. Then you can use the date_type version for your processing needs and convert the duplicate to nominal, which then shows the format you want.

     

    With only the one example, I could not verify that the "Nominal to Date" operator is calculating wrong timestamps, which would be surprising. However, I believe this is something that can be solved easily. Maybe you want to take a look at the conversion functions in the "Generate Attribute" operator. I am working a lot with high precision timestamps and never encountered "wrong" timestamps, and if so it was always human error.

  • Edin_KlapicEdin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Indeed,

    The format "Jul 22, 2017 9:10:11 AM CEST" is the RapidMiner visual representation of the value type date_time.

    Internally those values are kept as milliseconds which can be used for filtering. So if you want to see the milliseconds within your date format you need to convert your date attribute to Nominal.

     

    Regarding the point of wrongly calculated timestamps:

    Please have a look at the Parameters "time zone" and "locale" of the Operator Nominal to Date. I assume that those help you.

     

    Just as a sidenote:

    Generate Attributes has the built-in function date_millis(<your date attribute>) to display your date attribute in milliseconds starting from 01/01/1970.

     

    Best,

    Edin

     

Sign In or Register to comment.