RapidMiner

issue with date conversion

Elite

issue with date conversion

I'm trying to get the last 30 days of data for an imported csv, but the system tells me that my dateformats are incompatible. However, i can's see any real reason why or workaround, so any advice is welcome.

 

My process is as follows :

 

-> read csv, date is imported as nominal and is formatted as 'YYYY-MM-DD HH:mm:ss'

 

-> Nominal to date, date type = date_time, this gives me my date (entryTime) in following format on screen: Jan 7, 2017, 4:15:33 AM CET, but when I copy it it looks like this : Sat Jan 07 16:15:33 CET 2017

 

-> Next I create a macro called startdate, using following function : date_add(date_now(),-30,DATE_UNIT_DAY), which results in something like Jun 30, 2017 7:13:46 PM CEST

 

-> next I create a filter stating that my datefield needs to be bigger or equal than %{startdate} but it gives me an error stating :

'cannot instantiate custom filters: the filter value 'Jun 30, 7:20:30 PM CEST for the attribute  entryTime is not in a valid format. Formatting example: 12/31/2014 3:00:00 PM.

So I assume I need to change my dateformat somehow, but if I use date to nominal in the MM/dd/yyyy format and then back nominal to date I get the same problem. So what do I need to do, or are there other ways to filter dateranges?

 

 

 

2 REPLIES
Highlighted
RMStaff

Re: issue with date conversion

Hi,

check the SimpleDateFormat documentation:

https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

 

The DD is the number of days in the year, not the month.

HH is the actual number of hours (0-23), you should use that if you don't have AM/PM indicators.

 

Regards,

Balázs

--
Balázs Bárány
Data Scientist, Vienna
https://datascientist.at
RMStaff

Re: issue with date conversion

Hi,

 

The following XML codes provides 2 other possible solutions.

Both make use of Generate Attribute to generate a helper Attribute which is further used for filtering your dataset.

As a general remark: When working with dates I tend to use the conversion to milliseconds most of the time, since they are easier to follow.

 

The first example goes step-by-step converting to millliseconds, generating the filter macro & filter the relevant examples.

The second example checks if the date fulfills a certain condition (here:

if(date_after(date,date_add(date_now(),-30,DATE_UNIT_DAY)),true,false)

and returns a specified value which saves one Operator.

 

Best,

Edin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="7.5.003" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
        <list key="attribute_values">
          <parameter key="date" value="date_now()"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="generate_data_user_specification" compatibility="7.5.003" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="112" y="136">
        <list key="attribute_values">
          <parameter key="date" value="date_add(date_now(),-40,DATE_UNIT_DAY)"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="103" name="Append" width="90" x="246" y="34"/>
      <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="380" y="34">
        <list key="function_descriptions">
          <parameter key="last30Days" value="date_millis(date)"/>
        </list>
      </operator>
      <operator activated="true" class="generate_macro" compatibility="7.5.003" expanded="true" height="82" name="Generate Macro" width="90" x="514" y="34">
        <list key="function_descriptions">
          <parameter key="todayMinus30" value="date_millis(date_add(date_now(),-30,DATE_UNIT_DAY))"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples" width="90" x="648" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="last30Days.ge.%{todayMinus30}"/>
        </list>
      </operator>
      <operator activated="true" class="generate_data_user_specification" compatibility="7.5.003" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="112" y="289">
        <list key="attribute_values">
          <parameter key="date" value="date_now()"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="generate_data_user_specification" compatibility="7.5.003" expanded="true" height="68" name="Generate Data by User Specification (4)" width="90" x="112" y="391">
        <list key="attribute_values">
          <parameter key="date" value="date_add(date_now(),-40,DATE_UNIT_DAY)"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="103" name="Append (2)" width="90" x="246" y="289"/>
      <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="380" y="289">
        <list key="function_descriptions">
          <parameter key="withinLast30Days" value="if(date_after(date,date_add(date_now(),-30,DATE_UNIT_DAY)),true,false)"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.5.003" expanded="true" height="103" name="Filter Examples (2)" width="90" x="648" y="289">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="withinLast30Days.equals.true"/>
        </list>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
      <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
      <connect from_op="Append" from_port="merged set" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Generate Macro" to_port="through 1"/>
      <connect from_op="Generate Macro" from_port="through 1" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append (2)" to_port="example set 2"/>
      <connect from_op="Append (2)" from_port="merged set" to_op="Generate Attributes (4)" to_port="example set input"/>
      <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
      <connect from_op="Filter Examples (2)" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="231"/>
      <portSpacing port="sink_result 3" spacing="21"/>
    </process>
  </operator>
</process>