Difference of two dates?

SquirrelX Member Posts: 9 Contributor II
edited November 2018 in Help
I'm trying to filter the examples I have so that the date in one attribute is always smaller than in another. Basically, what I need is that "If date1 < date2 then keep the example, otherwise throw it away".

I can't find an easy way to do this, although I suspect it should be a simple operation. Filter Examples doesn't seem to accept dates. When I convert my two attributes to integers and try to use the attribute_value_filter with "date1_day > date2_day", Filter Examples complains that the left-hand side attribute is not numerical (it is!). So I looked for something else and tried the Generate Aggregates function to create a new attribute that is either larger than zero or not, but that function only does sums and the like, whereas I need to subtract one number from the other.

I think I'm beginning to overcomplicate the solution, so I would appreciate it if someone could help me out with some hints.

Thanks

Answers

  • SebastianLoh Member Posts: 99 Contributor II
    Hi SquirrelX,

    Yes, you are right, it is supposed to be an easy operation. Unfortunately it isn't until the next RM release (around mid-December).

    But you were on a very good track and close to the solution:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="521" width="748">
          <operator activated="true" class="subprocess" compatibility="5.0.8" expanded="true" height="76" name="2 date attributes" width="90" x="45" y="75">
            <process expanded="true" height="510" width="829">
              <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
                <parameter key="number_examples" value="1000"/>
                <parameter key="number_of_attributes" value="2"/>
                <parameter key="attributes_lower_bound" value="0.0"/>
                <parameter key="attributes_upper_bound" value="1.35555856E11"/>
              </operator>
              <operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date" width="90" x="246" y="30">
                <parameter key="attribute_name" value="att1"/>
              </operator>
              <operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date (2)" width="90" x="380" y="30">
                <parameter key="attribute_name" value="att2"/>
              </operator>
              <connect from_op="Generate Data" from_port="output" to_op="Numerical to Date" to_port="example set input"/>
              <connect from_op="Numerical to Date" from_port="example set output" to_op="Numerical to Date (2)" to_port="example set input"/>
              <connect from_op="Numerical to Date (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical" width="90" x="179" y="75">
            <parameter key="attribute_name" value="att1"/>
            <parameter key="millisecond_relative_to" value="epoch"/>
            <parameter key="keep_old_attribute" value="true"/>
          </operator>
          <operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical (2)" width="90" x="313" y="75">
            <parameter key="attribute_name" value="att2"/>
            <parameter key="millisecond_relative_to" value="epoch"/>
            <parameter key="keep_old_attribute" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="75">
            <list key="function_descriptions">
              <parameter key="diff" value="att1_millisecond -att2_millisecond"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="648" y="75">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="diff &lt;0"/>
          </operator>
          <connect from_op="2 date attributes" from_port="out 1" to_op="Date to Numerical" to_port="example set input"/>
          <connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
          <connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="54"/>
          <portSpacing port="sink_result 2" spacing="18"/>
        </process>
      </operator>
    </process>
    In the next RM release, the value filter will also be able to compare date(att1) < date(att2) and perform similar operations.
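
For readers who want to see the logic of the process above outside RapidMiner, here is a minimal Python sketch of the same three steps: convert both dates to milliseconds since the epoch, subtract, and keep only rows where the difference is negative. The function names `to_epoch_millis` and `keep_where_first_is_earlier` and the sample rows are illustrative assumptions, not part of the process XML; only the attribute names `att1` and `att2` are taken from it.

```python
from datetime import datetime, timezone


def to_epoch_millis(d: datetime) -> int:
    """Milliseconds since the Unix epoch (mirrors the 'epoch' setting above)."""
    return round(d.timestamp() * 1000)


def keep_where_first_is_earlier(rows):
    """Keep rows whose att1 date is strictly earlier than their att2 date."""
    kept = []
    for row in rows:
        # Same idea as the generated 'diff' attribute: att1 minus att2.
        diff = to_epoch_millis(row["att1"]) - to_epoch_millis(row["att2"])
        if diff < 0:  # same condition as the filter 'diff < 0'
            kept.append(row)
    return kept


# Hypothetical sample data, made up for illustration only.
rows = [
    {"att1": datetime(2004, 2, 6, tzinfo=timezone.utc),
     "att2": datetime(2004, 2, 9, tzinfo=timezone.utc)},
    {"att1": datetime(2004, 2, 14, tzinfo=timezone.utc),
     "att2": datetime(2004, 2, 9, tzinfo=timezone.utc)},
]
filtered = keep_where_first_is_earlier(rows)
print(len(filtered))
```

Rows with equal dates are dropped as well, because the comparison is strict.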

    I hope I could help,

    Sebastian
  • SquirrelX Member Posts: 9 Contributor II
    Thanks Sebastian, it works. Though I'm looking forward to the next release  ;)
  • MBM Member Posts: 23 Contributor I

    Hey, I also have a question regarding dates. 

    I have a list of user_ids, and each user has multiple dates, for example:

     

    ID     date
    12 Fri Feb 06 15:16:07 CET 2004
    12 Fri Feb 06 15:16:07 CET 2004
    12 Mon Feb 09 19:16:03 CET 2004
    12 Sat Feb 14 13:16:01 CET 2004
    19 Wed Mar 06 19:30:09 CET 2004
    19 Fri Feb 06 19:16:03 CET 2004

     

    What is the expression for something like:

    Look at each ID. Count from the first date for this ID to the last date for this ID, and if the sum is more than 2, delete the data for this ID. Also consider that if the same date appears more than once for an ID, it should count as only one day. In the example, 12 would be deleted and only 19 would stay.

     

    In the current "Filter Examples" operator I found the "condition class" setting that leads to "parameter expression", but I am not sure how to write the expression.

     

    Does anyone have an idea?

     

    Regards

    MBM

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,512 RM Data Scientist

    Hi MBM,

     

    I think you will need quite a bit of aggregation here.

     

    First aggregate and group by userID AND Date, delete everything that has less than 2, and use Set Minus to delete it from the original data set. That should satisfy condition 2.

     

    For the first condition: Is your data always sorted in time? In this case, you can aggregate min(date) and max(date), calculate date_diff and do the same filtering thing.
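
As a rough illustration of the grouping idea, here is a plain-Python sketch (not a RapidMiner process) of one possible reading of the duplicate-date requirement: group by ID, count each calendar day only once, and drop every ID that has more than 2 distinct days. The rows are the sample from the question; the function names and the `max_days` parameter are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows taken from the question: (user id, timestamp).
records = [
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 9, 19, 16, 3)),
    (12, datetime(2004, 2, 14, 13, 16, 1)),
    (19, datetime(2004, 3, 6, 19, 30, 9)),
    (19, datetime(2004, 2, 6, 19, 16, 3)),
]


def distinct_days_per_id(rows):
    """Group by id and count each calendar day once, ignoring the time of day."""
    days = defaultdict(set)
    for user_id, ts in rows:
        days[user_id].add(ts.date())
    return {user_id: len(d) for user_id, d in days.items()}


def drop_ids_with_many_days(rows, max_days=2):
    """Remove all rows of any id that has more than max_days distinct days."""
    counts = distinct_days_per_id(rows)
    return [(uid, ts) for uid, ts in rows if counts[uid] <= max_days]


day_counts = distinct_days_per_id(records)
remaining = drop_ids_with_many_days(records)
print(day_counts)
print(sorted({uid for uid, _ in remaining}))
```

With the sample data, ID 12 has three distinct days and ID 19 has two, so only the rows for 19 remain, which matches the outcome described in the question.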

     

    Best. 

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MBM Member Posts: 23 Contributor I

    Hey mschmitz, 

     

    I assume yes, my data should be sorted in time. I read an old thread here, and I first sorted by date and then by id. Now I have a huge list grouped by id, with the dates. I think your second suggestion makes sense. If I understand correctly, I need the minimum date and the maximum date of an id, and then use date_diff to get the days. But how can I say "For a given id, give me the minimum date and the maximum date"? For date_diff I need those two dates.

     

    Thanks in advance

     

    MBM

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,512 RM Data Scientist

    Hey MBM,

     

    Using Aggregate to calculate min(date) and max(date), grouped by id, should do the job.
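
To make the min/max step concrete, here is a small plain-Python sketch (again not a RapidMiner process) that computes the earliest and latest date per id and then the difference in whole days. The rows are the sample from the question; `min_max_per_id` and `span_in_days` are hypothetical helper names used only for this illustration.

```python
from datetime import datetime

# Sample rows taken from the question: (user id, timestamp).
records = [
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 9, 19, 16, 3)),
    (12, datetime(2004, 2, 14, 13, 16, 1)),
    (19, datetime(2004, 3, 6, 19, 30, 9)),
    (19, datetime(2004, 2, 6, 19, 16, 3)),
]


def min_max_per_id(rows):
    """Return {id: (earliest timestamp, latest timestamp)}."""
    bounds = {}
    for user_id, ts in rows:
        lo, hi = bounds.get(user_id, (ts, ts))
        bounds[user_id] = (min(lo, ts), max(hi, ts))
    return bounds


def span_in_days(bounds):
    """Whole days between the earliest and latest timestamp of each id."""
    return {uid: (hi - lo).days for uid, (lo, hi) in bounds.items()}


bounds = min_max_per_id(records)
spans = span_in_days(bounds)
print(spans)
```

In this sketch the result does not depend on the order of the rows, because `min` and `max` look at every timestamp of an id. The span can then be used in a filter in the same way as the `diff` attribute earlier in the thread.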

     

    ~Martin

  • MBM Member Posts: 23 Contributor I

    works fine =) 

     

    thank you!
