time series creation

data1mathsdata1maths Member Posts: 27 Contributor I
edited December 2018 in Help

Hello everyone, 

I have this date-time ettribut from which i want to create its appropriate time series counting the number of  events per hour, so that i can plot the evolution of the time series and apply arima on it later on.

ID	Date-time
1 31/12/2017 22:55
2 31/12/2017 22:27
3 31/12/2017 22:17
4 31/12/2017 21:55
5 31/12/2017 20:48
6 31/12/2017 11:03
7 31/12/2017 09:58
8 31/12/2017 09:52
9 31/12/2017 09:16
10 31/12/2017 08:11
11 31/12/2017 07:55
12 31/12/2017 06:54
13 31/12/2017 04:10
14 30/12/2017 12:15
15 30/12/2017 11:20
16 30/12/2017 10:50
17 30/12/2017 05:40
18 30/12/2017 04:49
19 30/12/2017 04:30
20 30/12/2017 04:00
21 30/12/2017 02:34
22 29/12/2017 23:02
23 29/12/2017 18:38
24 29/12/2017 08:04
25 29/12/2017 01:30
26 28/12/2017 23:59
27 28/12/2017 16:22
28 28/12/2017 14:43
29 28/12/2017 14:39
30 28/12/2017 14:39
31 28/12/2017 14:02
32 28/12/2017 13:46
33 28/12/2017 13:28
34 28/12/2017 12:51
35 28/12/2017 12:45
36 28/12/2017 12:24
37 28/12/2017 12:06
ETC

I do really need your help.

Thank you in advance.

Best regards

 

Tagged:

Best Answers

  • lionelderkrikorlionelderkrikor Posts: 757   Unicorn
    Solution Accepted

    Hi @data1maths,

     

    I have no idea how to perform your task with RapidMiner's native operator(s).

    So I propose a solution using a Python script : 

     

    Time_series_count.png

    To execute this process, you have to : 

     

     - Install Python on your computer

     - Install the Python Scripting extension from the MarketPlace

     - Set the name of your datetime attribute with quotes in the parameters of the Set Macros operator : 

    Time_series_count_2.pngikik

     

    The process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="9.0.002" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Python\time_serie_count_2.xlsx"/>
    <list key="annotations"/>
    <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Id.true.integer.attribute"/>
    <parameter key="1" value="Datetime.true.date_time.attribute"/>
    </list>
    <parameter key="read_not_matching_values_as_missings" value="false"/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="9.0.002" expanded="true" height="82" name="Set Macros" width="90" x="246" y="34">
    <list key="macros">
    <parameter key="dateTime" value="'Datetime'"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="380" y="34">
    <list key="function_descriptions">
    <parameter key="count" value="1"/>
    </list>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="8.2.000" expanded="true" height="103" name="Execute Python" width="90" x="581" y="34">
    <parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;DT = %{dateTime}&#10;&#10;def rm_main(data):&#10;&#10; data[DT] = pd.to_datetime(data[DT])&#10; data = data.set_index(DT)&#10; data = data.resample('H', how='count')&#10; data = data.reset_index()&#10; &#10; #data.rm_metadata[&quot;Datetime&quot;] = (None,&quot;regular&quot;)&#10; &#10; # connect 2 output ports to see the results&#10; return data"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Set Macros" from_port="through 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

     

    Regards,

     

    Lionel

  • mschmitzmschmitz Posts: 2,132  RM Data Scientist
    Solution Accepted

    Hi,

    something like this should do it.

     

    BR,

    Martin

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.5.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
    <parameter key="generator_type" value="date_series"/>
    <parameter key="number_of_examples" value="3"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration"/>
    <list key="date_series_configuration">
    <parameter key="date" value="2017-12-31 22:00:00.2018-01-01 00:00:00"/>
    </list>
    <list key="date_series_configuration (interval)">
    <parameter key="date" value="2017-12-31 22:00:00.1.hour"/>
    </list>
    </operator>
    <connect from_op="Create ExampleSet" from_port="output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • tftemmetftemme Posts: 104  RM Research
    Solution Accepted

    Hi @data1maths,

     

    For creating timestamps for time series data the 'Create ExampleSet' operator from the toolbox extension is the way to go, as @mschmitz pointed out.

     

    When I understand you correctly, you have already timestamps and wants to count the events per hour, right?

     

    You can use Generate Attribute with date_get(timestamp,DATE_UNIT_HOUR) to extract the hour of your timestamp and then the Aggregate operator with the new hour_attribute as the group by attributes and count(id) to get the number of events in the corresponding hour. When you have events in every hour, you are already finshed. If not you should create hourly timestamps with the Create ExampleSet and joins this with your results and then Replace Missing Values with 0.

     

    Best regards,
    Fabian

     

     

  • IngoRMIngoRM Posts: 1,645  RM Founder
    Solution Accepted

    Hi,

     

    And last but not least, you can also get to the desired result with Turbo Prep.  In fact, there are multiple ways to achieve this.  Here is one I like personally:

     

    1. Import your data into a repository and open it in Turbo Prep.  Please note that I kept the date column in my example below as categorical (nominal) column.  You may need to adapt this a bit if you already have a proper date column.

     

    tp_timestamps1.pngLoad in timestamp data into Turbo Prep

     

    2. Copy the date column in Transform.  I called the new column "Hour".

     

    tp_timestamps2.pngCopy date column

     

    3. Change the type of the new Hour column to "date and time" using the "Change Type" function in the Transform group.  For the date format you show above, the correct date format string would be "dd/MM/yyyy HH:mm".  Don't forget to change the output type to "date and time" as well!

     

    tp_timestamps3.pngChange type to date

     

    4. Now change the type of the new column again, but this time to "number".  Extract the "hour" relative to the "epoch".  You now should have three columns: ID, the date / time column (still categorical), and a numerical column with all the hours (starting with a high number, but that should not bother you - if it does, you can change it later on.  This column will be useful for plotting though...)

     

    tp_timestamps4.pngExtract hour as number

     

    5. Now comes the cool trick and a bit of regular expression magic.  You select your original date column (you can do all of this in the same transformation session BTW) and use the "Replace" function.  You should activate the regular expressions and replace "(.*):.*" with "$1:00".  This will replace all minutes in this column by "00" which allows you to use these new values as a nice representation in your aggregation.

     

    tp_timestamps5.pngUse a regular expression to replace the minutes by "00"

     

    6. The final result should look like the table below.  Commit it.

     

    tp_timestamps6.pngFinal result of transformations - ready for the aggregation

     

    7. Now go into the "Pivot" group and use the date column (the one without the minutes) and the numerical hours column as "Group-By" columns.  Use the ID column as aggregate at the bottom - but change the function to "count".  This result is below.

     

    tp_timestamps7.pngCreate the Pivot table

     

    8. You can sort the data now according to the Hours column in the Transform group.  Then you can plot it to your heart's content.  Of course you can also build a process from this etc.

     

    tp_timestamps8.pngThe final chart as bars

     

    Hope this helps.  I will post the created process below as well.

     

    Best,

    Ingo

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="EXPORTED_TURBOPREP">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve" origin="EXPORTED_TURBOPREP" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Local Repository/timestamps"/>
    <description align="center" color="transparent" colored="false" width="126">Loading timestamps</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Subprocess" origin="EXPORTED_TURBOPREP" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="nominal_to_text" compatibility="9.0.002" expanded="true" height="82" name="Nominal to Text" origin="EXPORTED_TURBOPREP" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    </operator>
    <operator activated="true" class="text_to_nominal" compatibility="9.0.002" expanded="true" height="82" name="Text to Nominal" origin="EXPORTED_TURBOPREP" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    </operator>
    <operator activated="true" class="numerical_to_real" compatibility="9.0.002" expanded="true" height="82" name="Numerical to Real" origin="EXPORTED_TURBOPREP" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    </operator>
    <connect from_port="in 1" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/>
    <connect from_op="Text to Nominal" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
    <connect from_op="Numerical to Real" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Unify column types</description>
    </operator>
    <operator activated="true" class="generate_copy" compatibility="9.0.002" expanded="true" height="82" name="Generate Copy" origin="EXPORTED_TURBOPREP" width="90" x="313" y="34">
    <parameter key="attribute_name" value="Date-time"/>
    <parameter key="new_name" value="Hour"/>
    <description align="center" color="transparent" colored="false" width="126">Copy Date-time to Hour</description>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="9.0.002" expanded="true" height="82" name="Loop Attributes" origin="EXPORTED_TURBOPREP" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\QHour\E"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="nominal_to_date" compatibility="9.0.002" expanded="true" height="82" name="Nominal to Date" origin="EXPORTED_TURBOPREP" width="90" x="45" y="34">
    <parameter key="attribute_name" value="%{loop_attribute}"/>
    <parameter key="date_type" value="2"/>
    <parameter key="date_format" value="dd/MM/yyyy HH:mm"/>
    </operator>
    <connect from_port="input 1" to_op="Nominal to Date" to_port="example set input"/>
    <connect from_op="Nominal to Date" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Parse dates in Hour</description>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="9.0.002" expanded="true" height="82" name="Loop Attributes (2)" origin="EXPORTED_TURBOPREP" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\QHour\E"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="date_to_numerical" compatibility="9.0.002" expanded="true" height="82" name="Date to Numerical" origin="EXPORTED_TURBOPREP" width="90" x="45" y="34">
    <parameter key="attribute_name" value="%{loop_attribute}"/>
    <parameter key="time_unit" value="3"/>
    <parameter key="hour_relative_to" value="1"/>
    </operator>
    <connect from_port="input 1" to_op="Date to Numerical" to_port="example set input"/>
    <connect from_op="Date to Numerical" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Extract hour of epoch for Hour</description>
    </operator>
    <operator activated="true" class="replace" compatibility="9.0.002" expanded="true" height="82" name="Replace" origin="EXPORTED_TURBOPREP" width="90" x="715" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="\QDate-time\E"/>
    <parameter key="replace_what" value="(.*):.*"/>
    <parameter key="replace_by" value="$1:00"/>
    <description align="center" color="transparent" colored="false" width="126">Replace '(.*):.*' by '$1:00' in Date-time</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="9.0.002" expanded="true" height="82" name="Aggregate" origin="EXPORTED_TURBOPREP" width="90" x="849" y="34">
    <list key="aggregation_attributes">
    <parameter key="ID" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="Date-time|Hour"/>
    <description align="center" color="transparent" colored="false" width="126">Aggregate ID grouped by Date-time, Hour</description>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="9.0.002" expanded="true" height="82" name="Reorder Attributes" origin="EXPORTED_TURBOPREP" width="90" x="983" y="34">
    <parameter key="attribute_ordering" value="Date-time|Hour|count(ID)"/>
    <description align="center" color="transparent" colored="false" width="126">Ensure order of columns matches the table view.</description>
    </operator>
    <connect from_op="Retrieve" from_port="output" to_op="Subprocess" to_port="in 1"/>
    <connect from_op="Subprocess" from_port="out 1" to_op="Generate Copy" to_port="example set input"/>
    <connect from_op="Generate Copy" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_op="Loop Attributes (2)" to_port="input 1"/>
    <connect from_op="Loop Attributes (2)" from_port="output 1" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

Answers

  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,132  RM Data Scientist

    Hi @data1maths,

    have a look at "Create Example Set" in operator toolbox. One of the examples shows a similar way of using it.

     

    BR,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
    sgenzerdata1maths
  • data1mathsdata1maths Member Posts: 27 Contributor I

    Hi  mschmitz ,

     

    Honestly i couldn't find what i needed in the exemples. 

     

    What i want is to get like for the time between 22h 23h of the 31/12/2017 it gets me 3 as the number of records in this hour.

     

    Thank you sir for your help.

    Best regards

  • data1mathsdata1maths Member Posts: 27 Contributor I

    Thank you so much to all of you. 

    Best Regards

     

Sign In or Register to comment.