How to count daily occurrences?

puserc Member Posts: 6 Contributor I
edited December 2018 in Help

I have a data source where each row consists of an id and a date. How do I get the number of ids per day, so that I can work with the result as a time series?

Thank you




    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @puserc, without being able to look at your data and your process (did you read the instructions when you were posting this message? :) ), it is hard to say exactly. It sounds like you need a simple Aggregate by day. However, if your date attribute also carries a time of day, so that several different timestamps fall on the same day, you may first need to generate a new attribute that holds only the day and group by that. Can you please post the data and XML?
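
To make the "Aggregate by day" idea concrete outside of RapidMiner, here is a minimal Python sketch, assuming each row is an (id, timestamp) pair. The function name `count_ids_per_day` and the sample rows are made up for illustration. It is not a RapidMiner process, only the same logic: derive a day-only key from each timestamp, then count the rows that share that key.

```python
from collections import Counter
from datetime import datetime


def count_ids_per_day(rows):
    """Count rows per calendar day.

    rows: iterable of (id, timestamp) pairs, where timestamp is a datetime.
    Returns a list of (date, count) tuples sorted by date.
    """
    # Derive the day-level key: dropping the time of day makes all
    # timestamps from the same calendar day fall into one group.
    per_day = Counter(ts.date() for _id, ts in rows)
    return sorted(per_day.items())


# Hypothetical sample data: two rows on 1 January, one row on 2 January.
rows = [
    (1, datetime(2017, 1, 1, 8, 30)),
    (2, datetime(2017, 1, 1, 17, 5)),
    (3, datetime(2017, 1, 2, 9, 0)),
]

daily_counts = count_ids_per_day(rows)
for day, n in daily_counts:
    print(day.isoformat(), n)
```

Note that this counts rows per day. If the same id can appear several times on one day and should only count once, you would collect the ids into a set per day instead of counting rows.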



    rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hi @puserc,


    Like @sgenzer said, you need a simple aggregation.


    However, I assume you have datetimes somewhere, so here is an example process for you. It extracts the day, month and year into separate attributes, counts the rows grouped by these attributes, rebuilds a date from the day, month and year, and selects only the required fields. The only "weird" thing I used was the Create ExampleSet operator, which is included in the Operator Toolbox extension of RapidMiner, but you can remove it and connect your own data instead.



    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="date_series"/>
            <parameter key="number_of_examples" value="10000"/>
            <parameter key="use_stepsize" value="true"/>
            <list key="function_descriptions"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)">
              <parameter key="current_date" value="2017-01-01 00:00:00.198.second"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="year" value="date_get(current_date, DATE_UNIT_YEAR)"/>
              <parameter key="month" value="date_get(current_date, DATE_UNIT_MONTH)"/>
              <parameter key="day" value="date_get(current_date, DATE_UNIT_DAY)"/>
              <parameter key="current_date" value="current_date"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
            <list key="aggregation_attributes">
              <parameter key="current_date" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="day|month|year"/>
            <parameter key="count_all_combinations" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="447" y="34">
            <list key="function_descriptions">
              <parameter key="the_date_for_this" value="date_parse_custom(concat(str(year), &quot;-&quot;, str(month), &quot;-&quot;, str(day)), &quot;yyyy-MM-dd&quot;, &quot;us&quot;)"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="count(current_date)|the_date_for_this"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    That way you'll have a lot to work with.


    All the best,

