
How to count events in time intervals? Input is a vector of event times

owenowen Member Posts: 22 Contributor II
edited November 2018 in Help
Hello all,

The question says it all: how can RM pre-processing be arranged to solve the following counting problem? Suppose events occur at times 0.7, 2.5, 2.6, 3.9. I want to count the events in uniformly spaced intervals. For example, if the intervals are [0,1), [1,2), [2,3), [3,4), then the events fall into the groups (0.7), (), (2.5, 2.6), (3.9), and the counts for this data are 1, 0, 2, 1. So the output is the time series 1, 0, 2, 1.
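A minimal sketch of the counting step, written in Python rather than as a RapidMiner process (the function `count_events` and its `width` and `n_bins` parameters are hypothetical names chosen for illustration). It assumes half-open intervals [k*width, (k+1)*width) starting at 0: each event time t goes into interval floor(t / width), and every interval starts at zero so that empty intervals such as [1,2) still appear in the output.

```python
import math


def count_events(times, width, n_bins):
    """Count events in the half-open intervals [k*width, (k+1)*width), k = 0..n_bins-1."""
    counts = [0] * n_bins  # start every interval at zero so empty ones are reported
    for t in times:
        k = math.floor(t / width)  # index of the interval containing t
        if 0 <= k < n_bins:  # ignore events outside the observation window
            counts[k] += 1
    return counts


print(count_events([0.7, 2.5, 2.6, 3.9], width=1.0, n_bins=4))  # [1, 0, 2, 1]
```

Events at or beyond n_bins * width are simply dropped in this sketch; whether to drop them or extend the series is a choice that depends on the actual data.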

I have got as far as creating an input file and reading it into RM with a CSV reader.
event_times.txt =======
0.7
2.5
2.6
3.9
=============
There are fancy operators to create a Markup via the value dimension, but I think I want to create the Markup uniformly along the displacement dimension (time is a displacement dimension, yes?). Once that is done, how do I use the Markup to obtain count data?
I have the ValueSeries plugin and RM 5.3. Also, when I import the data, what role should I choose for the event times? id?
Do I need to augment the event times with some trivial value data, like:
0.7 1
2.5 1
2.6 1
3.9 1
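A small Python sketch (not RapidMiner; `rows` and `width` are illustrative names) suggesting that the trivial value column is not needed just for counting: counting rows per interval and summing a constant column of 1s per interval give the same numbers. Whether a particular Series operator requires a value attribute is a separate question that this sketch does not address.

```python
import math
from collections import Counter, defaultdict

# Event times paired with a constant value of 1, as in the two-column layout above.
rows = [(0.7, 1), (2.5, 1), (2.6, 1), (3.9, 1)]
width = 1.0  # interval length

# Option 1: count rows per interval, ignoring the value column.
row_counts = Counter(math.floor(t / width) for t, _ in rows)

# Option 2: sum the constant value column per interval.
value_sums = defaultdict(int)
for t, v in rows:
    value_sums[math.floor(t / width)] += v

print(dict(row_counts))  # {0: 1, 2: 2, 3: 1}
print(dict(value_sums))  # {0: 1, 2: 2, 3: 1}
```

Both versions also omit interval 1, which has no events: grouping only produces keys that occur in the data, so zero counts have to be filled in separately.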
I appreciate any suggestions or examples you may have.
Thank you,
Owen

Answers

    owenowen Member Posts: 22 Contributor II
    Hello all,

    I made some progress, shown in the code below. First it generates a set of 80 random arrival times within a 60-second window. Then it calculates an attribute interval_id that tells which 5-second interval contains each example. Finally it counts the number of examples in each interval. You can compare the two output example sets to see that the counting seems correct.

    Question: this code is a proof of concept. Is there a better way to do this? Several operators seem to hint at this functionality, especially in the Series plugin, for example:
    • Windowing (Series; Intervals) (Series)
    • Transform Intervals (Series)
    • Windowing (Series).
    Code follows.

    Regards,

    Owen
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.015" expanded="true" height="76" name="Poisson Process" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_data" compatibility="5.3.015" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
                <parameter key="number_examples" value="80"/>
                <parameter key="number_of_attributes" value="1"/>
                <parameter key="attributes_lower_bound" value="0.0"/>
                <parameter key="attributes_upper_bound" value="60.0"/>
              </operator>
              <operator activated="true" class="sort" compatibility="5.3.015" expanded="true" height="76" name="Sort" width="90" x="179" y="30">
                <parameter key="attribute_name" value="att1"/>
              </operator>
              <operator activated="true" class="exchange_roles" compatibility="5.3.015" expanded="true" height="76" name="Exchange Roles" width="90" x="315" y="30">
                <parameter key="first_attribute" value="label"/>
                <parameter key="second_attribute" value="att1"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.3.015" expanded="true" height="76" name="Rename" width="90" x="112" y="210">
                <parameter key="old_name" value="att1"/>
                <parameter key="new_name" value="time"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="remove_attribute_range" compatibility="5.3.015" expanded="true" height="76" name="Remove Attribute Range" width="90" x="246" y="210">
                <parameter key="first_attribute" value="1"/>
                <parameter key="last_attribute" value="1"/>
              </operator>
              <connect from_op="Generate Data" from_port="output" to_op="Sort" to_port="example set input"/>
              <connect from_op="Sort" from_port="example set output" to_op="Exchange Roles" to_port="example set input"/>
              <connect from_op="Exchange Roles" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Remove Attribute Range" to_port="example set input"/>
              <connect from_op="Remove Attribute Range" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes" width="90" x="180" y="30">
            <list key="function_descriptions">
              <parameter key="interval_id" value="floor(time/5.)"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.3.015" expanded="true" height="76" name="Aggregate" width="90" x="313" y="30">
            <list key="aggregation_attributes">
              <parameter key="interval_id" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="|interval_id"/>
          </operator>
          <connect from_op="Poisson Process" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
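For comparison, a Python sketch that mirrors the steps of the process above under stated assumptions: 80 times drawn uniformly on [0, 60) with a fixed seed, 5-second intervals, and an explicit zero-filling step at the end. The names `times`, `interval_ids`, `grouped`, and `series` are illustrative only. The last step matters because a group-by aggregation like the Aggregate step typically produces rows only for interval ids that actually occur, so an interval with no arrivals would be missing rather than reported as 0.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is reproducible

# Step 1: 80 arrival times drawn uniformly on [0, 60) and sorted,
# analogous to the Generate Data + Sort steps of the process.
times = sorted(random.random() * 60.0 for _ in range(80))

# Step 2: interval_id = floor(time / 5), as in Generate Attributes.
width = 5.0
interval_ids = [math.floor(t / width) for t in times]

# Step 3: group by interval_id and count, as in Aggregate.
# Like a group-by, this only contains ids that occur at least once.
grouped = Counter(interval_ids)

# Step 4: fill in zeros so every 5-second interval in [0, 60) is present.
n_bins = math.ceil(60.0 / width)
series = [grouped.get(k, 0) for k in range(n_bins)]

print(series)
print(sum(series))  # 80: every arrival is counted exactly once
```

With 80 arrivals spread over 12 intervals an empty interval is unlikely, but with sparse data like the four-event example in the question, step 4 is what turns the grouped counts into the series 1, 0, 2, 1.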