The Altair Community and the RapidMiner community is on read-only mode until further notice. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here.
Options

time series: create weakly, monthly, yearly averages on time series

vkoutsafvkoutsaf Member Posts: 3 Contributor I
edited November 2018 in Help
I have a number to financial timeries
Some are daily, some are weekly, some are monthly etc
I want to convert them into the same frequency by averaging ie all values inside a month 
ie if X1: has daily values for the previous 10 years, Date:the corresponding index
I want to group all the values of the Date index by month and replace them with a new variable that has one value ie the average of all dates corresponding to that month
Any suggestions how one could do that?





Tagged:

Answers

  • Options
    Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Check out either "Moving Average Filter" or "Extract Aggregates" operator in the Time Series folder under modeling (in RM Studio v9), which are designed to do what you are requesting when the series has consistent intervals.  There is a helpful tutorial process that shows how it works on series data.

    However, if you have different series data frequency, you can do this the conventional way inside RapidMiner by simply using Date to Numerical, for example, extracting the month and year (separately) from your date/time stamps, and then use the regular Aggregate operator to get your results grouped by month and year.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Options
    vkoutsafvkoutsaf Member Posts: 3 Contributor I
    I am sorry ..I tried to experiment with the Extract Aggregates function.  To be frank, unless I am missing the point, it is very unsatisfactory..it lacks 'understanding' of time component..... instead it naively operates on groups of rows, and assumes u make them non-overlapping ... if I could pass some feedback ,... in certain industries that rely on analysis for performing very fast, on-the-fly analyses in real time ... one needs more dedicated components ... at work I have access to a product (no reason to name it) where I will specify a number of time series of different frequencies and it has built in intelligence to align them by highest, lowest, or other frequency and prompts for a method of alllignement of the series ....if X1:daily and X2:qtly, aggregate X1 on a qtly basis -->X1* and then display X1* vs X2 for more complex analysis... this is not supposed to be criticism, just (hopefully) useful feedback



  • Options
    tftemmetftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    edited December 2018
    Hi @vkoutsaf ,

    Thanks for the feedback about the time series extension. It is always good to hear feedback. You are right the current time series operators are not yet directly designed to handle time series of different frequencies or non-equidistant time series. This is planned for upcoming releases.

    I want to suggest also another solution. Performing an outer join on time series with different frequencies, results in a joined data set with missing values for the timestamps for which the corresponding attributes don't have values in the original data sets. You can then use Replace Missing Values (Series) to interpolate these and have a resulting dataset with a single frequency.

    Here is a demo process, how to do this:

    <process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="9.0.003" expanded="true" height="103" name="Subprocess" width="90" x="45" y="85">
            <process expanded="true">
              <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
                <parameter key="number_of_examples" value="104"/>
                <list key="function_descriptions">
                  <parameter key="time" value="date_add(date_parse(&quot;01/01/2000&quot;),id-1,DATE_UNIT_WEEK)"/>
                  <parameter key="weekly data" value="10*rand()"/>
                </list>
                <parameter key="add_id_attribute" value="true"/>
                <list key="numeric_series_configuration"/>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="9.0.003" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
                <parameter key="attribute_name" value="time"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="id"/>
                <parameter key="invert_selection" value="true"/>
              </operator>
              <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="136">
                <parameter key="number_of_examples" value="24"/>
                <list key="function_descriptions">
                  <parameter key="time" value="date_add(date_parse(&quot;01/01/2000&quot;),id-1,DATE_UNIT_MONTH)"/>
                  <parameter key="monthly data" value="100*rand()"/>
                </list>
                <parameter key="add_id_attribute" value="true"/>
                <list key="numeric_series_configuration"/>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="9.0.003" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="136">
                <parameter key="attribute_name" value="time"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="136">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="id"/>
                <parameter key="invert_selection" value="true"/>
              </operator>
              <connect from_op="Create ExampleSet" from_port="output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
              <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
              <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Creates two ExampleSets, one with weekly data, one with monthly data</description>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="flag weekly data" value="&quot;true&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.0.003" expanded="true" height="82" name="Join" width="90" x="313" y="85">
            <parameter key="join_type" value="outer"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
          </operator>
          <operator activated="true" class="sort" compatibility="9.0.003" expanded="true" height="82" name="Sort" width="90" x="447" y="85">
            <parameter key="attribute_name" value="time"/>
          </operator>
          <operator activated="true" class="time_series:replace_missing_values" compatibility="9.1.000-BETA" expanded="true" height="68" name="Replace Missing Values (Series)" width="90" x="648" y="85">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="weekly data|monthly data"/>
            <parameter key="has_indices" value="true"/>
            <parameter key="indices_attribute" value="time"/>
            <parameter key="replace_type_numerical" value="linear interpolation"/>
            <parameter key="ensure_finite_values" value="true"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.0.003" expanded="true" height="103" name="Filter Examples" width="90" x="782" y="85">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="flag weekly data.is_not_missing."/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes (3)" width="90" x="916" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="original weekly data"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Subprocess" from_port="out 2" to_op="Join" to_port="right"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_op="Replace Missing Values (Series)" to_port="example set"/>
          <connect from_op="Replace Missing Values (Series)" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <description align="center" color="transparent" colored="true" height="67" resized="false" width="126" x="158" y="139">Generate a flag attribute to used later in the filter</description>
          <description align="center" color="transparent" colored="true" height="229" resized="false" width="180" x="332" y="186">Perform an outer join using the id attribute (which is the time attribute).&lt;br&gt;&lt;br&gt;The resulting ExampleSet will have missing values for the timestamps which don't occur in the original datasets&lt;br/&gt;&lt;br/&gt;Use sort to sort again for the joint timestemps</description>
          <description align="center" color="transparent" colored="true" height="216" resized="true" width="286" x="674" y="199">Use Replace Missing Values with linear interpolation to fill the missing values. &lt;br/&gt;&lt;br/&gt;Use Filter Examples with the flag attribute is not missing to reduce again to weekly timestamps.&lt;br/&gt;&lt;br/&gt;Deselect the flag attribute. Now you have one ExampleSet with weekly data for both time series attributes.</description>
        </process>
      </operator>
    </process>


    Best regards,
    Fabian

Sign In or Register to comment.