Moving Average operator behavior / settings

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn
edited December 2018 in Product Feedback - Resolved

Hi, I have come across unexpected behavior of the Moving Average operator from the Series extension.

 

By default, it creates a new attribute that holds the result of the moving average (or another chosen function) calculation. With settings like these, for example:

 

Screenshot 2018-08-23 10.51.57.png

it creates a new attribute named 'average(sum7)'.

 

The problem is that this default name prevents me from chaining the operator multiple times. For example, if I need to calculate both a 7-day and a 30-day moving average, this process won't work:

 

Screenshot 2018-08-23 10.51.00.png

because the second operator (Moving Average 30) tries to create a new attribute with exactly the same name as the one the first operator (Moving Average 7) has already created.

 

I have to either duplicate the attribute I am aggregating, so that I get a copy of it under another name, or rename the new attribute after the first Moving Average. This is not critical, but it is still one unnecessary step in the process.

 

Is there anything that prevents adding a setting that would let us choose the name of the created attribute, instead of having it named in a default way that cannot be changed?

 

Thanks. 

 

 


Fixed and Released · Last Updated

Resolved with 9.3. Please post new comments if needed.

Comments

  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    Hi @kypexin

     

    Concerning the operator from the Series extension, I unfortunately don't know whether we can add the functionality you propose; I suggest opening this for voting. But if you only want to calculate the moving average (so no other aggregation function), you may want to have a look at the Moving Average Filter operator from the new Time Series extension (which has been bundled with the core since 9.0). Not only can you apply the moving average (select the simple filter type) to multiple attributes at once, it also gives you a parameter with a default prefix for the new attributes.

     

    Best regards, and I hope this helps

    Fabian

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @tftemme

     

    Thanks for answering! I didn't know this had become part of the RM core, though it seems that this filter has slightly fewer settings and possibilities than the operator from the extension. I will try to evaluate it, however.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    @kypexin,

     

    Yes, the new time series operators are still in development and do not yet replace the old Series extension. I would also love to hear what functionality you are missing. In fact, I am the one developing the new time series operators, and I enjoy getting feedback ;-)

     

    Best regards
    Fabian

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @tftemme

     

    I would say the 'result position' setting from the old version of the operator is useful, because it gives flexibility for different business tasks.

    'Aggregation function' is interesting, but I haven't come across any use case where I would need anything other than the 'average' function.

     

    Lastly, I am concerned about the difference between the results produced by the two versions of the operator. I have compared 30-day moving averages on the same dataset, and the results are different.

     

    For some reason, this part of the forum does not allow posting photos (??), so I will send you a DM instead.

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    @kypexin,

    there are quite a few scenarios where I prefer mode over average, because it is robust against outliers.

     

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    Hi @kypexin,

     

    I will answer in this thread again, so that others can read it as well. Strange that you cannot post photos. @sgenzer, any idea?

     

    Yes, we could consider adding the result position and the aggregation function.

     

    Concerning the differences between the two operators: I think the parameters of the new operator may be a bit misleading. The filter size is not the same as the window width parameter; rather, the window size of the new operator is 2 * filter size + 1, which ensures that the window is symmetric. A symmetric filter has a clearly defined middle position, and I put the result at that position. I think I should loosen this condition and change it. So the reason the "new" moving average looks flatter to you is simply that it uses a larger filter (for example, a filter size of 30 gives a window of 61 values). If you compare a Moving Average (series) with a window width of 7 and a Moving Average Filter (time_series) with a filter size of 3, you will see that the calculated values are the same. I also realized that the Moving Average (series) does not place its result at the center, even if center is selected. That seems to be a bug, too, so the results are shifted by one Example.

     

    The reason that the Moving Average Filter (time_series) does not reach the end of a series is that the values are calculated for the center position, and they are not defined at the beginning and end of a series (because there the window would reach beyond the range of the series).
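
    As a rough illustration of the two points above (plain Python, not RapidMiner code; the function names and the sample data are made up for this sketch), a centered filter with filter size 3 and a plain window of width 7 average exactly the same values, and the centered result is undefined for the first and last 3 positions:

```python
def centered_moving_average(values, filter_size):
    """Average over a symmetric window of 2 * filter_size + 1 values.

    Positions whose window would reach outside the series get None.
    """
    width = 2 * filter_size + 1
    result = [None] * len(values)
    for center in range(filter_size, len(values) - filter_size):
        window = values[center - filter_size:center + filter_size + 1]
        result[center] = sum(window) / width
    return result


def window_averages(values, window_width):
    """Averages of every full window of the given width (no positions)."""
    return [
        sum(values[start:start + window_width]) / window_width
        for start in range(len(values) - window_width + 1)
    ]


series = [float(v) for v in range(1, 11)]  # hypothetical data: 1.0 .. 10.0
centered = centered_moving_average(series, filter_size=3)
plain = window_averages(series, window_width=7)

# Keep only the positions where the centered filter is defined.
defined = [v for v in centered if v is not None]
```

    In this sketch `defined` and `plain` hold the same numbers, so the two approaches differ only in where each average is placed and in the undefined positions at both ends.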

     

    Here is the process I used to compare both operators.

     

    <process version="9.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.0.001" expanded="true" height="68" name="Retrieve Lake Huron" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/Time Series/data sets/Lake Huron"/>
          </operator>
          <operator activated="true" class="time_series:moving_average_filter" compatibility="9.0.002-SNAPSHOT" expanded="true" height="68" name="Moving Average Filter" width="90" x="246" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Lake surface level / feet"/>
            <parameter key="overwrite_attributes" value="false"/>
            <parameter key="filter_size" value="3"/>
          </operator>
          <operator activated="true" class="series:moving_average" compatibility="7.4.000" expanded="true" height="82" name="Moving Average" width="90" x="380" y="85">
            <parameter key="attribute_name" value="Lake surface level / feet"/>
            <parameter key="window_width" value="7"/>
            <parameter key="result_position" value="center"/>
          </operator>
          <connect from_op="Retrieve Lake Huron" from_port="output" to_op="Moving Average Filter" to_port="example set"/>
          <connect from_op="Moving Average Filter" from_port="example set" to_op="Moving Average" to_port="example set input"/>
          <connect from_op="Moving Average" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>