Extract Aggregates operator: error in function calculation?

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
edited September 2019 in Help
Hi RM Staff,

First, I hope everyone is doing well.
Second, I think there is a calculation error in the Extract Aggregates operator (time series module) for the following functions:
 - median
 - first quartile
 - third quartile
It seems that all three of these functions behave like the "minimum" function.
Here are the results for the "Temperature" attribute of the "Golf" dataset:


These curious results led me to test the new "percentile" function of the Aggregate operator. From my point of view, that operator gives the correct results:
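
For anyone who wants to cross-check the median and quartiles outside RapidMiner, here is a minimal Python sketch. It is only an illustration under assumptions: the Temperature values below are the ones commonly published for the Golf (weather) sample data and are not copied from this post, and the "inclusive" interpolation method of Python's statistics module is not necessarily the method RapidMiner uses, so the quartiles may differ slightly from the Aggregate operator's output.

```python
import statistics

# Assumption: the 14 Temperature values of the widely published Golf (weather)
# sample data. Verify them against //Samples/data/Golf before relying on this.
temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]


def summarize(values):
    """Return the minimum, first quartile, median and third quartile.

    Quartiles use linear interpolation between ranks (method="inclusive"),
    which is one common convention among several.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "minimum": min(values),
        "first_quartile": q1,
        "median": q2,
        "third_quartile": q3,
    }


summary = summarize(temperature)
for name, value in summary.items():
    print(f"{name}: {value}")

# Symptom check: list every statistic that is exactly equal to the minimum.
suspicious = [
    name
    for name in ("first_quartile", "median", "third_quartile")
    if summary[name] == summary["minimum"]
]
print("statistics equal to the minimum:", suspicious)
```

With these assumed values the minimum is 64 and the median is 72, so a correct implementation should not report the same number for both. The last check simply flags any quartile or median that coincides with the minimum.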


Here is the process (use RM 9.1 (beta) to run it):

<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="time_series:extract_std_descriptive_features" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Extract Aggregates" width="90" x="380" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Temperature"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="add_time_series_name" value="false"/>
        <parameter key="sum" value="true"/>
        <parameter key="mean" value="true"/>
        <parameter key="geometric_mean" value="true"/>
        <parameter key="first_quartile" value="true"/>
        <parameter key="median" value="true"/>
        <parameter key="third_quartile" value="true"/>
        <parameter key="min" value="true"/>
        <parameter key="max" value="true"/>
        <parameter key="std_deviation" value="true"/>
        <parameter key="kurtosis" value="true"/>
        <parameter key="skewness" value="true"/>
        <parameter key="ignore_invalid_values" value="false"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Aggregate" width="90" x="581" y="136">
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default_aggregation_function" value="average"/>
        <list key="aggregation_attributes">
          <parameter key="Temperature" value="median"/>
          <parameter key="Temperature" value="percentile (25)"/>
          <parameter key="Temperature" value="percentile (75)"/>
          <parameter key="Temperature" value="average"/>
          <parameter key="Temperature" value="minimum"/>
        </list>
        <parameter key="group_by_attributes" value=""/>
        <parameter key="count_all_combinations" value="false"/>
        <parameter key="only_distinct" value="false"/>
        <parameter key="ignore_missings" value="true"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Extract Aggregates" to_port="example set"/>
      <connect from_op="Extract Aggregates" from_port="features" to_port="result 1"/>
      <connect from_op="Extract Aggregates" from_port="original" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Regards,

Lionel

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Thanks for catching this! I have also been testing the new time series operators in the RapidMiner 9.1 beta, but I had not yet tried those specific aggregation functions, so I had not noticed the problem.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts