Options

"Calculating average with missing values"

iesnaolaiesnaola Member Posts: 8 Contributor II
edited June 2019 in Help

Hi everyone,

 

Having a Dataset like the following one:

 

row - Att1 - Att2 - Att3

(1)     5        7         5     

(2)     6        8         9     

(3)               7         6

(4)               7         

 

I want to calculate the average so that result would be the following one:

row - Att1 - Att2 - Att3 - Average

(1)     5        7         5         5.6

(2)     6        8         9         7.6

(3)               7         6         6.5

(4)               7                    7

 

I am using operator Generate Attributes, with the following function:

avg(Att1, Att2, Att3)

However, when any of the rows has a missing value, I simply get a Missing Value as Average.

 

I would like to know how to solve this problem.

 

Thanks in advance

         

Tagged:

Best Answers

  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,508 RM Data Scientist
    Solution Accepted

    Hi,

     

    the generate aggregation operator is able to do it.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    HI @iesnaola,

     

    You have to use the Generate Aggregation operator and check the parameter ignore missing

    Here the process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Average_missing\Average_missing.xlsx"/>
    <parameter key="imported_cell_range" value="A1:C5"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att1.true.integer.attribute"/>
    <parameter key="1" value="Att2.true.integer.attribute"/>
    <parameter key="2" value="Att3.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="8.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="447" y="34">
    <parameter key="attribute_name" value="average"/>
    <parameter key="aggregation_function" value="average"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards, 

     

    Lionel

     

Answers

Sign In or Register to comment.