Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

"Calculating average with missing values"

iesnaolaiesnaola Member Posts: 8 Contributor II
edited June 2019 in Help

Hi everyone,

 

Having a Dataset like the following one:

 

row - Att1 - Att2 - Att3

(1)     5        7         5     

(2)     6        8         9     

(3)               7         6

(4)               7         

 

I want to calculate the average so that result would be the following one:

row - Att1 - Att2 - Att3 - Average

(1)     5        7         5         5.6

(2)     6        8         9         7.6

(3)               7         6         6.5

(4)               7                    7

 

I am using operator Generate Attributes, with the following function:

avg(Att1, Att2, Att3)

However, when any of the rows has a missing value, I simply get a Missing Value as Average.

 

I would like to know how to solve this problem.

 

Thanks in advance

         

Tagged:

Best Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Solution Accepted

    Hi,

     

    the generate aggregation operator is able to do it.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • lionelderkrikorlionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    HI @iesnaola,

     

    You have to use the Generate Aggregation operator and check the parameter ignore missing

    Here the process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Average_missing\Average_missing.xlsx"/>
    <parameter key="imported_cell_range" value="A1:C5"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att1.true.integer.attribute"/>
    <parameter key="1" value="Att2.true.integer.attribute"/>
    <parameter key="2" value="Att3.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="8.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="447" y="34">
    <parameter key="attribute_name" value="average"/>
    <parameter key="aggregation_function" value="average"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards, 

     

    Lionel

     

Answers

Sign In or Register to comment.