
Replace missing value with many subgroup

XiaoHui_0206 Member Posts: 2 Learner I
edited September 2023 in Help
Hello! ;)

I'm a new user of RapidMiner and I've encountered an issue while working with some packages. Specifically, I'm trying to replace missing values in an attribute with the average of that attribute's values, computed separately for each group defined by another attribute. I'd appreciate any assistance in solving this problem. For example, I have:

Countries  Year  Value
Malaysia   2015  1
Malaysia   2014  2
Malaysia   2013  3
Malaysia   2012  4
Malaysia   2011  ?
Malaysia   2010  ?
Malaysia   2009  7
Malaysia   2008  ?
Malaysia   2007  8
Malaysia   2006  9
Malaysia   2005  10
Malaysia   2004  ?

Indonesia  2015  1
Indonesia  2014  2
Indonesia  2013  3
Indonesia  2012  ?
Indonesia  2011  5
Indonesia  2010  6
Indonesia  2009  7
Indonesia  2008  ?
Indonesia  2007  8
Indonesia  2006  9
Indonesia  2005  10
Indonesia  2004  ?

I want to compute the average for each country separately. I have 190+ countries, but when I use the Replace Missing Values operator, it computes the average over all countries' values, which is not accurate. How can I compute each country's average using only that country's own values?
Example:
Malaysia = (1+2+3+4+7+8+9+10)/8 = 5.5

Here is my dataset, thanks for helping me!  :)

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    Hi,

    my first intuition would be to use:
    Group into Collection by country
    Loop Collection
    Replace Missing Values inside it

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
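
For readers who want to see the logic of this suggestion outside RapidMiner, here is a minimal Python sketch of "group by country, then replace missing values within each group". It is not a RapidMiner process and does not use any RapidMiner API; the function name `impute_by_group` and the list-of-dicts layout are assumptions made for illustration, and `None` stands in for the `?` markers in the question.

```python
from collections import defaultdict


def impute_by_group(rows, group_key, value_key):
    """Return copies of rows where None in value_key is replaced
    by the mean of the observed values in the same group."""
    # Step 1: collect the observed (non-missing) values per group.
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])

    # Step 2: compute one mean per group.
    means = {group: sum(vals) / len(vals) for group, vals in observed.items()}

    # Step 3: fill the gaps. A group with no observed values stays None.
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            new_row[value_key] = means.get(new_row[group_key])
        filled.append(new_row)
    return filled


# Malaysia values from the question (2015 down to 2004); None marks "?".
malaysia = [1, 2, 3, 4, None, None, 7, None, 8, 9, 10, None]
rows = [
    {"Countries": "Malaysia", "Year": 2015 - i, "Value": v}
    for i, v in enumerate(malaysia)
]
# Two Indonesia rows from the question, to show that groups do not mix.
rows.append({"Countries": "Indonesia", "Year": 2015, "Value": 1})
rows.append({"Countries": "Indonesia", "Year": 2012, "Value": None})

result = impute_by_group(rows, "Countries", "Value")
print(result[4])  # Malaysia 2011 is filled with 5.5, i.e. 44 / 8
```

The means are computed once per group in a single pass over the data, so no country's values enter another country's average.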
  • XiaoHui_0206 Member Posts: 2 Learner I
    Hi, how do I group by country using Loop Collection? :'( After I drag Loop Collection into the process, I can't connect my dataset output to it; I get an error (Expected IOObjectCollection but received ExampleSet). Also, I put the Replace Missing Values operator inside Loop Collection; what should this operator be connected to?
  • ceaperez Member Posts: 541 Unicorn
    Hi @XiaoHui_0206,

    Another option is to use the Loop Values operator: filter by the attribute value, extract the average of the attribute for that group, and finally replace the missing values in each group with that average.

    Please find attached a simple example.

    Best, 

    Cesar
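
The attached process is not reproduced here. As a rough illustration only, the loop-over-values idea can be sketched in plain Python as follows; this is not the attached example and not RapidMiner code. The function name `impute_by_looping_values` and the data layout are hypothetical, and the sample rows are a small subset of the question's data with `None` standing in for `?`.

```python
def impute_by_looping_values(rows, group_key, value_key):
    """Loop over each distinct group value, filter the rows of that group,
    compute the group's average, and write it into the missing cells."""
    result = [dict(row) for row in rows]  # work on copies of the rows
    for group in sorted({row[group_key] for row in result}):
        # Filter: this scans every row once per distinct group value.
        subset = [row for row in result if row[group_key] == group]
        observed = [row[value_key] for row in subset if row[value_key] is not None]
        if not observed:
            continue  # nothing to average; leave the group's gaps as they are
        average = sum(observed) / len(observed)
        for row in subset:
            if row[value_key] is None:
                row[value_key] = average
    return result


sample = [
    {"Countries": "Malaysia", "Year": 2013, "Value": 3},
    {"Countries": "Malaysia", "Year": 2012, "Value": 4},
    {"Countries": "Malaysia", "Year": 2011, "Value": None},
    {"Countries": "Indonesia", "Year": 2011, "Value": 5},
    {"Countries": "Indonesia", "Year": 2010, "Value": 6},
    {"Countries": "Indonesia", "Year": 2008, "Value": None},
]
filled_sample = impute_by_looping_values(sample, "Countries", "Value")
print(filled_sample[2]["Value"], filled_sample[5]["Value"])
```

Because the filter runs once per distinct value, the total work in this sketch grows with the number of rows times the number of groups.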
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    You need to use Group Into Collection first. It's an operator in the Operator Toolbox extension.

    @ceaperez your solution works, but it gets slow on large data sets with many nominal values, because you have to filter every time.
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • ceaperez Member Posts: 541 Unicorn
    Good point @MartinLiebig,

