Replace missing values within many subgroups

XiaoHui_0206 Member Posts: 2 Newbie
edited September 2023 in Help
Hello! ;)

I'm a new RapidMiner user and I've run into a problem. I'm trying to replace the missing values in an attribute with the average of that attribute, computed separately for each group defined by another attribute. I'd appreciate any help. For example, I have:

Countries  Year  Value
Malaysia   2015  1
Malaysia   2014  2
Malaysia   2013  3
Malaysia   2012  4
Malaysia   2011  ?
Malaysia   2010  ?
Malaysia   2009  7
Malaysia   2008  ?
Malaysia   2007  8
Malaysia   2006  9
Malaysia   2005  10
Malaysia   2004  ?

Indonesia  2015  1
Indonesia  2014  2
Indonesia  2013  3
Indonesia  2012  ?
Indonesia  2011  5
Indonesia  2010  6
Indonesia  2009  7
Indonesia  2008  ?
Indonesia  2007  8
Indonesia  2006  9
Indonesia  2005  10
Indonesia  2004  ?

I want the average for each country separately. I have 190+ countries, but when I use the Replace Missing Values operator, it computes the average over all countries' values, which is not what I need. How can I compute each country's average using only that country's values?
Example:
Malaysia = (1+2+3+4+7+8+9+10)/8 = 5.5
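
For reference, the calculation described above can be sketched outside RapidMiner in plain Python. This is only an illustration of the arithmetic, not a RapidMiner process: the function name `group_means` and the use of `None` to stand for `?` are assumptions made for the sketch.

```python
from collections import defaultdict
from statistics import mean

# Rows from the post as (country, year, value); None stands for a missing "?".
rows = [
    ("Malaysia", 2015, 1), ("Malaysia", 2014, 2), ("Malaysia", 2013, 3),
    ("Malaysia", 2012, 4), ("Malaysia", 2011, None), ("Malaysia", 2010, None),
    ("Malaysia", 2009, 7), ("Malaysia", 2008, None), ("Malaysia", 2007, 8),
    ("Malaysia", 2006, 9), ("Malaysia", 2005, 10), ("Malaysia", 2004, None),
    ("Indonesia", 2015, 1), ("Indonesia", 2014, 2), ("Indonesia", 2013, 3),
    ("Indonesia", 2012, None), ("Indonesia", 2011, 5), ("Indonesia", 2010, 6),
    ("Indonesia", 2009, 7), ("Indonesia", 2008, None), ("Indonesia", 2007, 8),
    ("Indonesia", 2006, 9), ("Indonesia", 2005, 10), ("Indonesia", 2004, None),
]

def group_means(rows):
    """Return {country: mean of that country's non-missing values}."""
    known = defaultdict(list)
    for country, _year, value in rows:
        if value is not None:
            known[country].append(value)
    # Countries with no known values at all do not appear in the result.
    return {country: mean(values) for country, values in known.items()}

means = group_means(rows)
print(means["Malaysia"])             # 5.5
print(round(means["Indonesia"], 2))  # 5.67
```

Each country's average divides only by the number of known values in that country (8 for Malaysia, 9 for Indonesia), which is the behaviour asked for here.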

Here is my dataset, thanks for helping me!  :)

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    My first intuition would be to use:
    Group into Collection by country
    Loop Collection
    Replace Missing Values inside it
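
The three steps above (group, loop, replace inside each group) can be sketched in plain Python to show the logic. This is an analogy under stated assumptions, not the RapidMiner operators themselves: `impute_by_group` is a hypothetical name, `None` stands for a missing value, and the rows are a small subset of the data from the question.

```python
from collections import defaultdict
from statistics import mean

def impute_by_group(rows):
    """Fill missing values (None) with the mean of the same country's known values."""
    # Step 1: split the rows into one group per country (the "collection").
    groups = defaultdict(list)
    for country, year, value in rows:
        groups[country].append((country, year, value))

    # Steps 2 and 3: loop over the groups and replace missing values inside each.
    result = []
    for group in groups.values():
        known = [v for _, _, v in group if v is not None]
        fill = mean(known) if known else None  # stays missing if nothing is known
        result.extend((c, y, fill if v is None else v) for c, y, v in group)
    return result

# Small subset of the rows from the question.
rows = [
    ("Malaysia", 2012, 4), ("Malaysia", 2011, None), ("Malaysia", 2009, 7),
    ("Indonesia", 2013, 3), ("Indonesia", 2012, None), ("Indonesia", 2011, 5),
]
filled = impute_by_group(rows)
for row in filled:
    print(row)
```

Because each group is processed on its own, the fill value for Malaysia never sees Indonesia's rows, and the table is split only once.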

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • XiaoHui_0206 Member Posts: 2 Newbie
    Hi, how do I group by country using Loop Collection? :'( After I drag Loop Collection into the process, I can't connect my dataset output to it. I get an error: "Expected IOObjectCollection but received ExampleSet." Also, I put the Replace Missing Values operator inside Loop Collection. What should this operator be connected to?
  • ceaperez Member Posts: 522 Unicorn
    Hi @XiaoHui_0206,

    Another option is to use the Loop Values operator: filter by the attribute value, extract the average for that group, and finally replace the missing values in each group with that average.

    Please find a simple example attached.

    Best, 

    Cesar
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    You need to use Group Into Collection first. It's an operator in the Operator Toolbox extension.

    @ceaperez your solution works, but it gets slow on large data sets with many nominal values, because you have to filter every time.
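
To illustrate the cost of filtering every time, here is a rough Python analogy of the loop-over-values approach. It says nothing about RapidMiner's internals: `impute_by_filtering` and the `scanned` counter are hypothetical names introduced only to make the repeated passes visible, and `None` stands for a missing value.

```python
from statistics import mean

def impute_by_filtering(rows):
    """Loop over distinct countries, filtering the full table once per country."""
    rows = list(rows)
    scanned = 0  # rows examined while filtering, to make the cost visible
    for country in {c for c, _, _ in rows}:
        # Filter step: this pass touches every row, whatever its country.
        known = []
        for c, _, v in rows:
            scanned += 1
            if c == country and v is not None:
                known.append(v)
        if not known:
            continue  # nothing to average for this country
        fill = mean(known)
        # Replace step: only this country's missing values are changed.
        rows = [(c, y, fill if (c == country and v is None) else v)
                for c, y, v in rows]
    return rows, scanned

# Small subset of the rows from the question.
rows = [
    ("Malaysia", 2012, 4), ("Malaysia", 2011, None), ("Malaysia", 2009, 7),
    ("Indonesia", 2013, 3), ("Indonesia", 2012, None), ("Indonesia", 2011, 5),
]
filled, scanned = impute_by_filtering(rows)
print(scanned)  # 12: 6 rows scanned once for each of the 2 countries
```

In this sketch the filter work grows as (number of distinct values) × (number of rows), so 190+ countries would mean 190+ full passes over the table, whereas grouping first splits the table once.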
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • ceaperez Member Posts: 522 Unicorn
    Good point, @MartinLiebig.
