
"How to impute missing values"

ccapra Member Posts: 6 Contributor II
edited June 2019 in Help
I have a survey dataset.

The survey design allows people to enter information about a single event more than once without repeating some details such as the host's contact info, and the event name & date.

This creates rows in which some columns are empty, and each missing value is really the same as the value in that column in the previous row.

Like this:

Sally  Smith   sally.smith@email.com  Jan 1 2012  Special Event  Downtown            One cool thing about the event
Joe    Shchoe  joe.sch@email.com      Feb 2 2012  Dumb Event     Riverside           One cool thing about the event
                                                                                     Another cool thing about the event
                                                                                     Joe had a lot to say about this event
Betty  Boop    betty.boop@email.com   Jan 5 2012  Odd Event      Out in the Boonies  One mildly cool thing about the event

********

So as you can see, Joe entered 3 rows of data and only had to put in the redundant info once. Now I need to fill the missing cells with the data from above (i.e. copy Joe's contact and event data from the second row into the third and fourth rows).

I'm very new to RM and have only used some simple operators. I have never worked with a subprocess or with a 'learner', but I think I need to use the 'impute missing values' operator here - is that right?

And if so, how do I proceed? (I don't know how it's supposed to look, but when I go into the impute missing values operator, there is nothing 'inside' it. I sort of thought it would contain its subprocesses, but it does not. So am I just expecting the wrong thing, or is there something wrong with my program?)

Or should I create a macro, something like 'if a cell is empty, copy the cell from above'? If so, how would I do that?

Thanks!
   





Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    If you always want to replace missing values with the first non-empty row above the current row you can follow these steps:
    1. Install the Series Extension from the marketplace (Tools -> Updates and Extensions).
    2. Use the operator Replace Missing Values (Series) with replacement set to "previous value".

    Best regards,
    Marius
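
For readers who want to see the "previous value" idea outside RapidMiner, here is a minimal plain-Python sketch. It is not what the Replace Missing Values (Series) operator does internally. It only illustrates the rule "if a cell is empty, copy the cell from above" on the sample rows from the question. The function name fill_from_previous is made up for this example, and it assumes that empty cells are represented as None or an empty string.

```python
def fill_from_previous(rows):
    """Fill empty cells with the most recent non-empty value in the same column.

    Assumption for this sketch: a cell counts as missing if it is None or "".
    """
    last_seen = {}  # column index -> last non-empty value seen so far
    filled = []
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if value is None or value == "":
                # Copy the value from above; stays missing if nothing was seen yet.
                value = last_seen.get(col, value)
            else:
                last_seen[col] = value
            new_row.append(value)
        filled.append(new_row)
    return filled


# Sample data taken from the question; "" marks the cells that were left blank.
rows = [
    ["Sally", "Smith", "sally.smith@email.com", "Jan 1 2012", "Special Event",
     "Downtown", "One cool thing about the event"],
    ["Joe", "Shchoe", "joe.sch@email.com", "Feb 2 2012", "Dumb Event",
     "Riverside", "One cool thing about the event"],
    ["", "", "", "", "", "", "Another cool thing about the event"],
    ["", "", "", "", "", "", "Joe had a lot to say about this event"],
    ["Betty", "Boop", "betty.boop@email.com", "Jan 5 2012", "Odd Event",
     "Out in the Boonies", "One mildly cool thing about the event"],
]

filled = fill_from_previous(rows)
for row in filled:
    print(row)
```

Note that this only gives the right result when the rows are still in their original order, since each blank cell takes whatever value sits directly above it.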