"How to impute missing values"
I have a survey dataset.
The survey design allows people to enter information about a single event more than once without repeating some details such as the host's contact info, and the event name & date.
This creates rows where some columns are missing data, and the missing data is essentially the same as the value in the same column of the previous row.
Like this:
Sally Smith | sally.smith@email.com | Jan 1 2012 | Special Event | Downtown | One cool thing about the event
Joe Schloe | joe.sch@email.com | Feb 2 2012 | Dumb Event | Riverside | One cool thing about the event
 | | | | | Another cool thing about the event
 | | | | | Joe had a lot to say about this event
Betty Boop | betty.boop@email.com | Jan 5 2012 | Odd Event | Out in the Boonies | One mildly cool thing about the event
********
So as you can see, Joe Schloe entered 3 rows of data & only had to put in the redundant info once. Now I need to fill the missing cells with the values from the row above (i.e. copy Joe's contact & event data from the second row into the third & fourth rows).
I'm very new to RM & have only used some simple operators. I have never worked with either a subprocess or a 'learner', but I think I need to use the 'Impute Missing Values' operator here - is that right?
And if so, how do I proceed? (I don't know how it's supposed to look, but when I go into the Impute Missing Values operator, there is nothing 'inside' it. I sort of thought it would have the subprocesses contained within, but it does not. So am I just expecting the wrong thing, or is there something wrong with my program?)
Or should I create a macro, something like 'if a cell is empty, copy the cell from above'? If so, how would I do that?
Thanks!
Answers
1. Install the Series Extension from the Marketplace (Tools -> Updates and Extensions).
2. Use the operator Replace Missing Values (Series) with replacement set to "previous value".
Best regards,
Marius
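For anyone who wants to see the "previous value" idea spelled out, here is a minimal sketch in plain Python, outside RapidMiner. It is not how the Series Extension is implemented; it only illustrates the rule "if a cell is empty, copy the cell from above". The column order (name, email, date, event, location, comment) and the helper name fill_from_previous_row are assumptions made for illustration, since the original post does not name its columns.

```python
def fill_from_previous_row(rows, columns_to_fill):
    """Return a copy of rows in which empty cells in the given column
    positions take the most recent non-empty value from a row above."""
    filled = []
    last_seen = {}  # column position -> last non-empty value seen so far
    for row in rows:
        new_row = list(row)  # copy, so the input rows are left untouched
        for col in columns_to_fill:
            if new_row[col] in (None, ""):
                # Empty cell: reuse the value from above, if there is one.
                if col in last_seen:
                    new_row[col] = last_seen[col]
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


# Sample data modelled on the question; the column layout is assumed.
rows = [
    ["Sally Smith", "sally.smith@email.com", "Jan 1 2012", "Special Event",
     "Downtown", "One cool thing about the event"],
    ["Joe Schloe", "joe.sch@email.com", "Feb 2 2012", "Dumb Event",
     "Riverside", "One cool thing about the event"],
    [None, None, None, None, None, "Another cool thing about the event"],
    [None, None, None, None, None, "Joe had a lot to say about this event"],
    ["Betty Boop", "betty.boop@email.com", "Jan 5 2012", "Odd Event",
     "Out in the Boonies", "One mildly cool thing about the event"],
]

# Fill only the contact and event columns (positions 0-4), not the comment.
filled = fill_from_previous_row(rows, columns_to_fill=range(5))

for row in filled:
    print(" | ".join(row))
```

The comment column is deliberately left out of columns_to_fill, so a genuinely empty comment would stay empty rather than inherit the one above it.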