"Is it possible (and correct) to replace missing values while keeping the same distribution of values?"
Hi, I have some attributes with missing values and I want to find the best way to replace them.
Usually you can replace them with the average (or the most frequent value), but is it possible in RapidMiner (and, more importantly, is it correct) to replace them while keeping the same distribution as the non-missing values?
I'll try to explain better with an example:
Let's say I have an attribute "Nationality" with this distribution of values:
I would like to replace the missing values so that 50% of them become "ENG", 22% become "ITA", and so on.
Note that I don't have any other attributes that give me more information about it and that I could use to better estimate the nationality.
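To make the idea concrete, here is a minimal Python sketch of what is usually called random sample imputation: each missing entry is filled by drawing a value at random, weighted by how often each category appears among the non-missing entries. It is not a RapidMiner process. The data and the function names (`impute_by_sampling`, `impute_by_mode`) are hypothetical and only illustrate the technique, with the most-frequent-value approach included for comparison.

```python
import random
from collections import Counter

# Hypothetical data for illustration only: None marks a missing value.
nationality = ["ENG"] * 5 + ["ITA"] * 2 + ["FRA"] * 3 + [None] * 4


def impute_by_sampling(values, seed=None):
    """Replace None entries with values drawn at random from the
    empirical distribution of the non-missing entries."""
    rng = random.Random(seed)
    counts = Counter(v for v in values if v is not None)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    # Each missing entry gets an independent weighted draw.
    return [
        v if v is not None else rng.choices(categories, weights=weights, k=1)[0]
        for v in values
    ]


def impute_by_mode(values):
    """Baseline: replace every None with the most frequent observed value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


filled = impute_by_sampling(nationality, seed=42)
print(Counter(filled))
print(Counter(impute_by_mode(nationality)))
```

Sampling keeps the category proportions only in expectation: with few missing values the filled-in proportions can deviate noticeably, and a fixed seed makes the result reproducible. Mode imputation, by contrast, assigns every missing entry to the most frequent category, which inflates that category's share.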
What do you think? Do you have any suggestions or better ways to do it?
Thank you in advance