As part of a scientific project at my university, I have to develop a data preprocessing model. I am currently struggling with missing values.
I have a data set consisting exclusively of numerical attributes, and many of its values are missing. I would like to implement the following in RM:
- For each attribute, I would like to use 2-3 different methods (e.g. linear interpolation, quadratic interpolation, cubic interpolation, kNN algorithm; other algorithms which can be used to impute missing values are also welcome) to replace the missing values with statistically calculated values.
- Then I want to calculate the performance of each method for each attribute and at the end select the best method for imputing missing values for each attribute.
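To make the two steps above concrete, here is a minimal sketch of the evaluation idea in plain Python rather than as an RM process. It hides a share of the known values, imputes them with each method, and compares the imputed values with the true ones per attribute. The toy data, all function names, the 20% hold-out share, and the use of RMSE as the performance measure are assumptions for illustration only; mean imputation and last-observation-carried-forward stand in for the quadratic, cubic and kNN methods, which are left out to keep the example short.

```python
import math
import random

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def impute_linear(values):
    """Linearly interpolate None entries between the nearest observed
    neighbours; gaps at the edges take the nearest observed value."""
    idx = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out

def impute_locf(values):
    """Last observation carried forward; leading gaps use the first observed value."""
    last = next(v for v in values if v is not None)
    out = []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

# Candidate methods (hypothetical selection; extend with kNN etc. as needed).
METHODS = {"mean": impute_mean, "linear": impute_linear, "locf": impute_locf}

def rmse_for_method(values, method, hide_fraction=0.2, seed=0):
    """Hide a share of the known values, impute them, and return the RMSE
    between the imputed and the true values at the hidden positions."""
    rng = random.Random(seed)
    known = [i for i, v in enumerate(values) if v is not None]
    hidden = rng.sample(known, max(1, int(len(known) * hide_fraction)))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    imputed = method(masked)
    return math.sqrt(sum((imputed[i] - values[i]) ** 2 for i in hidden) / len(hidden))

def best_method_per_attribute(table):
    """Return {attribute: (best method name, {method name: RMSE})}."""
    result = {}
    for name, values in table.items():
        scores = {m: rmse_for_method(values, f) for m, f in METHODS.items()}
        result[name] = (min(scores, key=scores.get), scores)
    return result

# Hypothetical toy data: None marks a missing value.
table = {
    "trend": [None if i % 7 == 3 else float(i) for i in range(40)],
    "alternating": [None if i % 5 == 0 else (10.0 if i % 2 == 0 else 12.0)
                    for i in range(40)],
}

for attribute, (best, scores) in best_method_per_attribute(table).items():
    print(attribute, "->", best, scores)
```

Because the same seed is used for every method, all methods are scored on the same hidden positions, which keeps the comparison fair. The values that are really missing would then be imputed with the winning method for each attribute.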
It would be great if someone could help me.
Many thanks in advance