Hi! I used R for multiple imputation and generated 5 imputed versions of my data. For the model, I am using a stacking ensemble with 3 base learners.
I don't know what I should do with these imputed data sets. Should I train all my base learners on each of them individually?
That sounds right, but it takes a lot of time to train each base learner on each imputed data set and then train the stacked model on each imputed data set again!
Anyway, if that's right, how can I combine the five models learned from the 5 imputed data sets?
I mean, there are operators for combining models into a stacking model, AdaBoost, and so on, but I couldn't find any operator for combining models built from different imputed data sets!
Yes, imputation creates a modified data set in which the missing values are replaced by imputed values.
I want my model to perform better when using the 5 imputations than when only one imputation is used.
For example, when you build several regression models from several imputations, there are rules for combining these regressions and extracting one model out of them. But here, where I have an ensemble model, I'm not sure what the best way to combine them is. Voting, or some other way?
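For reference, the combining rules for regression mentioned above are usually Rubin's rules: average the coefficient across imputations, then add the between-imputation spread to the average squared standard error. Here is a minimal Python sketch for a single coefficient. The function name `pool_rubin` and the numbers are made up for illustration and do not come from my processes.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputations with Rubin's rules.

    estimates: the coefficient estimated on each imputed data set
    variances: the squared standard error of that coefficient per data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Illustrative (made-up) values for 5 imputations
coefs = [1.0, 1.2, 0.8, 1.1, 0.9]
squared_ses = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled_coef, pooled_var = pool_rubin(coefs, squared_ses)
print(pooled_coef, pooled_var)
```

These rules pool parameters, which is why they do not carry over directly to an ensemble that has no single set of coefficients.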
Voting gives better performance, but is it the common way to combine models built from different imputations?
This is my first experience with RapidMiner. How was the process overall?
Any suggestions on the whole process?
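To make the question concrete, here is a small Python sketch of two ways to combine the predictions of 5 models (one per imputation) for a single example: a majority vote on the predicted labels, and averaging the class confidences before picking a label. The helper names and the confidence values are hypothetical; this is not what the RapidMiner Vote operator does internally, just an illustration of the two ideas.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_confidences(per_model_confidences):
    """Average class confidences across models and pick the top class.

    per_model_confidences: one dict {class_label: confidence} per model,
    all with the same class labels.
    """
    m = len(per_model_confidences)
    classes = list(per_model_confidences[0].keys())
    avg = {c: sum(conf[c] for conf in per_model_confidences) / m for c in classes}
    return max(avg, key=avg.get), avg


# Made-up confidences from 5 models, one per imputed data set, for one example
confidences = [
    {"yes": 0.90, "no": 0.10},
    {"yes": 0.45, "no": 0.55},
    {"yes": 0.45, "no": 0.55},
    {"yes": 0.45, "no": 0.55},
    {"yes": 0.60, "no": 0.40},
]

# Each model's hard prediction is its most confident class
hard_labels = [max(conf, key=conf.get) for conf in confidences]

voted_label = majority_vote(hard_labels)
averaged_label, averaged = average_confidences(confidences)
print(voted_label, averaged_label, averaged)
```

In this made-up case the two methods disagree, because three models lean weakly towards "no" while one is strongly confident in "yes", so the choice of combination rule can matter.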
Here is a short explanation of them: 1. You need the VIM package for R to be able to run them!
2. I uploaded two pieces of code for you! In the first one I imputed just 1 data set, and in the second one I imputed 5 data sets.
About the first code: Here, in the first Subprocess I trained 3 base learners, and in the second Subprocess I used these 3 learners to train a stacking model!
The stacking model has the best performance of all!
About the second code: Here, in the first Subprocess I used the 5 imputations to train 5 stacking models, just as I did in the first code! Then in the second Subprocess I applied voting to the 5 models built from the 5 imputations, to combine the results and gain better performance!
I hope you don't get confused by the process!
Any suggestions on the whole process would be welcomed!
I mean, is there any other way to combine the results of the imputations instead of voting?
In these processes I trained all the base learners on all the imputations. Is that the common way?
Thanks in advance.
@ "This is my first experience with RapidMiner. How was the process overall?"
You should limit the size of your process, and rely less on recall operators.
The process I uploaded above shows how to embed multiple imputation operators within the Stacking and X-Validation operators.
Alternatively, you can create a new data set that contains multiple copies of each imputed attribute.
- Load your data
- Generate IDs
- Apply imputation (5x)
- Join results into 1 final data set
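The steps above can be sketched roughly as follows in Python. This is only an illustration of the resulting data layout, not the RapidMiner process itself: `impute_once` is a hypothetical stand-in that fills a missing value with a randomly chosen observed value (it is not what the VIM package does), and the `_imp1` ... `_imp5` column suffixes are my own naming.

```python
import random


def impute_once(rows, attr, rng):
    """Hypothetical stand-in imputation: replace None with a random observed value."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    return [r[attr] if r[attr] is not None else rng.choice(observed) for r in rows]


def build_wide(rows, attr, m=5, seed=0):
    """Return one row per example, with m imputed copies of `attr` as columns."""
    rng = random.Random(seed)
    # Generate IDs so the imputed copies can be matched back to each example
    wide = [{"id": i + 1} for i in range(len(rows))]
    # Apply the imputation m times and join each result as a new column
    for k in range(1, m + 1):
        for out, value in zip(wide, impute_once(rows, attr, rng)):
            out[f"{attr}_imp{k}"] = value
    return wide


# Load your data (a tiny made-up example with one missing value)
data = [{"age": 30}, {"age": None}, {"age": 50}]
wide = build_wide(data, "age")
for row in wide:
    print(row)
```

Each imputed copy gets its own attribute name here, so no copy overwrites another when the results are joined on the ID.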
About your first code: I don't want to use just one model (Naive, as you did) as the base learner. I want 3, and I wasn't sure whether to train all 3 on all the imputations. Besides, using stacking instead of voting (as you did) makes the process even more complicated.
About your second code: the Join operator is not a suitable way of joining the imputations, because whenever there is a difference it just ignores the value from the right imputation. It's just like using only the left imputation.
Anyway, thanks for your time!