Why do we need to normalize the data and group the models together?
Hello fellow practitioners,
I have a statistics question, and hopefully someone can explain it to me.
I am trying to solve a linear regression problem and need to impute missing values. The setup was built by my professor, and we are required to work out the intent behind it.
This is his setup: Impute Missing Values -> Optimize Parameters (Grid) -> Cross Validation
As I understand it, this setup uses k-NN to locate the k nearest data points and then derives a value from them to fill in the missing entries. What I do not understand is why we need to normalize the data first and then pass the preprocessing model, together with the k-NN output, into the Group Models operator. I believe the same goal can be achieved without the Normalize and Group Models operators, right?
Or is it trying to obtain the best k-value?
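To make the question concrete, here is a minimal sketch in plain Python (not RapidMiner) of how I picture the two variants. Everything in it is an assumption for illustration: the toy data is made up, the function names are hypothetical, and `grouped_predict` is only my guess at what chaining a normalization model with a k-NN model amounts to.

```python
import math

def fit_zscore(rows):
    """Learn per-column (mean, std) from the rows; stands in for the preprocessing model."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std if std > 0 else 1.0))  # avoid division by zero
    return stats

def apply_zscore(row, stats):
    """Scale one row with previously learned statistics."""
    return [(v - mean) / std for v, (mean, std) in zip(row, stats)]

def knn_predict(rows, targets, query, k):
    """Average the targets of the k rows closest to query (Euclidean distance)."""
    nearest = sorted(range(len(rows)), key=lambda i: math.dist(rows[i], query))[:k]
    return sum(targets[i] for i in nearest) / k

def grouped_predict(rows, targets, query, k):
    """Hypothetical analogue of a grouped model: normalize first, then run k-NN."""
    stats = fit_zscore(rows)
    scaled_rows = [apply_zscore(r, stats) for r in rows]
    return knn_predict(scaled_rows, targets, apply_zscore(query, stats), k)

# Made-up toy data: columns are (age, income); targets hold the attribute to impute.
rows = [[25, 50000], [26, 90000], [60, 51000], [61, 52000]]
targets = [10, 50, 30, 32]
query = [27, 52000]  # row whose target value is missing

raw_guess = knn_predict(rows, targets, query, k=1)         # income dominates the distance
scaled_guess = grouped_predict(rows, targets, query, k=1)  # both columns count equally
print(raw_guess, scaled_guess)  # prints: 32.0 10.0
```

On the raw data the nearest neighbor is chosen almost entirely by income, so the 27-year-old is matched with a 61-year-old. After scaling, the match is the 25-year-old. Looping `k` over a few values and comparing the validation error would be my rough analogue of the grid optimization step.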