"multicollinearity in linear regression"
Sorry if I am posting in the wrong forum; please let me know the proper place for my question.
I am trying to fit a linear regression model that predicts revenue from month of year (a nominal attribute with 12 values) and several other numeric attributes. I applied a nominal-to-binominal operator and a nominal-to-numeric operator before the linear regression operator. However, the resulting model included all 12 dummy variables (produced by the two conversion operators) and an intercept. Because the dummy variables always sum to one, they are perfectly collinear with the intercept, so the resulting model suffers from multicollinearity. Why isn't one of the dummy variables dropped automatically in the process? Or is it the user's responsibility to drop it? If so, how?
Thanks in advance.
-Xiaoyan
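To make the concern concrete, here is a minimal sketch in plain Python (not RapidMiner) of the design matrix described above: one intercept column plus one 0/1 dummy per month. The two-year monthly layout and the helper names `design_row` and `matrix_rank` are assumptions made for illustration only. It checks that the dummies always sum to the intercept column and that the matrix therefore has fewer independent columns than it has columns.

```python
from fractions import Fraction

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def design_row(month):
    """Intercept column followed by one 0/1 dummy per month."""
    return [1] + [1 if month == m else 0 for m in MONTHS]


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact fractions (no rounding error)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Two years of monthly observations (assumed layout; revenue values are not needed).
X = [design_row(month) for month in MONTHS * 2]

n_columns = len(X[0])   # 1 intercept + 12 dummies
rank = matrix_rank(X)

# In every row the dummies add up to the intercept value, so one column is redundant.
dummies_sum_to_intercept = all(sum(row[1:]) == row[0] for row in X)

# Leaving out the last dummy (Dec) removes the redundancy.
rank_without_dec = matrix_rank([row[:-1] for row in X])

print(n_columns, rank, dummies_sum_to_intercept, rank_without_dec)
# prints: 13 12 True 12
```

With 13 columns but rank 12, ordinary least squares has no unique solution for the coefficients; with one dummy left out, the 12 remaining columns are linearly independent.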
Answers
How should the computer be able to guess that you want to exclude one of your attributes from the model building process? Only by assuming: "Well, he has an attribute with twelve values. Hey, that's probably a date, and since he applies a linear regression he must be from the financial field! So it's surely fine to silently remove an input attribute automatically..."
No, it is of course the user's responsibility to remove any attributes he doesn't want considered during model building. There are two possible ways:
Either filter the attributes out by applying a Select Attributes operator beforehand, or set their role to a special one. Attributes with special roles aren't used as input for the analysis. Predefined special roles can have a specific meaning (like label), but you can define your own roles in the "Set Role" operator by simply typing in a name.
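For readers who want to see the effect of the first option outside RapidMiner, here is a rough plain-Python sketch of filtering out one dummy attribute before fitting. The attribute names (`month = Jan`, `ad_spend`), the example values, and the helpers `to_dummies` and `select_attributes` are all hypothetical; the names RapidMiner generates for converted attributes may differ.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_dummies(month):
    """One 0/1 attribute per month, keyed by a hypothetical attribute name."""
    return {f"month = {m}": int(m == month) for m in MONTHS}


def select_attributes(example, excluded):
    """Keep every attribute except those named in `excluded` (an inverted filter)."""
    return {name: value for name, value in example.items() if name not in excluded}


# Hypothetical examples: one numeric attribute plus the month dummies.
examples = [
    {"ad_spend": 120.0, **to_dummies("Jan")},
    {"ad_spend": 95.5, **to_dummies("Dec")},
]

reference = "month = Dec"   # the dummy that is left out; December becomes the baseline
filtered = [select_attributes(e, {reference}) for e in examples]

dummy_names = [n for n in filtered[0] if n.startswith("month = ")]
print(len(dummy_names))                           # 11
print(sum(filtered[1][n] for n in dummy_names))   # 0 (the December row is all zeros)
```

With December left out, the intercept of a fitted linear model absorbs the December level, and each remaining month coefficient is read as the difference from December.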
I would suggest taking a close look at the English manual, where all these basic ideas of RapidMiner are explained.
Greetings,
Sebastian