"multicollinearity in linear regression"

xiaoyany Member Posts: 2 Contributor I
edited May 2019 in Help
Sorry if I am posting to the wrong forum; please let me know a proper place for my question.
I am trying to build a linear regression model to predict revenue, using month of year (a nominal attribute with 12 values) along with other numeric attributes. I applied a Nominal to Binominal operator and a Nominal to Numerical operator before the Linear Regression operator. However, the resulting model includes all 12 dummy variables (produced by the two conversion operators) plus an intercept. Since the dummy variables always sum to one, the resulting model suffers from multicollinearity. Why isn't one of the dummy variables dropped automatically in the process? Or is it the user's responsibility to drop one? If so, how?

Thanks in advance.
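The trap described above can be sketched numerically (the sample size and variable names below are illustrative, not from the thread): with an intercept plus all 12 month dummies, the dummy columns sum exactly to the intercept column, so the design matrix loses one column of rank.

```python
import numpy as np

# Illustrative example: 5 years of monthly observations.
months = np.arange(60) % 12            # month index 0..11 per row
dummies = np.eye(12)[months]           # one-hot encoding: 60 x 12

# Intercept + all 12 dummies: the dummies sum to the intercept column.
X_full = np.hstack([np.ones((60, 1)), dummies])
# Intercept + 11 dummies (one month dropped as the baseline).
X_drop = np.hstack([np.ones((60, 1)), dummies[:, 1:]])

print(np.linalg.matrix_rank(X_full))   # 12, though there are 13 columns
print(np.linalg.matrix_rank(X_drop))   # 12, equal to the column count
```

With 13 columns but rank 12, `X_full` is perfectly collinear and the least-squares coefficients are not uniquely determined; dropping one dummy restores full column rank.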


    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    How should the computer guess that you want to exclude one of your attributes from the model-building process? Only based on the assumption: "Well, he has an attribute with twelve values. Hey, that's probably a date, and since he applies a linear regression he must be in the financial field! So it's surely fine to silently remove an input attribute automatically..."
    No, of course it is the user's responsibility to remove attributes he doesn't want regarded in model building. There are two possible ways:
    Simply filter the attributes away with a Select Attributes operator beforehand, or set their role to a special one. Attributes with special roles are not regarded as input for the analysis. Predefined special roles may carry a specific meaning (like label), but you can define your own roles in the Set Role operator by simply typing them in.

    I would suggest taking a close look at the English manual, where these basic ideas of RapidMiner are explained.
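Outside RapidMiner, the same "drop one dummy" fix the reply recommends can be sketched in Python (the data, column names, and coefficients below are invented for illustration): encode the month with one category left out as the baseline, then fit by ordinary least squares.

```python
import numpy as np
import pandas as pd

# Illustrative synthetic data: revenue driven by spend, month as a nominal.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "month": pd.Categorical(rng.integers(1, 13, size=120)),
    "spend": rng.normal(100, 10, size=120),
})
df["revenue"] = 50 + 2 * df["spend"] + rng.normal(0, 5, size=120)

# drop_first=True removes one dummy, mirroring the manual attribute
# filtering described above; the dropped month becomes the baseline.
X = pd.get_dummies(df[["month", "spend"]], drop_first=True).astype(float)
X.insert(0, "intercept", 1.0)

# Full column rank now, so least squares has a unique solution.
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["revenue"].to_numpy(), rcond=None)
print(dict(zip(X.columns, coef.round(2))))
```

Each month coefficient is then interpreted relative to the dropped baseline month rather than as an absolute effect.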
