GLM: weights vs. coefficients
Hi miners,
I am training a GLM model for binary classification, so basically I am performing logistic regression.
My question is: how do I interpret the relation between the GLM model's weights output and its regression coefficients?
In many cases they are exactly the same, but some differ, a few by several orders of magnitude. For example, for one feature the weight and the regression coefficient both equal 1.841; for another feature I observe a weight of 0.328 while the regression coefficient is 0.0002; and for yet another feature the weight is -0.617 and the coefficient is -0.001.
(I use regularisation, so the whole range of coefficients / weights is not that big, let's say roughly between -2 and 2.)
Best Answers
IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi,
Where are the weights coming from? I assume from the weights port of the GLM operator? And are you looking at the "standardized" coefficients? The weights are simply the standardized coefficients and should be the same if you use the weights port of the GLM...
Hope this helps,
Ingo
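To make the relation between the two columns concrete, here is a minimal sketch. It assumes the GLM standardizes each feature as (x - mean) / std before fitting, so that a standardized coefficient is the raw coefficient multiplied by that feature's standard deviation. The helper names and feature labels are hypothetical, not part of any RapidMiner API, and the numbers are the rounded values quoted in the question, so the results are only rough.

```python
# Assumption: features are standardized as (x - mean) / std before fitting,
# so std_coef = raw_coef * std. Helper names are illustrative only.

def to_standardized(raw_coef: float, feature_std: float) -> float:
    """Standardized coefficient (weight) implied by a raw coefficient."""
    return raw_coef * feature_std


def to_raw(std_coef: float, feature_std: float) -> float:
    """Raw coefficient implied by a standardized coefficient."""
    return std_coef / feature_std


def implied_std(std_coef: float, raw_coef: float) -> float:
    """Standard deviation a feature would need to explain the gap."""
    return std_coef / raw_coef


# (weight, raw coefficient) pairs from the question; labels are made up.
pairs = {"feature_b": (0.328, 0.0002), "feature_c": (-0.617, -0.001)}
for name, (weight, coef) in pairs.items():
    print(name, "implied std is about", round(implied_std(weight, coef)))
```

Under that assumption, the two mismatched features would simply have standard deviations in the hundreds or thousands, while the feature whose weight and coefficient both equal 1.841 would have a standard deviation close to 1.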
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
@kypexin Sorry for the confusion! I definitely misunderstood your initial question. Hopefully I can be more helpful here.
If you are scoring new raw data, then you will want to use the normal coefficients. The standardized coefficients are rescaled so that they are comparable across features, but they won't generate a correct score unless you have standardized all the input data the same way (using each feature's mean and standard deviation from training), which is very unlikely.
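A small sketch of that point, in plain Python. All coefficient values, means, standard deviations, and feature names below are made up for illustration, and the standardization convention ((x - mean) / std, with the intercept shifted accordingly) is an assumption rather than something confirmed for the GLM operator.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(row, intercept, coefs):
    """Probability from a linear predictor applied to the row as given."""
    z = intercept + sum(coefs[k] * row[k] for k in coefs)
    return sigmoid(z)


# Hypothetical raw-scale model and training statistics.
intercept = -1.5
coefs = {"amount": 0.0002, "age": -0.03}
means = {"amount": 2500.0, "age": 40.0}
stds = {"amount": 1640.0, "age": 12.0}

# Equivalent standardized-scale model (assumed convention).
std_coefs = {k: coefs[k] * stds[k] for k in coefs}
std_intercept = intercept + sum(coefs[k] * means[k] for k in coefs)

row = {"amount": 4000.0, "age": 35.0}  # new raw data
row_standardized = {k: (row[k] - means[k]) / stds[k] for k in row}

p_raw = score(row, intercept, coefs)                          # raw coefs, raw data
p_std = score(row_standardized, std_intercept, std_coefs)     # std coefs, std data
p_wrong = score(row, std_intercept, std_coefs)                # std coefs, raw data

print(p_raw, p_std, p_wrong)
```

The first two probabilities agree because the two linear predictors are algebraically identical; the third is far off because standardized coefficients are being multiplied by unscaled inputs.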
Answers
In theory, the GLM with the binomial family (logit link) and the IRLSM solver is the same as the logistic regression with IRLSM, but only if all the other parameters are the same. See the attached simple example where you can confirm this:
Additionally, since neither is a direct solution but involves iterative approximation, if you have a lot of predictors with shared covariance, it is also conceivable that you could get different coefficients due to random effects. Setting a random seed for both will ensure you are getting reproducible results (but still might not completely solve this issue). IIRC, the more shared covariance between predictor sets, the more unstable the coefficients will be (keep in mind the multicollinearity issues from linear regression, which cause coefficient inflation; the same basic dynamic is at work here).
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for the example; it makes a clear point, but this is not exactly what I was asking about. I am not trying to compare GLM and LR; I have just one GLM model, and I am comparing its model coefficients with its feature weights. I think Ingo's answer cleared it up pretty well.
Vladimir
http://whatthefraud.wtf
Thanks for the advice. I was looking at the first column of coefficients (not standardized). In fact, the std. coefficients and the weights from the GLM weights output port are the same, so I have my question answered.
However, I now have a second question: if I use the derived coefficients in a regression equation (which I then, for example, put into code to make predictions on new data), should I use the normal or the standardized coefficients, or won't it make a difference? To be exact, I am using the following formulas to calculate the probability on new data:
Vladimir
http://whatthefraud.wtf