# "Linear regression / GLM with weights"

Hi there,

I am tuning a logistic regression binary classifier (GLM operator) and experimenting with weighting the minority class using GENERATE WEIGHTS (STRATIFICATION), because the dataset is highly imbalanced.

I see some difference in performance with and without weighting the training data, as follows.

Model trained without weighting:

Model trained with weighting:

Setting aside the business question of which result is more desirable, my question is: how exactly does weighting affect the logistic regression curve and coefficients? I see that most of the coefficients (model weights) correlate between the two models, but some much less so.

## Comments

But I think it's nearly impossible to say why some individual coefficients change by more and others by less in a multivariate environment because of all the relationships between predictors (and you have a lot of them!). If you had fewer predictors that were primarily orthogonal in their relationships, this would be a bit more transparent, but even without weighting, trying to sort out why related variables get the coefficients they do is tricky because of interaction and "suppression" effects.

So in the end I am not exactly sure what the specific question is here. The general answer is "with weighting, the coefficients are adjusted to improve the predictions (i.e., reduce classification errors) for the positive class overall, because its examples now carry more weight than those of the negative class", but I feel like you already know this, and thus the answer to your question can't be that simple. If you want to know why certain attribute coefficients change and by how much, I don't think anyone is going to be able to tell you that without an exhaustive look at the relationships between all those predictors.
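To make the general mechanism concrete, here is a minimal sketch in plain Python (not RapidMiner, and not the GLM operator's actual solver). It fits a one-feature logistic regression by gradient descent on a weighted log-loss, once with uniform weights and once with stratified weights. The data are synthetic, the function names are made up for illustration, and the weighting scheme (both classes get the same total weight) is my assumption about what stratified weighting does.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, ws, steps=2000, lr=0.5):
    """Minimise the weighted log-loss by plain gradient descent.

    Returns (intercept, slope). Each example's gradient contribution is
    multiplied by its weight; that is the only place the weights enter.
    """
    b0, b1 = 0.0, 0.0
    total = sum(ws)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, ws):
            err = w * (sigmoid(b0 + b1 * x) - y)
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / total
        b1 -= lr * g1 / total
    return b0, b1

def recall(b0, b1, xs, ys):
    # Share of true positives predicted positive at the default 0.5 cut-off.
    hits = sum(1 for x, y in zip(xs, ys) if y == 1 and b0 + b1 * x > 0)
    return hits / sum(ys)

# Synthetic, imbalanced data (5% positives); purely illustrative.
random.seed(0)
n_neg, n_pos = 570, 30
xs = [random.gauss(0.0, 1.0) for _ in range(n_neg)] + \
     [random.gauss(1.5, 1.0) for _ in range(n_pos)]
ys = [0] * n_neg + [1] * n_pos

n = n_neg + n_pos
uniform = [1.0] * n
# Assumed stratification scheme: both classes end up with the same total weight.
stratified = [n / (2 * n_pos) if y == 1 else n / (2 * n_neg) for y in ys]

b0_plain, b1_plain = fit_logistic(xs, ys, uniform)
b0_wtd, b1_wtd = fit_logistic(xs, ys, stratified)

print("plain:    intercept=%.3f slope=%.3f recall=%.2f"
      % (b0_plain, b1_plain, recall(b0_plain, b1_plain, xs, ys)))
print("weighted: intercept=%.3f slope=%.3f recall=%.2f"
      % (b0_wtd, b1_wtd, recall(b0_wtd, b1_wtd, xs, ys)))
```

In a toy case like this, with a single predictor and a model that matches the data reasonably well, I would expect the weighting to show up mostly as a shift in the intercept, moving the whole curve toward the positive class, with a much smaller change in the slope. With many correlated predictors the adjustment can be spread unevenly across coefficients, which is why individual changes are so hard to explain.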

As a side note, you can also replicate this effect and do some more investigation of your own by switching to a performance (cost) matrix, which has a similar effect to weighting but allows you to specify separate "costs" for the different misclassification errors for both classes. But I think you know that already too.
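For intuition on why a cost matrix behaves similarly, here is a small Python sketch of the textbook decision rule, not of what any RapidMiner operator does internally. It assumes correct predictions cost nothing and that the model's probabilities are reasonably calibrated; `cost_threshold` and `classify` are hypothetical helper names.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability cut-off that minimises expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, predicting
    negative costs p * cost_fn, so positive wins when
    p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def classify(prob_positive, cost_fp, cost_fn):
    # Label 1 (positive) when the predicted probability clears the cut-off.
    return 1 if prob_positive > cost_threshold(cost_fp, cost_fn) else 0

# Equal costs give the usual 0.5 cut-off.
print(cost_threshold(1.0, 1.0))
# Making a missed positive 19 times as costly lowers the cut-off.
print(cost_threshold(1.0, 19.0))
```

So raising the cost of a missed positive lowers the cut-off on an unchanged model, whereas weighting changes the fitted model itself. Comparing the two on your data may help separate the "shifted curve" effect from genuine changes in the coefficients.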

Lindon Ventures

Data Science Consulting from Certified RapidMiner Experts