Can linear regression be used for classification?

Curious Member Posts: 12 Newbie
edited January 2020 in Help
I have noticed that the Linear Regression operator is listed as applicable to both regression (estimation) and classification. Is that correct? Is there an extension that enables it to be used for classification?

Answers

  • hughesfleming68 Member Posts: 323 Unicorn
    There are no issues setting up linear regression for classification. Just set your label to a non-numeric value.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Although it is possible, linear regression isn't really designed for classification with a non-numeric label, so be careful when you do this. Nothing prevents you from recoding the response variable as a 0/1 output and then using classic linear regression to predict it. However, when the model is used this way, the interpretation of the coefficients, the resulting score, and the error you measure is not the same as in classic linear regression. Here's an article that summarizes some of the main differences between linear regression and logistic regression (which is designed for this type of classification using a regression approach): https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @Curious

    To add to the previous answers: I would say it's possible to some extent, but these two algorithms are not designed to be fully interchangeable for all use cases. The link from @Telcontar120 explains that concept pretty clearly.

    In some credit scoring modelling projects, we've been comparing linear and logistic regression results. Which one performs better depends heavily on the particular data, but in general their performance is quite comparable.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    I would also like to weigh in and point out that for many use cases the differences between the two approaches have been relatively small, likely not even statistically significant. Strangely enough, I found parameter tuning of linear regression simpler / more straightforward than for regression tasks. This makes Linear Regression, or more appropriately GLM, one of the standard tools I try first for both regression and classification tasks if the number of columns is not too high. It is always good to have a benchmark with a linear model to work against.
    Just my 2c,
    Ingo
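
For readers who want to try the idea outside RapidMiner, the 0/1 recoding and the linear-versus-logistic comparison discussed in the answers above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy data, the function names, and the 0.5 cut-off are made up, and the logistic fit uses simple gradient descent rather than whatever solver RapidMiner's operators use.

```python
import math

# Toy data (made up for illustration): one feature, label recoded as 0/1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def fit_linear(xs, ys):
    """Least-squares fit of y ~ intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient descent on the mean log-loss; the feature is centred first."""
    n = len(xs)
    mean_x = sum(xs) / n
    w = b = 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * (x - mean_x) + b)
            grad_w += (p - y) * (x - mean_x)
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    # Convert back to intercept and slope on the original feature scale.
    return b - w * mean_x, w

def accuracy(preds, ys):
    return sum(1 for p, y in zip(preds, ys) if p == y) / len(ys)

THRESHOLD = 0.5  # assumed cut-off for turning a score into a class

lin_a, lin_b = fit_linear(xs, ys)
linear_scores = [lin_a + lin_b * x for x in xs]
linear_preds = [1 if s >= THRESHOLD else 0 for s in linear_scores]

log_a, log_b = fit_logistic(xs, ys)
logistic_probs = [sigmoid(log_a + log_b * x) for x in xs]
logistic_preds = [1 if p >= THRESHOLD else 0 for p in logistic_probs]

linear_acc = accuracy(linear_preds, ys)
logistic_acc = accuracy(logistic_preds, ys)

print("linear score range:  ", min(linear_scores), max(linear_scores))
print("logistic prob range: ", min(logistic_probs), max(logistic_probs))
print("accuracy (linear, logistic):", linear_acc, logistic_acc)
```

On this toy data the two models classify the same points, which matches the "quite comparable" experience reported above. The linear scores, however, are not confined to the 0 to 1 range, which is one reason they cannot be read as probabilities the way logistic outputs can.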