Can linear regression be used for classification?

Curious Member Posts: 12 Newbie
edited March 6 in Help
I have noticed that the Linear Regression operator is listed as applicable to both regression (estimation) and classification. Is that correct? Is there an extension that enables it to be used for classification?

Answers

  • hughesfleming68 Member Posts: 250   Unicorn
    There are no issues setting up linear regression for classification. Just set your label to a non-numeric value.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,270   Unicorn
    Although it is possible, linear regression isn't really designed for classifying non-numerical labels, so be careful when you do this.  Nothing prevents you from recoding the response variable as a 0/1 output and then using classic linear regression to predict it.  However, the interpretation of the resulting model, in terms of the coefficients, the resulting score, and the error you measure, is not the same as it is in classic linear regression.  Here's a discussion that summarizes some of the main differences between linear and logistic regression (the latter is designed for this type of classification using a regression approach):   https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
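To make the 0/1 recoding concrete, here is a minimal sketch in plain Python (not RapidMiner). The toy data, the helper name `fit_simple_ols`, and the 0.5 cutoff are assumptions chosen for illustration. It fits an ordinary least-squares line to a 0/1 label and thresholds the score. Note that the raw scores fall below 0 and above 1 at the extremes, which is one reason they should not be read as probabilities.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: one numeric attribute and a label recoded as 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_simple_ols(xs, ys)

# The regression output is an unbounded score, not a probability.
scores = [slope * x + intercept for x in xs]

# Turn the score into a class with an assumed cutoff of 0.5.
labels = [1 if s >= 0.5 else 0 for s in scores]

for x, s, label in zip(xs, scores, labels):
    print(f"x={x}  score={s:+.3f}  predicted={label}")
```

The classes come out right on this toy set, but the score for x=1 is negative and the score for x=8 exceeds 1, so the usual reading of coefficients and errors from numeric regression does not carry over directly.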
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 280   Unicorn
    Hi @Curious

    To add to the previous answers: I would say it's possible to some extent, but linear and logistic regression are not designed to be fully interchangeable for all use cases. The link from @Telcontar120 explains that concept pretty clearly. 

    In some credit scoring modelling projects, we've compared linear and logistic regression results. Which one performs better depends heavily on the particular data, but in general their performance is quite comparable.
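A comparison of this kind can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the toy data, the helper names, the 0.5 cutoff, and the simple gradient-descent fit for logistic regression. It is not the credit scoring setup mentioned above, and a single tiny data set says nothing about the general case.

```python
import math

# Hypothetical toy data with overlapping classes (not linearly separable).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
zs = [x - mean_x for x in xs]  # centered feature, helps gradient descent


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Linear regression on the 0/1 label (closed-form least squares).
mean_y = sum(ys) / n
slope = sum(z * (y - mean_y) for z, y in zip(zs, ys)) / sum(z * z for z in zs)
lin_scores = [mean_y + slope * z for z in zs]
lin_pred = [1 if s >= 0.5 else 0 for s in lin_scores]

# Logistic regression fitted with plain batch gradient descent.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    probs = [1.0 / (1.0 + math.exp(-(w * z + b))) for z in zs]
    grad_w = sum((p - y) * z for p, y, z in zip(probs, ys, zs)) / n
    grad_b = sum(p - y for p, y in zip(probs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

log_probs = [1.0 / (1.0 + math.exp(-(w * z + b))) for z in zs]
log_pred = [1 if p >= 0.5 else 0 for p in log_probs]

lin_acc = accuracy(lin_pred, ys)
log_acc = accuracy(log_pred, ys)
print("linear regression accuracy:  ", lin_acc)
print("logistic regression accuracy:", log_acc)
```

On this toy set both models make the same two mistakes. The practical difference is in the scores: the logistic outputs always stay strictly between 0 and 1, while the linear scores are unbounded.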
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    I would also like to weigh in and point out that for many use cases the differences between the two approaches have been relatively small, likely not even statistically significant.  Strangely enough, I found parameter tuning of linear regression for classification simpler / more straightforward than for regression tasks.  This makes Linear Regression, or more appropriately GLM, one of the standard tools I try first for both regression and classification tasks if the number of columns is not too high.  It is always good to have a benchmark with a linear model to work against.
    Just my 2c,
    Ingo
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA
