🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS   🦉 🎤

We are inviting all community members to submit proposals to speak at Wisdom 2020 in Boston.


Whether it's a cool RapidMiner trick or a use case implementation, we want to see what you have.
Form link is below and deadline for submissions is November 15. See you in Boston!

CLICK HERE TO GO TO ENTRY FORM

A query about Log. Regression !!

omarnj Member Posts: 8 Contributor II
Hello everyone, 

I want to ask about the coefficients and intercept of logistic regression. Sometimes I get very large coefficients, or a large negative intercept. How can I make sure that everything is correct and that there are no mistakes in my model?

Thanks.

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,512  Community Manager
    Hi @omarnj, can you please post your process XML and your data set so we can see what you are asking about?

  • jacobcybulski Member, University Professor Posts: 83   Unicorn
    It really does not matter if coefficients are very large or very small. If you worry about their magnitude, try normalizing your attributes (though there is still no guarantee they will become smaller). Just treat your Logistic Regression as any other binomial classification model and validate it (or better, cross-validate it). Use Performance (Binomial Classification) and measure its Accuracy, Kappa and Correlation. Then you'll know whether it worked well or not.
    Jacob
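
As a rough illustration of the two headline measures mentioned above, here is a minimal Python sketch that computes accuracy and Cohen's kappa from actual and predicted labels. The label lists are made-up data and the function names are hypothetical; this is not RapidMiner's Performance operator, only the standard textbook formulas.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of examples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def cohen_kappa(tp, tn, fp, fn):
    """Agreement between model and labels, corrected for chance agreement."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the confusion matrix.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present everywhere
    return (observed - expected) / (1.0 - expected)


# Made-up labels for illustration only (e.g. one hold-out fold).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(actual, predicted)
acc = accuracy(*counts)
kappa = cohen_kappa(*counts)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

In a cross-validation, these measures would be computed on each held-out fold and then averaged, so the size of the coefficients never enters the judgement.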
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,248   Unicorn
    You can essentially ignore the intercept term. It is merely there as a bias offset, and it has no influence on the factors in the model. In fact, you can specify in an advanced option whether to include or remove the intercept (standardized) altogether.
    As previously implied, you should focus on the standardized coefficients (part of the model description output, 2nd column) to understand the relative magnitude of the effects of different attributes.  The original coefficients are scaled based on the ranges of each underlying attribute, which may not be the same.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
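
To make the point about attribute ranges concrete, here is a small Python sketch under stated assumptions. It uses one common convention, in which a standardized coefficient is the raw coefficient multiplied by the attribute's standard deviation (equivalent to refitting an unregularized model on z-scored attributes). Whether RapidMiner uses exactly this definition is not confirmed here, and the attribute names, values and coefficients are all made up.

```python
from statistics import mean, pstdev


def standardize_coefficients(raw_coefs, intercept, columns):
    """Convert raw coefficients to the scale of z-scored attributes.

    If x' = (x - mean) / sd, then beta * x = (beta * sd) * x' + beta * mean,
    so the slope becomes beta * sd and beta * mean moves into the intercept.
    """
    std_coefs = {}
    std_intercept = intercept
    for name, beta in raw_coefs.items():
        values = columns[name]
        std_coefs[name] = beta * pstdev(values)
        std_intercept += beta * mean(values)
    return std_coefs, std_intercept


# Hypothetical attributes measured on very different scales.
columns = {
    "age_years": [25, 35, 45, 55, 65],
    "income_usd": [20000, 40000, 60000, 80000, 100000],
}
# Hypothetical raw model: tiny coefficient for income, large negative intercept.
raw_coefs = {"age_years": 0.04, "income_usd": 0.00005}
raw_intercept = -4.8

std_coefs, std_intercept = standardize_coefficients(raw_coefs, raw_intercept, columns)
for name, value in std_coefs.items():
    print(f"{name}: raw={raw_coefs[name]} standardized={value:.4f}")
print(f"intercept: raw={raw_intercept} standardized={std_intercept:.4f}")
```

In this made-up case the income coefficient looks negligible in raw form but is the larger effect once both attributes are on the same scale, and the large negative intercept shrinks to roughly zero after centering. Both models give the same predictions; only the units change.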
  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn

    From an algorithmic point of view, you can compare the RM implementation with the ones in Python and R. You should obtain the same results, provided you use the same options (for example, for the solver).

    If you want to know whether a logreg model is adequate, you have several options. If you are interested in the model's performance, your best bet is to do cross validation and look at accuracy, recall, F-value, AUC, etc. If you are interested in the interpretation of the coefficients, you can calculate confidence intervals by hand by using the standard errors. Here is an explanation:



    Note that statistical libraries in R or Python give you this information directly.


    Kind regards,
    Sebastian
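
The by-hand confidence interval mentioned above can be sketched in a few lines of Python. This is the usual Wald interval (coefficient plus or minus a normal quantile times the standard error), shown here as an assumption about what "by hand" means; the coefficient and standard error are made-up numbers and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(coef, std_err, level=0.95):
    """Wald confidence interval for a single regression coefficient."""
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * std_err
    return coef - half_width, coef + half_width


def odds_ratio_interval(coef, std_err, level=0.95):
    """The same interval on the odds-ratio scale (exponentiated bounds)."""
    low, high = wald_interval(coef, std_err, level)
    return math.exp(low), math.exp(high)


# Hypothetical coefficient and standard error from a model summary.
coef, std_err = 0.8, 0.25

ci_low, ci_high = wald_interval(coef, std_err)
or_low, or_high = odds_ratio_interval(coef, std_err)
print(f"coefficient CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"odds ratio CI:  [{or_low:.3f}, {or_high:.3f}]")
```

If the coefficient interval excludes 0 (equivalently, the odds-ratio interval excludes 1), the effect is distinguishable from no effect at the chosen level. A huge coefficient with an even wider interval is a hint that the estimate is unstable rather than meaningful.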



