New Logistic Regression Operator: Strange Behavior

earmijo Member Posts: 270 Unicorn
edited November 2018 in Help

Typically, in the absence of knowledge about the relative cost of misclassification errors, a classifier should classify an observation as a member of the "True" class if Probability(True) > 0.5. That's the behavior of most classifiers in RapidMiner (including W-Logistic).
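A minimal Python sketch of that default decision rule (the function name and numbers are just illustrative):

    # Default rule: predict the positive class when its estimated probability
    # exceeds 0.5 (assuming equal misclassification costs).
    def classify(p_true, threshold=0.5):
        return "True" if p_true > threshold else "False"

    print(classify(0.42))  # -> False
    print(classify(0.61))  # -> True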

 

The new "Logistic Regression" classifier seems to be the exception. It classifies an observation as True if Prob(True) > 0.3 (or, in RapidMiner terminology, if Confidence(True) > 0.3). I'm attaching a process showing this behavior. Just run it, plot a histogram of Confidence(True), and color it by the variable Prediction(label).

 

The pic of the histogram is attached to this message too.
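(For anyone who prefers not to run the RapidMiner process: here is a rough Python sketch of the same diagnostic plot on simulated confidences, assuming the ~0.3 cut-off I'm describing. The color change in the histogram shows up well to the left of 0.5.)

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    confidence_true = rng.uniform(size=2000)                       # stand-in for Confidence(True)
    prediction = np.where(confidence_true > 0.3, "True", "False")  # the behavior described above

    for label in ["False", "True"]:
        plt.hist(confidence_true[prediction == label], bins=40, alpha=0.6, label=label)
    plt.axvline(0.5, linestyle="--", color="black")                # where the cut-off is expected
    plt.xlabel("Confidence(True)")
    plt.legend(title="prediction(label)")
    plt.show()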

 


Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    I tested that too; the other RapidMiner/Weka operators do operate as they should. Based on the H2O documentation, I think it's the F1 optimization, but I will confirm.

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    In the sample process you attached, you use a deep learning operator inside the CV. Is this correct?

  • earmijo Member Posts: 270 Unicorn

    No. I used the new LogisticRegression operator. I didn't even use cross-validation. 

     

    The problem seems to be the underlying Generalized Linear Model routine. I exchanged operators (GLM for Logistic Regression) with the right settings (family = binomial, etc.) and I get the same behavior.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I see what you're saying. Hmm, let me investigate. 

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    That's very curious.  Did you try comparing the results of the Weka version of the logistic regression operator?

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @yyhuang pointed out to me that it might be related to H2O's F1 optimization for binomial data sets in the GLM algorithm. http://ethen8181.github.io/machine-learning/h2o/h2o_glm/h2o_glm.html

     

    Will continue to investigate. 
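    If that's the cause, the threshold H2O actually uses should be visible through its Python API. A rough sketch, assuming a binomial GLM and a made-up file/column name:

        import h2o
        from h2o.estimators import H2OGeneralizedLinearEstimator

        h2o.init()
        train = h2o.import_file("train.csv")        # hypothetical data with a binary "label" column
        train["label"] = train["label"].asfactor()

        glm = H2OGeneralizedLinearEstimator(family="binomial")
        glm.train(x=[c for c in train.columns if c != "label"], y="label", training_frame=train)

        perf = glm.model_performance(train)
        # H2O assigns the predicted class at the threshold that maximizes F1,
        # which is generally not 0.5
        print(perf.find_threshold_by_max_metric("f1"))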

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Telcontar120 I tested this out using the Weka LR and the old RapidMiner SVM LR algorithm; both give me a label flip at confidence > 0.5 when using a Generate Data operator set to Random Classification.

     

    I think I'm leaning toward the internal F1-measure optimization that H2O is doing behind the scenes for binomial labels, but we're looking into this.

  • earmijo Member Posts: 270 Unicorn

    Thanks, Thomas. I should add that if you use the Create Threshold operator and set it to 0.5, it works fine.

     

    The W-Logistic operator works fine, as do the other classifiers in RapidMiner.
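    In plain terms, that threshold fix just re-cuts the scored confidences at 0.5. A rough Python equivalent (column and label names are made up):

        import numpy as np
        import pandas as pd

        # Hypothetical scored output holding the model's Confidence(True) values
        scored = pd.DataFrame({"confidence_true": [0.25, 0.35, 0.55, 0.80]})

        # Overwrite the prediction using the usual 0.5 cut-off
        scored["prediction"] = np.where(scored["confidence_true"] > 0.5, "True", "False")
        print(scored)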

  • earmijo Member Posts: 270 Unicorn

    Thomas:

    A quick note to confirm that you were right: H2O chooses the predicted class based on the maximum-F1 threshold. From the User Guide (Generalized Linear Modeling with H2O and R), page 26:

    [Screenshot of the relevant passage from the User Guide attached.]
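    For anyone curious how such a threshold is found, here is a small sketch on simulated data (scikit-learn, names made up) of picking the maximum-F1 threshold from the predicted probabilities; it typically lands somewhere other than 0.5:

        import numpy as np
        from sklearn.metrics import precision_recall_curve

        rng = np.random.default_rng(0)
        p_true = rng.uniform(size=5000)                          # predicted P(True)
        y_true = (rng.uniform(size=5000) < p_true).astype(int)   # labels consistent with those probabilities

        precision, recall, thresholds = precision_recall_curve(y_true, p_true)
        f1 = 2 * precision * recall / (precision + recall + 1e-12)
        # precision/recall have one more entry than thresholds, so drop the last point
        best = np.argmax(f1[:-1])
        print("maximum-F1 threshold:", thresholds[best])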

     
