confidence values from w-logistic seem out of range

bobdobbs Member Posts: 26 Maven
edited June 2019 in Help
Hi,

I'm testing out some data with the w-logistic operator.

I trained it on about 20,000 examples with two input variables and a binomial label ("sick", "not_sick").

I then ran a test set of 1900 examples through the resulting model (about 130 of them are "sick").

The w-logistic model returns confidence estimates for the "sick" class that are at most .28.

I assumed that was the "probability" of the example being in the sick class.

What is odd is that of the 20 highest-scoring examples (scores from .233 to .254), 14 are labeled "sick". That is 70% of those examples, so it appears as if the w-logistic model is picking class members with a 70% probability. If so, why am I seeing confidence scores of only .233?

Can anyone shed some light on this apparent discrepancy? 
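
To put numbers on the gap, one can compare the mean confidence among the top-scoring examples with the fraction of them that are actually "sick". The following is only a minimal Python sketch: the scores and labels are made-up placeholders spanning the range quoted above (not my actual test set), and the function name `top_k_calibration` is hypothetical.

```python
def top_k_calibration(scores, labels, k, positive="sick"):
    """Return (mean confidence, observed positive rate) for the k highest-scoring examples."""
    # Rank examples by confidence, highest first, and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)[:k]
    if not ranked:
        raise ValueError("need at least one example and k >= 1")
    mean_confidence = sum(score for score, _ in ranked) / len(ranked)
    observed_rate = sum(1 for _, label in ranked if label == positive) / len(ranked)
    return mean_confidence, observed_rate


# Placeholder data (an assumption for illustration only): 20 scores evenly
# spaced between .233 and .254, with 14 of the 20 labeled "sick".
scores = [0.233 + i * (0.254 - 0.233) / 19 for i in range(20)]
labels = ["sick"] * 14 + ["not_sick"] * 6

mean_conf, observed = top_k_calibration(scores, labels, k=20)
print(f"mean confidence: {mean_conf:.3f}, observed sick rate: {observed:.2f}")
```

If the confidences were well-calibrated probabilities, the two numbers would be close to each other. Here the mean confidence sits near .24 while the observed rate is .70, which is the discrepancy I am asking about.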

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    The Weka learners are as much a black box for us as they are for you, so I cannot explain this behavior. If you replace it with our own Logistic Regression model, we can look into any strange behavior :)

    Greetings,
      Sebastian
  • bobdobbs Member Posts: 26 Maven
    Sebastian,

    Your suggestion made a HUGE difference. The RM Logistic Regression model is delivering results that look very consistent, much more like what we expected.

    Thank You  ;D