Unexpected predictions

dataminer99 Member Posts: 3 Contributor I
edited November 2018 in Help

Despite seeing good predictions (~70% accuracy) on my training and validation sets, I am having trouble scoring my records for actual use. I have 250K records to score, and 99% of them get the same prediction (Y) with identical confidence scores: the "Yes" confidence in the scored data set is always 0.818, and the "No" confidence is always 0.182.

My expectation is that the predictions and associated confidences should vary across records, but they are identical when I score my data. I have tried replacing my actual data with the "Generate Direct Mailing Data" operator in every process, but the generated data produces consistent training, validation, and scoring results throughout, i.e., no problems. My real training data set has 44,000 records: 2 special attributes (1 ID, 1 nominal label) and 66 regular attributes (26 integer, 12 nominal, 28 real). I would have included the XML from my 4 processes, but it pushed this message past the 20K character limit. Any suggestions are much appreciated! Mike
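As a quick sanity check outside RapidMiner, the degenerate scoring described above can be confirmed by summarizing the distinct values in the scored confidence column. This is a minimal Python sketch, assuming the "confidence(Yes)" column has been exported as a plain list of floats; the function name and thresholds are illustrative, not part of the original process:

```python
from collections import Counter

def confidence_summary(confidences):
    """Summarize a scored confidence column.

    If nearly all rows share one exact value, the model is likely
    ignoring the regular attributes at scoring time (e.g. an
    attribute mismatch between the training and scoring sets).
    """
    counts = Counter(round(c, 3) for c in confidences)
    top_value, n = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "top_value": top_value,
        "top_share": n / len(confidences),
    }

# Toy data mirroring the post: 99% of rows score exactly 0.818.
scores = [0.818] * 99 + [0.55]
summary = confidence_summary(scores)
print(summary)  # a top_share near 0.99 flags the problem
```

A healthy scoring run would show many distinct values and a low top_share; a single dominant value is the symptom to investigate.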


    B_Miner Member Posts: 72 Contributor II
    What kind of model are you fitting? When you say you get good accuracy, is there a range of probabilities produced or just this same identical score?
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    If you use accuracy as a performance measure, you have to compare it with the default accuracy, that is: the accuracy you would get if you always predicted the most frequent label. If your "yes" examples cover 70% of your examples, an accuracy of 70% does not sound too good :)
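The baseline comparison described above can be sketched in a few lines of Python; the 70/30 label split here is taken from the example in the reply, and the function name is illustrative:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial model that always predicts
    the most frequent label in the data."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# If 70% of the examples are "yes", always predicting "yes"
# already achieves 70% accuracy.
labels = ["yes"] * 70 + ["no"] * 30
print(majority_baseline_accuracy(labels))  # 0.7
```

A trained model is only doing useful work if its accuracy clearly exceeds this majority-class baseline.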
