"Which Learning Algorithm to use for probability estimation?"

Ghostrider · January 2011

I have several (around 30) attributes that I want to feed into a learning algorithm. The attributes are all numeric. The result that I am after is a probability about whether one event will or will not happen (I'm only trying to predict the probability of one event, not multiple events / classification). The probability of event has a non-linear dependence on the attributes. What I mean by this, sometimes a 70% chance of event occurring can be given based upon the conditions of several attributes when taken as a whole. Sometimes, a 70% chance of event occurring can be inferred based on condition of one attribute in particular. The example space is huge so a fast algorithm would be preferred. Can anyone make some recommendations on which learning algorithm to use? If it's not part of RM, but has an open-source Java library, I'd still consider it.

EDIT/UPDATE: One example of what I am looking for is more commonly known as a probabilistic neural network. Link: http://www.statsoft.com/textbook/neural-networks/. ; The disadvantage of such a network, however, is that the model stores the training data. Anyone know of a learning algorithm which outputs probability for each class (in my case, only one...maybe 3 eventually) that does not require storing all training examples?

land · January 2011

Hi,
you can use Naive Bayes if you want to have a straight forward probability calculation.

But I wonder why you have the constraint that the result must be the result of a probability calculation?

Greetings,
Sebastian

steffen · January 2011

Hello,

I recommend Logistic Regression since you only have numeric predictors and a binary response variable. It is indeed slower than NaiveBayes, but the output is a generally better approximation to the probability you seek to calculate. NaiveBayes probabilities are not that well calibrated and tend to clump in regions near 0 and 1.
Regarding general model quality (AUC etc.), logistic regression and naive bayes perform both well.

greetings,

steffen

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

"Which Learning Algorithm to use for probability estimation?"

Answers