
# Which learning algorithm to use for probability estimation?

Ghostrider
Member Posts: **60** Contributor II
I have about 30 attributes, all numeric, that I want to feed into a learning algorithm. The result I am after is the probability that a single event will or will not happen (I am only trying to predict the probability of one event, not multiple events or classes). The probability of the event depends non-linearly on the attributes. By this I mean that sometimes a 70% chance of the event occurring follows from the conditions of several attributes taken as a whole, and sometimes a 70% chance can be inferred from the condition of one attribute in particular. The example space is huge, so a fast algorithm would be preferred. Can anyone recommend a learning algorithm? If it is not part of RM but has an open-source Java library, I would still consider it.

EDIT/UPDATE: One example of what I am looking for is commonly known as a probabilistic neural network. Link: http://www.statsoft.com/textbook/neural-networks/ The disadvantage of such a network, however, is that the model stores the training data. Does anyone know of a learning algorithm that outputs a probability for each class (in my case only one, maybe 3 eventually) and does not require storing all training examples?



## Answers

2,531 Unicorn

You can use Naive Bayes if you want a straightforward probability calculation.

But I wonder why you require that the result come from a probability calculation?

Greetings,

Sebastian
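To make the Naive Bayes suggestion concrete, here is a minimal sketch of a Gaussian Naive Bayes estimator in plain Python. It is not RapidMiner's implementation; the function names `fit_gaussian_nb` and `predict_proba` and the toy data are made up for illustration, and it assumes each numeric attribute is normally distributed within each class. Note that the fitted model keeps only a prior, a mean, and a variance per attribute and class, not the training examples.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-attribute mean and variance for each class."""
    model = {}
    n = len(rows)
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        cols = list(zip(*members))
        means = [sum(c) / len(c) for c in cols]
        # A small floor on the variance avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[cls] = (len(members) / n, means, variances)
    return model

def predict_proba(model, x):
    """Return {class: P(class | x)}, computed in log space for stability."""
    log_scores = {}
    for cls, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, var in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[cls] = s
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}

# Hypothetical toy data: two numeric attributes, event (1) vs. no event (0).
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.2]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
probs = predict_proba(model, [3.1, 4.0])
print(probs)
```

Because the per-attribute likelihoods are multiplied as if the attributes were independent, correlated attributes push the result toward 0 or 1, so the output is best read as a score unless it is checked against observed frequencies.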

347 Maven

I recommend logistic regression, since you only have numeric predictors and a binary response variable. It is slower than Naive Bayes, but its output is generally a better approximation of the probability you seek. Naive Bayes probabilities are not that well calibrated and tend to clump in regions near 0 and 1.
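As a rough illustration of the logistic regression suggestion, the sketch below fits the model by stochastic gradient descent in plain Python. It is one possible training scheme, not necessarily what RapidMiner uses; the names `train_logistic_sgd` and `predict`, the learning rate, the epoch count, and the synthetic data are all assumptions chosen for the example. Each update touches a single example, so the fitted model is just one weight per attribute plus a bias.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_sgd(examples, n_features, epochs=50, lr=0.1):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def predict(w, b, x):
    """Estimated probability that the event occurs for attribute vector x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Synthetic data (an assumption for the demo): the true event probability
# is sigmoid(3*x0 - 2*x1).
random.seed(0)
examples = []
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 1 if random.random() < sigmoid(3 * x[0] - 2 * x[1]) else 0
    examples.append((x, y))

w, b = train_logistic_sgd(examples, n_features=2)
print(w, b, predict(w, b, [1.0, -1.0]))
```

A plain logistic model is linear in the attributes on the log-odds scale, so capturing the non-linear dependence described in the question would require adding transformed or interaction attributes by hand.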

Regarding general model quality (AUC etc.), logistic regression and Naive Bayes both perform well.

greetings,

steffen
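One common way to check the calibration point raised above is a reliability table: group the predicted probabilities into bins and compare the mean prediction in each bin with the observed event rate. The sketch below is a minimal version in plain Python; the function name `reliability_table`, the bin count, and the scores and outcomes are hypothetical and serve only to show the mechanics.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Bin predictions into equal-width bins and return, per non-empty bin,
    (count, mean predicted probability, observed event rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        table.append((len(members), mean_pred, event_rate))
    return table

# Hypothetical model outputs and the outcomes that were actually observed.
probs = [0.05, 0.10, 0.15, 0.45, 0.55, 0.50, 0.90, 0.95, 0.85, 0.92]
outcomes = [0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
table = reliability_table(probs, outcomes)
for count, mean_pred, event_rate in table:
    print(count, round(mean_pred, 3), round(event_rate, 3))
```

For a well-calibrated model the two columns stay close in every bin. A model whose predictions clump near 0 and 1 will leave the middle bins nearly empty and show larger gaps in the extreme bins.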