# "Performance estimation"

Hi,

I'm using supervised machine learning to classify my data. The classifier I use is a decision tree (but it could be any other). After constructing an appropriate decision tree, I would like to measure the model's performance. What are the standard measures in statistics and artificial intelligence for estimating the performance of a classification algorithm?

So far, I've used leave-one-out cross-validation (because the learning set is small, about 400 examples) to evaluate the accuracy (or rather the classification error), i.e. how many examples in the test set were incorrectly predicted. However, I don't think this is sufficient for a reliable performance evaluation. What else should I measure?

I'm not sure whether a significance test would provide helpful information. In my textbook, a significance test is used to compare two different classification algorithms w.r.t. their absolute error (determined by cross-validation). Also, in the one RapidMiner sample where the T-Test operator is used, two models are compared. Can a significance test also be used to make statements about the performance of a single classifier? If so, what hypothesis should be tested? And how can this be achieved in RapidMiner, whose T-Test operator expects two PerformanceVectors?

Thank you.

Regards,

tim

## Answers

A leave-one-out cross-validation is already a very good estimate of the resulting performance and the best you could do. Statistics provides some other methods for performance estimation, but they are very heuristic and are avoided in the field of data mining since they don't make use of the data we have. The quality of a cross-validation is determined by the quality of your training sample: the more representative your training data is of your problem, the better the estimate and hence the less the performance will be overestimated.
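As a rough illustration of what leave-one-out cross-validation computes, here is a minimal Python sketch. The 1-nearest-neighbour learner, the `loo_error` helper and the toy data are assumptions made up for this example; they are not part of RapidMiner or of the original poster's setup, and any other learner could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbour(train: List[Example], x: Sequence[float]) -> str:
    """Hypothetical stand-in learner: label of the closest training point."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], x))[1]


def loo_error(data: List[Example],
              predict: Callable[[List[Example], Sequence[float]], str]) -> float:
    """Leave-one-out error: hold out each example once, train on the rest."""
    mistakes = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # everything except example i
        if predict(train, x) != y:
            mistakes += 1
    return mistakes / len(data)


# Toy data, invented for illustration only.
toy = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
       ((1.0,), "b"), ((1.1,), "b"), ((0.45,), "b")]
error = loo_error(toy, nearest_neighbour)
print(f"LOO classification error: {error:.3f}")
```

With about 400 examples this loop trains 400 models, which is why leave-one-out is mostly practical for small data sets like the one described in the question.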

A significance test tests whether one model is significantly better than another. You might compare one model with itself, but it will never be significantly better than itself, so there is no way of doing that.

Greetings,

Sebastian

Just a few remarks:

@Crossvalidation: you can find additional remarks here: http://rapid-i.com/rapidforum/index.php/topic,62.0.html

Note that if your data suffers from heavy class imbalance, accuracy can be maximized by simply predicting the bigger class. Hence precision and recall should be measured, too. One idea is to calculate the expected value of your measure for a random classifier, i.e. a classifier that assigns random classes to all instances. Then you can perform a simple one-sided test, given an appropriate distribution assumption. That way you will see whether your classifier is significantly better than random.

This cannot be performed in RapidMiner, but the required formulas are in every book about statistics.
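To make the class-imbalance remark above concrete, here is a small Python sketch. The labels are invented and `precision_recall` is a hypothetical helper written for this example; it shows a "classifier" that always predicts the bigger class scoring high accuracy while its precision and recall on the small class are zero.

```python
from typing import List, Tuple


def precision_recall(y_true: List[str], y_pred: List[str],
                     positive: str) -> Tuple[float, float]:
    """Precision and recall for one class; 0.0 when a denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented, heavily imbalanced labels: 95 "neg" and 5 "pos".
y_true = ["neg"] * 95 + ["pos"] * 5
# A "classifier" that always predicts the bigger class.
y_pred = ["neg"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
prec_pos, rec_pos = precision_recall(y_true, y_pred, positive="pos")

print(f"accuracy: {accuracy:.2f}")
print(f"precision (pos): {prec_pos:.2f}, recall (pos): {rec_pos:.2f}")
```

Returning 0.0 for an empty denominator is a convention chosen here for simplicity; some tools report the value as undefined instead.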

Please note that all significance testing is worthless if the distribution assumptions of the tests are not met. The paired t-test, for instance, assumes that the differences between the performance measurements are approximately normally distributed.
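For reference, the paired t-test operates on the per-fold differences between two models, which is why the normality assumption concerns those differences. The following is a minimal Python sketch with made-up per-fold accuracies and a hypothetical helper `paired_t_statistic`. It only computes the test statistic and the degrees of freedom; the p-value would still have to be looked up in a t-table or computed with a statistics package.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def paired_t_statistic(a: List[float], b: List[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for paired measurements a[i], b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up accuracies of two models on the same 5 cross-validation folds.
model_a = [0.80, 0.82, 0.78, 0.85, 0.81]
model_b = [0.76, 0.80, 0.77, 0.80, 0.79]
t_stat, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Comparing a model with itself yields all-zero differences, so the statistic is undefined and the helper raises an error, which matches the remark earlier in the thread that such a comparison makes no sense.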

regards,

Steffen

[...] used in practice? Are the accuracy, precision and recall measurements not sufficient?

And why is it not possible to model your suggested approach in RapidMiner?

Regards,

Tim

-> The precision for a given class can be modeled with a binomial distribution, which can be approximated by a normal distribution

-> Now a simple one-sided test about the parameter p of the binomial distribution can be performed

-> To estimate the precision (i.e. p), you can simply combine all validation sets of an XValidation by using XVPrediction
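The steps above can be sketched in Python as follows. The counts and the baseline `p0` are invented for illustration, and `one_sided_z_test` is a hypothetical helper, not a RapidMiner operator. It tests H0: p <= p0 against H1: p > p0 using the normal approximation to the binomial, which is only reasonable when both n*p0 and n*(1-p0) are fairly large.

```python
import math
from typing import Tuple


def one_sided_z_test(successes: int, n: int, p0: float) -> Tuple[float, float]:
    """Test H0: p <= p0 against H1: p > p0 via the normal approximation.

    Returns (z, p_value).
    """
    if not 0 < p0 < 1 or n <= 0 or not 0 <= successes <= n:
        raise ValueError("invalid counts or baseline probability")
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return z, p_value


# Invented numbers: 70 of the 100 pooled predictions for a class were correct,
# while a random classifier is assumed to reach a precision of 0.5 for it.
z, p_value = one_sided_z_test(successes=70, n=100, p0=0.5)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.6f}")
```

A small p-value suggests the classifier's precision for that class is better than the assumed random baseline. Whether the approximation and the independence of the pooled predictions are acceptable for your data is exactly the kind of question to check with a statistician.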

As mentioned before, any book about statistics should contain the required information.

btw: I do not know what kind of position you are in, but if there is some kind of statistician around, go and grab him. In my experience, opinions about statistical tests are rather... different.

regards,

Steffen