"leave_one_out_performance_problem"
bojansimoski
Hello guys,
I'm using X-Validation for my analysis, and I have a question about interpreting the results from the Performance operator. For the accuracy of the classifier I get something like: accuracy: 65.38% +/- 36.08%. My question is about the second number, 36.08%. What is it, and how is it computed? I should mention that I use the leave-one-out technique.
Many thanks!
Answers
The first part of the displayed accuracy is the mean accuracy of all N models, and the second part is the standard deviation.
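A minimal Python sketch of how such a figure can arise with leave-one-out, assuming each fold's accuracy is either 0 or 1 (one held-out example per fold) and that the standard deviation is taken in its population form, dividing by N. Whether RapidMiner divides by N or N-1 is an assumption here, and `summarize_loo` is a hypothetical helper, not a RapidMiner API.

```python
import math
import statistics

def summarize_loo(correct):
    """Mean and population std of per-fold LOO accuracies, each 0.0 or 1.0."""
    accuracies = [1.0 if c else 0.0 for c in correct]
    mean = statistics.fmean(accuracies)
    std = statistics.pstdev(accuracies)  # population form: divides by N
    return mean, std

# Hypothetical run: 22 of 26 held-out examples classified correctly.
correct = [True] * 22 + [False] * 4
mean, std = summarize_loo(correct)
print(f"accuracy: {mean:.2%} +/- {std:.2%}")
# For 0/1 values the population std reduces to sqrt(p(1-p)).
print(f"closed form: {math.sqrt(mean * (1 - mean)):.2%}")
```

With these hypothetical counts the first line prints `accuracy: 84.62% +/- 36.08%`, and the closed form gives the same 36.08%.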
Best,
Marius
And in that situation the results are strange to interpret.
I got results such as
84.26 +/- 36.08 or 63.38 +/- 47.57
In both cases, if I assume that this standard deviation is computed as sqrt(p(1-p)) with p = accuracy (for instance p = 0.8426), I get the standard deviation that is shown; in the example, sqrt(0.8426(1-0.8426)). But I don't think this is right, because accuracy does not follow a Bernoulli distribution. I think the value should be further divided by sqrt(N). So my question, like Bojan's, is: how is this standard deviation computed?
Thank you!
AMT
Best,
Marius
But I do not think that is what is used here. With a single held-out example, each iteration is either correct or incorrect.
At the end of the n iterations, the number of correct predictions is a count with a binomial distribution, since each iteration is a Bernoulli trial.
What I was pointing out is that this standard deviation seems to be estimated with the formula for the standard deviation of a Bernoulli distribution, sqrt(p(1-p)), and I did not find this on the Wikipedia page you pointed to. So how is the standard deviation actually estimated?
Another point is how to interpret results like the ones I showed, where the performance has such a large spread that the upper end even exceeds 100%.
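The binomial argument in this post can be checked numerically. The sketch below computes the exact mean and standard deviation of the number of correct predictions in n independent Bernoulli(p) trials from the binomial probabilities, then scales by n to get the accuracy. The values of `n` and `p` and the helper `binomial_moments` are hypothetical, chosen only for illustration; the independence of the trials is an assumption.

```python
import math

def binomial_moments(n, p):
    """Exact mean and std of the count of correct predictions in n Bernoulli(p) trials."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(var)

n, p = 26, 0.85  # hypothetical number of examples and per-example accuracy
count_mean, count_std = binomial_moments(n, p)

# Count of correct predictions: std is sqrt(n p (1-p)).
print(f"count:            {count_mean:.3f} +/- {count_std:.4f}")
# Accuracy = count / n: std is sqrt(p (1-p) / n).
print(f"accuracy:         {count_mean / n:.4f} +/- {count_std / n:.4f}")
# A single Bernoulli trial: std is sqrt(p (1-p)).
print(f"single trial std: {math.sqrt(p * (1 - p)):.4f}")
```

The three lines separate the spread of the count, of the accuracy as a proportion, and of a single 0/1 outcome, which are the quantities being contrasted in this thread.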
You can transform p(1-p) = p - p^2, which is what the standard formula for the variance (the square of the standard deviation) reduces to when the values are only 0 or 1.
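Spelled out, assuming the population form of the variance (dividing by N) and writing x_i for the 0/1 outcome of fold i:

```latex
\begin{aligned}
x_i \in \{0, 1\} &\implies x_i^2 = x_i, \qquad p = \frac{1}{N}\sum_{i=1}^{N} x_i \\
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - p)^2
          = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - p^2
          = p - p^2 = p(1 - p)
\end{aligned}
```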
Best,
Marius
But this is the point. I think that to compute the standard deviation (std) of the accuracy, you need to further divide by sqrt(n). What do you think?
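A minimal sketch of the distinction being proposed: the per-fold standard deviation sqrt(p(1-p)) describes the spread of single 0/1 fold results, while dividing it by sqrt(n) gives the standard error of the mean accuracy, assuming the folds are treated as independent. The helper name and the counts are hypothetical, and some texts use n-1 instead of n in the per-fold estimate; nothing here says which quantity RapidMiner reports.

```python
import math
import statistics

def loo_mean_and_standard_error(correct):
    """Per-fold std of 0/1 LOO accuracies and the standard error of their mean."""
    accuracies = [1.0 if c else 0.0 for c in correct]
    n = len(accuracies)
    mean = statistics.fmean(accuracies)
    per_fold_std = statistics.pstdev(accuracies)  # sqrt(p(1-p)): spread of single folds
    standard_error = per_fold_std / math.sqrt(n)  # sqrt(p(1-p)/n): spread of the mean
    return mean, per_fold_std, standard_error

correct = [True] * 22 + [False] * 4  # hypothetical: 22 of 26 correct
mean, per_fold_std, se = loo_mean_and_standard_error(correct)
print(f"mean {mean:.4f}, per-fold std {per_fold_std:.4f}, standard error {se:.4f}")
```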
Greetings
A.M. Tomé
With per-fold accuracy values of only 0 or 1, the usefulness of this value is certainly questionable. The same applies to the +/- notation, since it is not the error of the accuracy estimate.
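Assuming the +/- value is sqrt(p(1-p)) as discussed earlier in the thread, a short derivation shows why the upper end of mean +/- std passes 100% for any accuracy above 50%, so a band above 100% reflects the 0/1 fold values rather than anything specific to the model:

```latex
\begin{aligned}
p + \sqrt{p(1-p)} > 1
  &\iff \sqrt{p(1-p)} > 1 - p \\
  &\iff p(1-p) > (1-p)^2 && \text{(both sides positive for } 0 < p < 1\text{)} \\
  &\iff p > 1 - p && \text{(divide by } 1 - p > 0\text{)} \\
  &\iff p > \tfrac{1}{2}
\end{aligned}
```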
We will discuss that here at Rapid-I. Thanks for your input!
Best,
~Marius
Any news on this?
AMT
Best regards,
Marius