
# [SOLVED] Performance Operator Evaluation - Mathematics / Derivation

Member Posts: 6 Contributor II
edited June 2019 in Help
Would someone be able to enlighten me on the mathematics behind some of the performance evaluation metrics and/or point me to a nice resource/website?  Specifically, if I am using a Performance (Classification) Operator, I would like to know how the following are derived:
• Accuracy: specifically, the +/- %
• The difference between the micro percentages and the given percentages
• Classification Error vs. Relative Error vs. Root Mean Squared Error
• Why the +/- % values for Accuracy, Weighted Mean Recall, and Classification Error are different
Thank you.

Jason
Tagged:

RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
Hi Jason,

- the accuracy is defined as the probability that a new example is classified correctly. It is calculated as #correctPredictions / #examples
- the classification error is 1 - accuracy
- the absolute error is calculated via the following formula: sum(1 - confidence(trueClass)) / #examples
- the relative error is absolute_error * 100%
- the root mean squared error is calculated as: sqrt( sum( (1 - confidence(trueClass))^2 ) / #examples )
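The formulas above can be sketched in a few lines of Python. This is an illustrative implementation, not RapidMiner's actual code; it assumes you have, for each example, the true label, the predicted label, and the confidence the model assigned to the true class.

```python
import math

def performance_metrics(true_labels, predicted_labels, confidence_true_class):
    """Compute the metrics as described above.

    confidence_true_class[i] is the model's confidence in the TRUE class
    of example i (an assumption of this sketch, matching the formulas).
    """
    n = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    accuracy = correct / n
    classification_error = 1 - accuracy
    # absolute error: mean of (1 - confidence in the true class)
    absolute_error = sum(1 - c for c in confidence_true_class) / n
    relative_error = absolute_error * 100  # expressed as a percentage
    rmse = math.sqrt(sum((1 - c) ** 2 for c in confidence_true_class) / n)
    return accuracy, classification_error, absolute_error, relative_error, rmse
```

For example, with four examples of which three are predicted correctly, `accuracy` is 0.75 and `classification_error` is 0.25, while the error metrics depend only on the confidences.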

The +/- and the macro/micro values are only calculated if the performance is estimated by a Cross Validation. In that case, the accuracy is calculated for each fold (iteration) of the validation. The macro performance is the average of the performance values of all folds, and the +/- states the standard deviation of that value.
For the micro average, remember that each fold of a 10-fold X-Validation uses 10% of the data set as test set and creates predictions on that set. After all 10 folds, there exist predictions for the complete dataset, and you can calculate the accuracy based on these predictions. The result is the micro average. Since it is calculated from only a single dataset, there is no standard deviation.
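The distinction between the two averages can be sketched as follows (the fold counts are made-up numbers for illustration; per-fold results are summarized as correct/total pairs):

```python
import statistics

def macro_micro_accuracy(folds):
    """folds: list of (num_correct, num_examples) tuples, one per CV fold."""
    per_fold = [c / n for c, n in folds]
    macro = statistics.mean(per_fold)   # average of the per-fold accuracies
    stdev = statistics.stdev(per_fold)  # the "+/-" reported with the macro value
    # micro: pool ALL predictions across folds, then compute one accuracy
    micro = sum(c for c, _ in folds) / sum(n for _, n in folds)
    return macro, stdev, micro
```

Note that when the folds have equal sizes the macro and micro averages coincide; with unequal fold sizes they differ, because the micro average weights each prediction equally while the macro average weights each fold equally.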

Hope this helps!

Best regards,
Marius