F-Score for Multiclass Problems?

Fred12 Member Posts: 344   Unicorn
edited November 2018 in Help


In binomial classification an f-measure is available, but the Performance (Classification) operator does not offer one for multiclass data.

Does an f-measure exist for multiclass problems? Is it somehow possible to build or generate an f-score (multiclass) attribute from the existing fields like precision, recall, etc.?


What about 2 * (Precision(class1 + class2 + ... + class_n) * Recall(class1 + ... + class_n)) / (sum of precision + recall over all classes)?

Does that make sense in some way? What do you guys think?

Is this still comparable? I know that with more classes precision and recall are likely to go down. Do the multiplication factors weigh too heavily and penalize bad performance too much, given the division by the sum of terms in the denominator? Or is it still accurate enough?


I just want a better overall performance measure for the whole result when one class is massively underrepresented...



  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,045  RM Data Scientist



    Isn't logloss a good option for you?


    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Fred12 Member Posts: 344   Unicorn

    Hm, I don't know how to interpret logistic loss.

    It is ln(1 + exp(-[conf(CC)])), so it evaluates that formula with the confidence for the predicted class, which (should) be the correct class, sums it over all training examples, and then averages it. Is that correct?

    So the smaller the confidence, the bigger the logistic loss. But in what range are the values, and how should I interpret them? What is a small or a big loss?

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,150   Unicorn

    Here's a good simple explanation of logloss from Kaggle, where it is a popular performance metric:



    The value range depends on your dataset and the predictive power of your attributes; there is no absolute answer that says what a "good" logloss is versus a "bad" one.
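
To make the value range a bit more concrete, here is a minimal Python sketch of multiclass log loss in the form commonly used on Kaggle: the mean of -ln(confidence assigned to the true class). The function name, the example labels, and the confidences are all made up for illustration, and RapidMiner's own logistic loss criterion may be computed differently. A perfect model scores 0, the loss is unbounded above, and always predicting 1/K for each of K classes gives ln(K), which can serve as a rough reference point.

```python
import math

def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log of the confidence assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the confidence so that log(0) can never occur.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)

# Hypothetical confidences for three examples over the classes a, b, c.
y_true = ["a", "b", "c"]
y_prob = [
    {"a": 0.8, "b": 0.1, "c": 0.1},
    {"a": 0.2, "b": 0.6, "c": 0.2},
    {"a": 0.3, "b": 0.3, "c": 0.4},
]

loss = multiclass_log_loss(y_true, y_prob)

# Reference point: loss of a model that always predicts 1/3 per class.
uniform_baseline = math.log(3)

print(f"log loss: {loss:.4f}, uniform baseline: {uniform_baseline:.4f}")
```

A loss clearly below the uniform baseline indicates the confidences carry some information; how far below counts as "good" still depends on the dataset.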


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • varunm1 Member Posts: 497   Unicorn
    Hi @mschmitz

    If we need the F1 score for a multiclass problem, do I need to calculate it manually from recall and precision? I see there is no option for this in RapidMiner. I want to calculate it because of a workshop requirement.

  • yyhuang Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 193  RM Data Scientist
    Hi @varunm1,

    You can get the weighted average of the per-class F1 scores for the multiclass task. To do so, the multiclass problem has to be converted into multiple binomial tasks, one task for each class. You will need the Performance (Binominal Classification) and Performance to Data operators, then aggregate to get the average F1 score.

    each F1 = 2 * (precision * recall) / (precision + recall)
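
Outside of RapidMiner, the same one-task-per-class idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names and the toy labels are hypothetical, and it is not how the RapidMiner operators are implemented. Each class is treated as its own binomial task, its F1 is computed with the formula above, and the per-class scores are then averaged either unweighted (macro) or weighted by class frequency.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Treat each class as a one-vs-rest binomial task; return {class: F1}."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores

def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)

def weighted_f1(scores, y_true):
    """Mean of the per-class F1 scores, weighted by true class frequency."""
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)

# Toy labels, made up for illustration.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]

scores = per_class_f1(y_true, y_pred)
print(scores, macro_f1(scores), weighted_f1(scores, y_true))
```

The macro average gives every class the same weight, so it reacts more strongly to a poorly predicted rare class than the frequency-weighted average does.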


  • varunm1 Member Posts: 497   Unicorn
    Hi @yyhuang,

    Yep, RM provides the weighted mean precision and recall from the multiclass Performance operator. I will use them to calculate the overall F1 score.

  • yyhuang Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 193  RM Data Scientist
    edited January 7
    Hi @varunm1,
    The weighted mean precision and recall from the multiclass performance are certainly useful. But if you plug the weighted recall and weighted precision into F1 = 2 * (precision * recall) / (precision + recall),
    the result may not be the F1 you expect, because the formula is non-linear. F1 is a concave function of precision and recall, so the value you calculate that way is never lower than the average of the per-class F1 scores, and here it is slightly higher.

    Here is a quick test on the iris data.
    The F1 score computed from the weighted recall and weighted precision of the multiclass performance is 2 * 91.67 * 92.24 / (91.67 + 92.24) = 91.95%, whereas averaging the f-scores of each class gives 91.62%.

                       precision   recall    f1-score
    iris-setosa        100%        100%      100%
    iris-versicolor    82.61%      95%       88.37%
    iris-virginica     94.12%      80%       86.49%
    avg/total          92.24%      91.67%    91.62%
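
The two numbers can be reproduced with a short Python sketch. It assumes that the three classes carry equal weight (the class sizes are not stated in the post, so this is an assumption) and takes the averaged precision and recall directly from the avg/total row. Since F1 is symmetric in precision and recall, the order of the two values within a pair does not affect the per-class result.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (symmetric in its arguments)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class (precision, recall) pairs in percent, taken from the table.
per_class = {
    "iris-setosa": (100.0, 100.0),
    "iris-versicolor": (82.61, 95.0),
    "iris-virginica": (94.12, 80.0),
}

# Method 1: F1 of the averaged precision and recall (avg/total row).
f1_of_averages = f1(92.24, 91.67)

# Method 2: average of the per-class F1 scores.
# Assumption: equal class weights, because class sizes are not given.
class_f1 = {name: f1(p, r) for name, (p, r) in per_class.items()}
average_of_f1 = sum(class_f1.values()) / len(class_f1)

print(f"F1 of averaged precision/recall: {f1_of_averages:.2f}%")
print(f"Average of per-class F1 scores:  {average_of_f1:.2f}%")
```

Which of the two to report is a matter of convention; the important part is to state which one was used, since they differ.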