Random classifier's accuracy

meliniak Member Posts: 21 Contributor II
Let's say we have a dataset with more than 2 label values; let it be 3 for the sake of simplicity. The label values are unevenly distributed. My question is: what's the best accuracy a random classifier can have on such a dataset?


    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Let's first define what is meant by "random classifier":

    Option A: For each example, the classifier randomly selects one of the possible label values as its prediction. This selection may or may not follow a specific distribution; for example, each prediction could be drawn according to the label distribution of the training data.
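    A minimal Python sketch of option A, assuming the training labels are available as a plain list. The function names fit_label_distribution and predict_random are hypothetical (this is not RapidMiner's implementation). Note that the predictions never look at an example's attributes.

```python
import random
from collections import Counter


def fit_label_distribution(train_labels):
    """Return the label values and their relative frequencies in the training data."""
    counts = Counter(train_labels)
    total = sum(counts.values())
    labels = sorted(counts)
    weights = [counts[label] / total for label in labels]
    return labels, weights


def predict_random(labels, weights, n, rng=None):
    """Draw n predictions independently, ignoring the examples themselves.

    Pass weights=None to pick every label value with equal probability
    (uniform guessing) instead of following the training distribution.
    """
    rng = rng or random.Random()
    return rng.choices(labels, weights=weights, k=n)
```

    For example, fit_label_distribution(["a", "a", "b", "c"]) yields the label values ["a", "b", "c"] with weights [0.5, 0.25, 0.25], which predict_random then samples from.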

    Option B: The classifier simply always predicts the major class, i.e. the most frequent label value in the training data. This is called "Default Learner" in RapidMiner, but I have also heard people call it a random classifier in the past.
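    For comparison, a sketch of option B as a tiny Python class. MajorityClassClassifier is a hypothetical name; it only mimics the idea behind RapidMiner's Default Learner, not its actual code.

```python
from collections import Counter


class MajorityClassClassifier:
    """Always predicts the most frequent label value seen during training."""

    def fit(self, train_labels):
        # most_common(1) returns [(label, count)]; on ties, the label
        # encountered first in train_labels wins
        self.majority_ = Counter(train_labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        # The prediction is the same for every example
        return [self.majority_] * n
```

    Trained on ["b", "a", "b", "c", "b"], it predicts "b" for every example.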

    As for the best accuracy that can be reached, I would say:

    Option A: 100%. By chance, the classifier can predict all cases correctly. Of course, this becomes less likely as the number of examples grows.
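    To put a number on how quickly a perfect score becomes unlikely: assuming predictions are drawn independently and the classifier predicts label value c with probability q_c, each of the n_c test examples of class c is hit with probability q_c, so the chance of 100% accuracy is the product of q_c^n_c over all classes. A small sketch, with prob_all_correct as a hypothetical helper:

```python
import math


def prob_all_correct(class_counts, prediction_probs):
    """Probability that independent random predictions get every test example right.

    class_counts: number of test examples per label value
    prediction_probs: probability that the random classifier predicts each label value
    """
    # An example of class c is predicted correctly with probability q_c;
    # the predictions are independent, so these probabilities multiply.
    return math.prod(prediction_probs[c] ** n for c, n in class_counts.items())
```

    With uniform guessing over 3 label values, a perfect score on just 10 test examples already has probability (1/3)^10, roughly 1 in 59,000.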

    Option B: (number of examples in the major class) / (total number of examples).
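    This formula assumes the major class of the training data is also the major class of the test data. A sketch that computes option B's accuracy directly (default_learner_accuracy is a hypothetical helper):

```python
from collections import Counter


def default_learner_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    # The single, fixed prediction of option B
    majority = Counter(train_labels).most_common(1)[0][0]
    # Fraction of test examples that belong to that class
    return sum(1 for y in test_labels if y == majority) / len(test_labels)
```

    For a hypothetical split of 50/30/20 examples over three label values, used as both training and test data, it returns 50 / 100 = 0.5.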

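    Although the best reachable accuracy stays 100% for option A, reaching it becomes extremely unlikely on larger test sets, where the observed accuracy instead settles near its expected value. That expected value is never higher than the major class fraction of option B: it equals p_1² + ... + p_k² when predictions follow the label distribution (p_i being the fraction of class i, assumed equal in training and test data) and 1/k for uniform guessing over k label values.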
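    Assuming true labels and random predictions are drawn independently, option A's expected accuracy is the sum over label values of p_c · q_c, where p_c is the class fraction in the test data and q_c the prediction probability. A sketch with a Monte Carlo check (both helper names are hypothetical):

```python
import random


def expected_accuracy(class_probs, prediction_probs):
    """Expected accuracy when true labels and predictions are drawn independently."""
    # A prediction is correct when both draws land on the same label value
    return sum(p * prediction_probs[c] for c, p in class_probs.items())


def simulate_accuracy(class_probs, prediction_probs, n, seed=0):
    """Monte Carlo estimate of a random classifier's accuracy on n test examples."""
    rng = random.Random(seed)
    labels = list(class_probs)
    truth = rng.choices(labels, weights=[class_probs[c] for c in labels], k=n)
    preds = rng.choices(labels, weights=[prediction_probs[c] for c in labels], k=n)
    return sum(t == p for t, p in zip(truth, preds)) / n
```

    For a hypothetical 50/30/20 split, predicting by the label distribution gives 0.5² + 0.3² + 0.2² = 0.38 and uniform guessing gives 1/3, both below the 0.5 of always predicting the major class. Whatever the q_c, the sum never exceeds the largest p_c, i.e. the major class fraction.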

    tabazim Member Posts: 1 Contributor I
    Thanks for posting this intuitive question and giving me a chance to clarify my understanding of random classifiers. May I know whether a random classifier also tells us anything about the worst performance one can achieve in an n-class problem? Suppose n = 2 for the sake of simplicity and the data is balanced: does the random classifier's performance tell us that no other classifier can score below 50% on this data? If not, how is it used to assess the quality of a classifier on both balanced and unbalanced data? I hope the question is clear enough to answer; if not, kindly let me know. Thanks!