
# random classifier's accuracy

Member Posts: 21 Contributor II
Let's say we have a dataset with more than 2 label values; let it be 3 for the sake of simplicity. The label values are unevenly distributed. My question is: what's the best accuracy a random classifier can have on such a dataset?

Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi,

let's first define what is meant by "random classifier":

Option A: The classifier randomly selects one of the possible label values for each prediction. The selection may or may not follow a specific distribution; for example, it could be drawn according to the label distribution of the training data.

Option B: The classifier simply always predicts the major class. This is called "Default Learner" in RapidMiner, but I have also heard people call this a random classifier.

For the best accuracy which can be reached I would say:

Option A: 100%. By chance, the classifier can predict all cases correctly. Of course this is less likely as the number of examples grows.

Option B: number of examples in major class / total number of examples.
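To make the two answers above concrete, here is a small sketch in formulas. The symbols are assumptions introduced only for illustration: the test set has $n$ examples, $n_i$ of them in class $i$, and under option A the classifier predicts class $i$ with probability $q_i$, independently of the example.

$$
\text{Option A:}\quad \mathbb{E}[\text{accuracy}] = \sum_i \frac{n_i}{n}\, q_i,
\qquad
P(\text{accuracy} = 100\%) = \prod_i q_i^{\,n_i}
$$

$$
\text{Option B:}\quad \text{accuracy} = \frac{\max_i n_i}{n}
$$

As a hypothetical example, with class counts 6, 3, and 1 and uniform guessing ($q_i = 1/3$), a perfect score has probability $(1/3)^{10} \approx 1.7 \times 10^{-5}$, which shows how quickly the 100% case becomes unlikely as $n$ grows.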

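Although the best reachable accuracy stays at 100% for option A, with larger numbers of test examples the observed accuracy will settle near its expected value. That expected value cannot exceed the major class fraction, and it reaches it only when the classifier puts all its weight on the major class.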
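The long-run behaviour of these strategies can be checked numerically. The following is a minimal Python sketch, not a RapidMiner process; the class names, the 60/30/10 label distribution, and the function names `expected_accuracy` and `simulate_accuracy` are all assumptions made up for illustration. It compares uniform guessing, guessing in proportion to the label distribution, and always predicting the major class.

```python
import random


def expected_accuracy(label_probs, prediction_probs):
    """Expected accuracy when predictions are drawn independently of the labels."""
    return sum(p * prediction_probs.get(c, 0.0) for c, p in label_probs.items())


def simulate_accuracy(label_probs, prediction_probs, n_examples, rng):
    """Observed accuracy on one randomly drawn test set of n_examples."""
    classes = list(label_probs)
    labels = rng.choices(
        classes, weights=[label_probs[c] for c in classes], k=n_examples
    )
    predictions = rng.choices(
        classes, weights=[prediction_probs.get(c, 0.0) for c in classes], k=n_examples
    )
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / n_examples


# Hypothetical, unevenly distributed 3-class problem (assumed values).
label_probs = {"a": 0.6, "b": 0.3, "c": 0.1}

# Option A, variant 1: every class is equally likely to be predicted.
uniform = {c: 1.0 / len(label_probs) for c in label_probs}
# Option A, variant 2: predictions follow the label distribution.
proportional = dict(label_probs)
# Option B: always predict the major class.
major_class = max(label_probs, key=label_probs.get)
majority = {c: 1.0 if c == major_class else 0.0 for c in label_probs}

strategies = {"uniform": uniform, "proportional": proportional, "majority": majority}

rng = random.Random(0)  # fixed seed so the run is repeatable
for name, prediction_probs in strategies.items():
    exact = expected_accuracy(label_probs, prediction_probs)
    observed = simulate_accuracy(label_probs, prediction_probs, 20000, rng)
    print(f"{name:12s} expected={exact:.3f} observed={observed:.3f}")
```

Under these assumed numbers the expected accuracies are 0.333 for uniform guessing, 0.460 for proportional guessing, and 0.600 for the major class strategy, so neither randomized variant beats the major class fraction on average.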

Cheers,
Ingo
Member Posts: 1 Contributor I
Thanks for posting this intuitive question and giving me a chance to clarify my understanding of random classifiers. May I ask whether the random classifier also tells us anything about the worst performance one can achieve in an n-class problem? Suppose n = 2 for the sake of simplicity and the data is balanced. Does a random classifier's performance then tell us that the performance of any other classifier on this data can't be less than 50%? If not, how is it used to assess the quality of a classifier on both balanced and unbalanced data? I hope the question is clear enough to answer; if not, kindly let me know. Thanks!