Hi everyone. As in the highly correlated column case mentioned in this thread, my dataset consists of 1150 entities and has one attribute that is highly correlated with my class attribute. What should I do now? I have applied three algorithms to my dataset: ID3, CART, and C4.5. How do I calculate which one performs better than the others?
Answers
100% accuracy is something you should be very careful about. Check the points listed below.
1. Label leakage: If your dataset is time-dependent (has temporal characteristics), regular validation methods such as shuffled cross-validation are not recommended, because the model can end up being trained on examples that come later in time than the ones it is tested on.
2. Highly correlated column: If your dataset has an attribute that replicates the output (label), the model can simply read the answer from that attribute and reach 100% accuracy.
3. Validation type: If you are using something like a single split validation, you might get very high performance just from a lucky random split.
4. Size of the dataset: If your dataset is too small and you have a very small test set, you can get very high performance by chance.
These are the points that come to mind right now; I will update if I think of more.
Varun
https://www.varunmandalapu.com/
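For point 2 (an attribute that replicates the output), one rough way to spot such a column is to see how well each attribute predicts the label on its own. The snippet below is only a sketch in plain Python, not something from the answer: the function names, the 0.99 threshold, and the toy rows are all made up for illustration, and it assumes nominal (categorical) attributes.

```python
from collections import Counter, defaultdict

def one_attribute_accuracy(rows, labels, col):
    """Training accuracy of a rule that predicts, for each value of one
    column, the most frequent label seen with that value."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[col]][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)

def suspicious_columns(rows, labels, threshold=0.99):
    """Return {column index: accuracy} for columns that alone reach the
    (arbitrarily chosen) threshold."""
    scores = {c: one_attribute_accuracy(rows, labels, c)
              for c in range(len(rows[0]))}
    return {c: s for c, s in scores.items() if s >= threshold}

# Hypothetical toy data: column 2 is the label under another name.
rows = [
    ("sunny", "high", "Y"),
    ("sunny", "low", "N"),
    ("rainy", "high", "N"),
    ("rainy", "low", "Y"),
    ("sunny", "high", "Y"),
    ("rainy", "high", "N"),
]
labels = ["yes", "no", "no", "yes", "yes", "no"]

flagged = suspicious_columns(rows, labels)
print(flagged)  # {2: 1.0}
```

A column with a unique value in every row (an ID, for example) will also score 1.0 here, so anything flagged still needs a manual look. If the flagged attribute would not be known at prediction time, it is usually safer to drop it and rerun the validation.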
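On the question of which algorithm performs better: a common approach is to run every algorithm through the same k-fold cross-validation and compare the per-fold accuracies. Below is a minimal sketch in plain Python, under several assumptions: the two learners (`fit_majority`, `fit_one_rule`) are simple stand-ins and not ID3, CART, or C4.5, the data is synthetic, and all function names are hypothetical. Any real learner could be plugged in as long as it follows the same `fit(rows, labels) -> predict(row)` shape.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 once and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit, rows, labels, folds):
    """Train on everything outside each fold, test on the fold itself."""
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = fit(train_rows, train_labels)
        hits = sum(predict(rows[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores

def fit_majority(rows, labels):
    """Stand-in learner 1: always predict the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority

def fit_one_rule(col):
    """Stand-in learner 2: majority label per value of a single column."""
    def fit(rows, labels):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[col]][y] += 1
        default = Counter(labels).most_common(1)[0][0]
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        return lambda row: table.get(row[col], default)
    return fit

# Synthetic data (an assumption): the label depends only on column 0.
rng = random.Random(42)
rows = [(rng.choice("ab"), rng.choice("xy")) for _ in range(200)]
labels = ["pos" if r[0] == "a" else "neg" for r in rows]

# Reuse the same folds for every learner so the scores are comparable.
folds = k_fold_indices(len(rows), k=10)
results = {
    "majority": cross_val_accuracy(fit_majority, rows, labels, folds),
    "one_rule_col0": cross_val_accuracy(fit_one_rule(0), rows, labels, folds),
}
for name, scores in results.items():
    print(f"{name}: mean={mean(scores):.3f} sd={stdev(scores):.3f}")
```

Because every learner sees the same folds, the per-fold scores are paired, so you can compare the means and spreads directly or run a paired significance test on them. With 1150 examples, 10 folds leaves about 115 test examples per fold, which makes a chance result less likely than with one small split.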