Is it possible to get 100% for split validation accuracy?

Joannach0ng · July 2019

Is it possible to get 100% for split validation accuracy, and what are the pros of getting 100% accuracy? Thank you.

jmergler · July 2019

Hi @Joannach0ng,
In my opinion, most of the time this would be alarming. For some problems it may be possible, and for most real business problems not. A point of reference that might be helpful is to ask, 'If a team of experts were to look closely at the data, how good would they be at making their predictions?' That can sometimes give you an idea for what a good accuracy might be. For some simple problems it may be near or at 100%, for many problems in business it won't be anywhere close.

If you have 100% accuracy, I would check for attributes that are too closely correlated with the outcome; they may contain information that wouldn't be available until after the outcome is observed. There's some more information about correct validation in this course: https://academy.rapidminer.com/learn/course/applications-use-cases-professional/
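One rough way to look for such attributes outside RapidMiner is to score each attribute on its own. The sketch below is only an illustration: the function name `single_attribute_accuracy`, the churn data, and the `cancellation_date` column are all made up for this example. It predicts the majority label for each attribute value and reports how often that is right. A score at or near 1.0 is a hint to ask whether the attribute would really be known before the outcome. Note that ID-like attributes also score 1.0 here.

```python
from collections import Counter, defaultdict

def single_attribute_accuracy(rows, label_key):
    """Score each attribute alone: predict the majority label per attribute value."""
    scores = {}
    attributes = [k for k in rows[0] if k != label_key]
    for attr in attributes:
        # Count labels separately for every distinct value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[label_key]] += 1
        # A majority-vote rule gets the most common label right in each group.
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        scores[attr] = correct / len(rows)
    return scores

# Hypothetical data: "cancellation_date" only exists after the customer has churned.
rows = [
    {"plan": "basic", "cancellation_date": "none", "churned": "no"},
    {"plan": "basic", "cancellation_date": "2019-06", "churned": "yes"},
    {"plan": "pro", "cancellation_date": "none", "churned": "no"},
    {"plan": "pro", "cancellation_date": "2019-07", "churned": "yes"},
    {"plan": "basic", "cancellation_date": "none", "churned": "no"},
    {"plan": "pro", "cancellation_date": "none", "churned": "no"},
]
scores = single_attribute_accuracy(rows, "churned")
for attr, score in scores.items():
    print(attr, round(score, 3))
```

Here `cancellation_date` predicts the label perfectly on its own, which is exactly the kind of attribute to question before trusting a 100% result.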

I'd recommend taking a little time to go through the course. Also, if you have come up with 100% accuracy, are you able to share more about the use-case and data, or the process you are using? We might be able to provide better help.

rfuentealba · July 2019

If you come across this problem, check whether you included any IDs in your data source. This happens especially when you are using Decision Trees (or another tree-based algorithm): the tree tends to overfit, and the best way to identify a row becomes the ID. The resulting model isn't useful, because every single row will have an unseen ID in production.
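A quick way to spot candidate ID columns is to look for attributes whose values never repeat. This is a minimal sketch with a made-up helper name (`id_like_columns`) and made-up data; it is only a hint, since a continuous numeric attribute can also have all-distinct values without being an ID.

```python
def id_like_columns(rows):
    """Return attribute names whose values are all distinct across the rows."""
    names = rows[0].keys()
    return [n for n in names if len({row[n] for row in rows}) == len(rows)]

# Hypothetical example set with an explicit ID attribute.
rows = [
    {"customer_id": 101, "age_group": "18-30", "label": "yes"},
    {"customer_id": 102, "age_group": "31-50", "label": "no"},
    {"customer_id": 103, "age_group": "18-30", "label": "no"},
    {"customer_id": 104, "age_group": "31-50", "label": "yes"},
]
suspects = id_like_columns(rows)
print(suspects)
```

Any attribute flagged this way is worth excluding from the model inputs (or at least inspecting) before training.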

my 2 cents.

Joannach0ng · August 2019

@jmergler Hi, thank you for your reply! Actually, I was told by my tutor to get a 100% accuracy prediction, so I was wondering if it is possible. I have tried from 0-1 but couldn't get to 100%. Can adding some operator do so? Thanks!

Joannach0ng · August 2019

@rfuentealba Hi, thank you for your reply! Actually, I was told by my tutor to get a 100% accuracy prediction, so I was wondering if it is possible. I have tried from 0-1, but could I get to 100% accuracy by adding some operator? Thanks!

kypexin · August 2019

Hi @Joannach0ng

I am taking the risk of being accused by others of teaching you bad things,

but technically you can achieve it this way, if you train and test the model on exactly the same dataset:

Image: https://us.v-cdn.net/6030995/uploads/editor/ul/njufgndoabua.png

But still, take the other commenters' concerns into account, because this approach:

Makes no sense for any real-life machine learning problem.
Is a serious mistake from a data science point of view.

Are you sure this is exactly what your tutor asked for? If yes, I suggest studying the problem in question and convincing your tutor that this is a totally wrong thing to aim for.
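To see why training and testing on the same data proves nothing, here is a small sketch in plain Python (not the RapidMiner process from the screenshot). It assumes a toy 1-nearest-neighbour "model" and random coin-flip labels, so there is nothing real to learn; all names are made up for the illustration.

```python
import random

def nearest_label(train, point):
    """1-nearest-neighbour: return the label of the closest training example."""
    closest = min(
        train,
        key=lambda ex: (ex[0] - point[0]) ** 2 + (ex[1] - point[1]) ** 2,
    )
    return closest[2]

def accuracy(train, test):
    """Fraction of test examples whose predicted label matches the true label."""
    hits = sum(nearest_label(train, (x, y)) == label for x, y, label in test)
    return hits / len(test)

rng = random.Random(0)
# Two random attributes and a coin-flip label: pure noise by construction.
data = [(rng.random(), rng.random(), rng.choice(["yes", "no"])) for _ in range(200)]
train, holdout = data[:100], data[100:]

same_data_accuracy = accuracy(train, train)    # each point is its own nearest neighbour
holdout_accuracy = accuracy(train, holdout)    # unseen points, expected near chance level

print("train == test:", same_data_accuracy)
print("held-out data:", holdout_accuracy)
```

The model scores perfectly on the data it memorised, while the held-out accuracy stays around what guessing would give. The 100% figure describes memorisation, not predictive ability.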

varunm1 · August 2019

@kypexin Your solution perfectly fits the tutor's requirements

Telcontar120 · August 2019

I want to echo the many cautions here: in real life, 100% accuracy on any test dataset is almost always an indicator that there is some performance leakage occurring, such as an ID or a surrogate for the label that would not really be available at the time of the prediction. It should be viewed very skeptically, not as a realistic goal.

One possible exception might be if you have a small number of examples in the test dataset but a large number of attributes in the model, in which case your model can be "over-specified" (basically too many attributes will lead to some unique combination serving as a kind of id to make the predictions). Or if you just have too few examples in the test set altogether (e.g., imagine the reductio of 1 test case, which would then either be 100% accurate or 0%!) this can also happen by random chance.
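The random-chance part can be made concrete with a little arithmetic. Assuming each test prediction is independently correct with probability equal to the model's true accuracy (a simplifying assumption, and 0.8 below is just an example value), the chance of a perfect score on n test examples is that accuracy raised to the power n:

```python
def chance_of_perfect_score(true_accuracy, n_test):
    """Probability that all n_test independent predictions are correct."""
    return true_accuracy ** n_test

# Example: a model that is right 80% of the time (hypothetical figure).
for n in (1, 5, 20, 100):
    print(n, chance_of_perfect_score(0.8, n))
```

With a single test case the "100%" result shows up most of the time, while with a hundred test cases it is vanishingly unlikely to happen by luck alone.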

rfuentealba · August 2019

Now that you mention it, I had a requirement like this once, years ago, before I was even a member here. If you are familiar with logic gates, you know how they work. Otherwise, there is an explanation here.

The thing is that I had a dataset with some 12 attributes working like this (for the sake of reducing complexity, I'm going to explain with an OR logic gate):

a1 a2 ax
 0  0  0
 0  1  1
 1  0  1
 1  1  1

The idea was to build a program that behaved exactly like that, because the original program had been compiled from C, there was no source code, and the logic controller it ran on needed a replacement. I ended up training a decision tree because I had no clue what the order of the logic gates could be, and the logic controller ended up being an old computer.

Not the most elegant solution, but a hell of a win for data science.

All the best,

Rodrigo.
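The logic-gate case is one where 100% accuracy is legitimate: the output is a deterministic function of the inputs and every input combination can be enumerated. The sketch below is not the original solution; it is a deliberately naive decision tree in plain Python (fixed split order, no information gain, made-up function names) fitted to the two-input OR truth table above.

```python
from collections import Counter
from itertools import product

def build_tree(rows, labels, attrs):
    """Recursively split on binary attributes in the given order."""
    # Leaf: all labels agree, or there are no attributes left to split on.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    attr, rest = attrs[0], attrs[1:]
    node = {"attr": attr, "children": {}}
    for value in (0, 1):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        if not idx:
            # No examples with this value: fall back to the majority label.
            node["children"][value] = Counter(labels).most_common(1)[0][0]
        else:
            node["children"][value] = build_tree(
                [rows[i] for i in idx], [labels[i] for i in idx], rest
            )
    return node

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Full truth table of a two-input OR gate (columns a1, a2 -> ax).
rows = list(product((0, 1), repeat=2))
labels = [a1 | a2 for a1, a2 in rows]

tree = build_tree(rows, labels, attrs=[0, 1])
tree_accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print(tree_accuracy)
```

Because the truth table covers every possible input, a perfect score here means the tree has reproduced the gate, not that something leaked.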

Altair RapidMiner Community