Finding an incorrect grading pattern

marketa_vackova · October 2016

I was given a labelled data set and told that a few of the labels are wrongly assigned, i.e. some of the data were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.

Thank you very much for any advice. Markéta

IngoRM · October 2016

Here is an idea: you could train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example could be a good candidate for a wrongly assigned label.

Just my 2c,

Ingo
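
Outside RapidMiner, the idea above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not from the thread: the features are numeric, and a nearest-centroid classifier stands in for "a model that generalizes well". The function names (`fit_centroids`, `predict`, `suspect_labels`) and the toy data are made up for the example.

```python
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_centroids(rows, labels):
    """Train a deliberately simple model: one mean vector per class."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, row):
    """Predict the class whose centroid is closest to the row."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], row))


def suspect_labels(rows, labels):
    """Apply the model back to its own training data and return the
    indices where the prediction disagrees with the given label."""
    centroids = fit_centroids(rows, labels)
    return [
        i
        for i, (row, label) in enumerate(zip(rows, labels))
        if predict(centroids, row) != label
    ]


# Hypothetical toy data: rows 3 and 7 carry the label of the other group.
rows = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0),
]
labels = ["a", "a", "a", "b", "b", "b", "b", "a"]

print(suspect_labels(rows, labels))
```

The flagged rows are only candidates for review: a disagreement can also mean the model is too simple for the data, so a more flexible (but still not overfitting) learner may be a better choice on real data.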

Telcontar120 · October 2016

Another potential approach would be to run a clustering analysis on each of the labeled classes separately and then look for individual outliers within each class.
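
A rough sketch of the per-class idea in plain Python follows. It makes a strong simplification that is not from the thread: instead of a full clustering, each class is treated as a single cluster around its per-feature median, and a row is flagged when its distance to that center is far above the class's typical distance. The function name `class_outliers` and the `threshold` default of 3.0 are hypothetical choices for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def class_outliers(rows, labels, threshold=3.0):
    """Return indices of rows that lie unusually far from the center
    of their own labeled class.

    Each class is summarized by its per-feature median. A row is flagged
    when (distance - median distance) / MAD exceeds `threshold`, where
    MAD is the median absolute deviation of the class's distances.
    """
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)

    flagged = []
    for indices in groups.values():
        members = [rows[i] for i in indices]
        center = [median(col) for col in zip(*members)]
        dists = [
            sqrt(sum((a - c) ** 2 for a, c in zip(row, center)))
            for row in members
        ]
        typical = median(dists)
        mad = median(abs(d - typical) for d in dists)
        if mad == 0:
            # All distances are (nearly) identical; nothing stands out.
            continue
        for i, d in zip(indices, dists):
            if (d - typical) / mad > threshold:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical toy data: rows 3 and 7 sit inside the other class's region.
rows = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0),
]
labels = ["a", "a", "a", "b", "b", "b", "b", "a"]

print(class_outliers(rows, labels))
```

If a class really consists of several sub-groups, a single center per class will not be enough, and a proper clustering step per class would be needed before looking for outliers.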

Altair RapidMiner Community
