Finding an incorrect grading pattern



I was given a labelled data set and told that a few of the labels are wrongly assigned, i.e. some of the data were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.

Thank you very much for any advice. Markéta



Re: Finding an incorrect grading pattern

Here is an idea: train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, since that would simply reproduce the training labels, you get the idea...) and then apply this model to the training data set again. Wherever the prediction differs from the label, that example is a good candidate for a wrongly assigned label.
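
To make the idea concrete, here is a rough sketch in plain Python rather than as a RapidMiner process. It assumes numeric attributes and uses a nearest-centroid classifier as a stand-in for "a model that generalizes well"; that choice, the function names and the toy data are all made up for illustration, and any learner that does not memorize the training set could take its place.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows, labels):
    """Return {label: mean attribute vector}: a deliberately simple model."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Predict the label whose centroid lies closest to the row."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


def suspect_labels(rows, labels):
    """Indices of rows where the prediction disagrees with the given label."""
    centroids = fit_centroids(rows, labels)
    return [
        i for i, (row, label) in enumerate(zip(rows, labels))
        if predict(centroids, row) != label
    ]


# Made-up toy data: two well-separated groups. Row 3 carries the label "B"
# although it sits in the middle of the "A" group.
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0)]
labels = ["A", "A", "A", "B", "B", "B", "B", "B"]

suspects = suspect_labels(rows, labels)
print(suspects)
```

On this toy data only row 3 is reported. On real data the flagged rows are candidates to review by hand, not proven errors, because the model itself can be wrong.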


Just my 2c,



Re: Finding an incorrect grading pattern

Another potential approach would be to run a clustering analysis on each labeled class separately and then look for individual outliers within each class.
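
A minimal sketch of that approach, again in plain Python and not as a RapidMiner process. It assumes numeric attributes and swaps in a very simple clustering method (points are linked when they lie within a distance `eps` of each other, and the linked groups are the clusters). The `eps` and `min_size` values, the function names and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from math import dist


def clusters_within(points, eps):
    """Group point indices into clusters by linking points within eps of each other."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        cluster = []
        while stack:
            i = stack.pop()
            cluster.append(i)
            # Pull in every not-yet-assigned point that is close to point i.
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            stack.extend(near)
        clusters.append(cluster)
    return clusters


def per_class_outliers(rows, labels, eps=1.0, min_size=2):
    """Cluster each class on its own and flag rows in clusters smaller than min_size."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    flagged = []
    for indices in by_label.values():
        points = [rows[i] for i in indices]
        for cluster in clusters_within(points, eps):
            if len(cluster) < min_size:
                # Map positions within the class back to row indices.
                flagged.extend(indices[k] for k in cluster)
    return sorted(flagged)


# Made-up toy data: row 3 is labelled "B" but lies far from the other "B" rows.
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0)]
labels = ["A", "A", "A", "B", "B", "B", "B", "B"]

flagged = per_class_outliers(rows, labels)
print(flagged)
```

Here row 3 ends up alone in its own cluster within class "B", so it is the only row flagged. Both thresholds depend heavily on the scale of the attributes and would need tuning on real data.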



Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts