
Finding an incorrect grading pattern

marketa_vackova Member Posts: 2 Contributor I
edited November 2018 in Help

I was given a labelled data set and told that a few of the labels are wrongly assigned, i.e. some of the data were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.

Thank you very much for your advice. Markéta


  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Here is an idea: you could train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example is a good candidate for a wrong label.


    Just my 2c,
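
A minimal sketch of the train-then-reapply idea above, in plain Python rather than as a RapidMiner process. It assumes numeric features and swaps in a nearest-centroid classifier as one example of a deliberately simple model that is unlikely to overfit; the function names and the toy data are made up for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Return one mean vector per class label (a deliberately simple model)."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, features):
    """Predict the class whose centroid is closest to the example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def suspect_labels(X, y):
    """Indices where the prediction on the training data differs from the given label."""
    centroids = fit_centroids(X, y)
    return [
        i
        for i, (features, label) in enumerate(zip(X, y))
        if predict(centroids, features) != label
    ]


# Toy data (made up): two well-separated groups, with one example
# that sits in the "a" group but carries the label "b".
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0)]
y = ["a", "a", "a", "b", "b", "b", "b", "b"]

suspects = suspect_labels(X, y)
print(suspects)
```

On this toy data only the example at index 3 is flagged. A flagged example is a candidate for review, not proof of a wrong label, since the model itself can be wrong.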


  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Another potential approach would be to run a clustering analysis on each labeled class separately and then look for individual outliers within each class.



    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
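
A rough sketch of the per-class outlier idea above, again in plain Python and under stated assumptions. Instead of a full clustering, it uses a single centre per class (the coordinate-wise median), which is the simplest possible stand-in; a real analysis might use several clusters per class. The function name, the `factor` threshold, and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import median


def per_class_outliers(X, y, factor=3.0):
    """Flag examples that lie unusually far from the centre of their own class.

    `factor` is an assumed threshold: an example is flagged when its distance
    to the class centre exceeds `factor` times the median distance in that class.
    """
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)

    flagged = []
    for indices in groups.values():
        rows = [X[i] for i in indices]
        # Coordinate-wise median: a centre that a few outliers cannot drag away.
        centre = tuple(median(col) for col in zip(*rows))
        distances = [dist(X[i], centre) for i in indices]
        cutoff = factor * median(distances)
        flagged.extend(i for i, d in zip(indices, distances) if d > cutoff)
    return sorted(flagged)


# Toy data (made up): the example at index 3 is labelled "b"
# but lies far away from the other "b" examples.
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0)]
y = ["a", "a", "a", "b", "b", "b", "b", "b"]

outliers = per_class_outliers(X, y)
print(outliers)
```

On this toy data the example at index 3 is the only one flagged. The median-based centre is a design choice: a mean would be pulled toward the mislabeled example and could hide it.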