Finding an incorrect grading pattern
marketa_vackova
I was given a labelled data set and was told that a few of the labels are wrongly assigned, i.e. some of the data were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?
I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.
Thank you very much for your advice. Markéta
Answers
Here is an idea: you could train a model on the data set that generalizes well (no overfitting, so no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example is a good candidate for a wrongly assigned label.
Just my 2c,
Ingo
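For anyone who wants to try the idea outside RapidMiner first, here is a minimal plain-Python sketch of it. It is only an illustration under some assumptions: the toy data, the function names (knn_predict, suspect_labels) and the choice of k are all made up, and it uses leave-one-out prediction (each row is predicted by a k-NN model trained on all the other rows) instead of re-applying one model to its own training data, which is a slightly stricter variation of the suggestion above.

```python
from collections import Counter
import math

def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def suspect_labels(rows, labels, k=5):
    """Return indices whose label disagrees with a leave-one-out k-NN prediction."""
    suspects = []
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Hold out row i so it cannot vote for its own (possibly wrong) label.
        other_rows = rows[:i] + rows[i + 1:]
        other_labels = labels[:i] + labels[i + 1:]
        if knn_predict(other_rows, other_labels, row, k) != label:
            suspects.append(i)
    return suspects

# Hypothetical toy data: two well-separated groups, one row deliberately mislabelled.
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0), (5.3, 5.2)]
labels = ["A", "A", "A", "A", "B",   # index 4 sits inside the "A" group
          "B", "B", "B", "B", "B"]

print(suspect_labels(rows, labels, k=3))
```

The flagged rows are only candidates for review, not proof of a wrong label: a row can also disagree with the model simply because it lies near the class boundary.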
Another potential approach would be to run a clustering analysis on each of the labeled classes separately and then look for individual outliers that way.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
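A rough plain-Python sketch of the per-class outlier idea, under stated assumptions: instead of a full clustering run it treats each class as a single cluster around its centroid, and it flags rows whose distance to that centroid exceeds a multiple of the class's median distance. The function name per_class_outliers, the factor of 3 and the toy data are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict
from statistics import median

def per_class_outliers(rows, labels, factor=3.0):
    """Flag rows that lie unusually far from the centroid of their own class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    flagged = []
    for label, idxs in by_class.items():
        dims = len(rows[idxs[0]])
        # Centroid of this class only; other classes are ignored on purpose.
        centroid = [sum(rows[i][d] for i in idxs) / len(idxs) for d in range(dims)]
        dist = {i: math.dist(rows[i], centroid) for i in idxs}
        # Assumed cutoff: a multiple of the median distance within the class.
        cutoff = factor * median(dist.values())
        flagged.extend(i for i in idxs if dist[i] > cutoff)
    return sorted(flagged)

# Hypothetical toy data: index 4 carries label "B" but sits among the "A" rows.
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0), (5.3, 5.2)]
labels = ["A", "A", "A", "A", "B",
          "B", "B", "B", "B", "B"]

print(per_class_outliers(rows, labels))
```

A class that really consists of several sub-groups would need a proper clustering step per class (several centroids instead of one) before measuring distances, so treat the single-centroid version as a starting point only.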