Finding an incorrect grading pattern
marketa_vackova
I was given a labelled data set and told that a few of the labels are wrong, i.e. some of the data were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?
I tried the Find Outliers (Density) operator, but I have a feeling it is not the one I'm looking for.
Thank you very much for any advice. Markéta
Answers
Here is an idea: you could train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Wherever the prediction differs from the label, that example is a good candidate for being wrongly labeled.
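To make the idea concrete outside RapidMiner, here is a minimal Python sketch, assuming numeric features and using k-NN with k = 5 as the "model that generalizes well". The function names `knn_predict` and `suspicious_labels` and the toy data are hypothetical, chosen only for illustration. The model is applied back to the same data it was trained on, and every example whose prediction disagrees with its given label is reported.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def suspicious_labels(X, y, k=5):
    """Indices where the model, applied back to its own training data,
    disagrees with the given label (candidates for wrong labels)."""
    return [
        i for i, (xi, yi) in enumerate(zip(X, y))
        if knn_predict(X, y, xi, k) != yi
    ]

# Hypothetical toy data: two well-separated groups; the point at
# index 4 sits inside group "A" but is labelled "B".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

print(suspicious_labels(X, y))       # → [4]
print(suspicious_labels(X, y, k=1))  # → []  (1-NN just memorises the data)
```

The second call shows why 1-NN is ruled out: each training point's nearest neighbor is itself at distance 0, so the model always reproduces the given label and nothing is ever flagged.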
Just my 2c,
Ingo
Another potential approach would be to run a clustering analysis on each labeled class separately and then look for individual outliers within each class.
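A minimal Python sketch of this approach, under a simplifying assumption: each class is treated as a single cluster (k-means with k = 1), and a point is flagged when its distance to its class centroid exceeds `factor` times the median distance within that class. The names `centroid` and `per_class_outliers`, the `factor=3.0` threshold, and the toy data are hypothetical. The median is used rather than the mean because a mislabeled point pulls the centroid toward itself and inflates the average distance.

```python
import math
from statistics import median

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def per_class_outliers(X, y, factor=3.0):
    """Flag points much farther from their own class centroid than is typical."""
    flagged = []
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        c = centroid([X[i] for i in idx])
        dist = {i: math.dist(X[i], c) for i in idx}
        typical = median(dist.values())
        flagged.extend(i for i in idx if dist[i] > factor * typical)
    return sorted(flagged)

# Hypothetical toy data: index 4 lies among the "A" points but is labelled "B".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

print(per_class_outliers(X, y))  # → [4]
```

If a class naturally forms several groups, a single centroid is too coarse; you would cluster that class into several groups and measure each point's distance to its nearest centroid instead.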
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts