In one line of Python, cleanlab can automatically:
(1) find mislabeled data and train robust models, (2) detect outliers, (3) estimate consensus labels and annotator quality for datasets labeled by multiple annotators, and (4) suggest which data is best to label or re-label next (active learning).
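To give a feel for what (1) does under the hood, here is a minimal from-scratch sketch of the per-class-threshold idea from confident learning, which cleanlab is built on. It assumes you already have out-of-sample predicted probabilities (`pred_probs`) from any classifier, for example via cross-validation. This is a simplified illustration, not cleanlab's actual implementation, and the function names `per_class_thresholds` and `find_label_issues_sketch` are hypothetical. With cleanlab itself, the one-liner is roughly `cleanlab.filter.find_label_issues(labels, pred_probs)`; check the docs for your version.

```python
from typing import List, Sequence


def per_class_thresholds(labels: Sequence[int],
                         pred_probs: Sequence[Sequence[float]]) -> List[float]:
    """Threshold for class k = mean predicted prob of class k over examples labeled k."""
    num_classes = len(pred_probs[0])
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    # Hypothetical choice: a class with no labeled examples gets threshold 1.0,
    # so it can only "claim" an example predicted with probability exactly 1.0.
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


def find_label_issues_sketch(labels: Sequence[int],
                             pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of likely mislabeled examples, least self-confident first."""
    thresholds = per_class_thresholds(labels, pred_probs)
    issues = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability clears that class's own threshold.
        confident = [k for k, p in enumerate(probs) if p >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        # Among the confident classes, take the most likely one.
        best = max(confident, key=lambda k: probs[k])
        if best != y:
            issues.append(i)  # confidently predicted as a different class
    # Rank by self-confidence: the predicted probability of the given label.
    issues.sort(key=lambda i: pred_probs[i][labels[i]])
    return issues


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
    print(find_label_issues_sketch(labels, pred_probs))  # → [2, 5]
```

Using a separate threshold per class, rather than one global cutoff, keeps classes that the model is systematically less confident about from being over-flagged.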
cleanlab has quick 5-minute tutorials for many types of data (image, text, tabular, audio, etc.) and ML tasks (classification, entity recognition, image/document tagging, etc.).
Engineers have used cleanlab at Google to clean speech data and train robust models on it, at Amazon to estimate how often the Alexa device fails to wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, and elsewhere. Hopefully you'll find cleanlab useful in your ML applications; it's super easy to try out!
https://www.kaggle.com/code/ulytkch/cleanlab-data-centric-ai...