I'd bet a lot of YouTubers are using LLMs to write and/or edit their content. So that text passes through a human presentation, then picks up some errors in transcription, then gets fed back in as part of a training corpus ... and we plateau real quick.
It seems hard to get past the level of human intelligence for which there's a large enough corpus of training data, or enough trainers.
Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?