[edit] It seems to me that this is a lot like the problem which bar trivia nights faced around the inception of the smartphone. Bar trivia nights did, sporadically and unevenly, learn how to evolve questions themselves which couldn't be quickly searched online. But it's still not a well-solved problem.
When people ask "why do I need to remember history lessons - there is an encyclopedia", or "why do I need to learn long division - I have a calculator", I guess my response is: Why do we need you to suck oxygen? Why should I pay for your ignorance? I'm perfectly happy to be lazy in my own right, but at least I serve a purpose. My cat serves a purpose. If you vibe code and you talk to LLMs to answer your questions...I'm sorry, what purpose do you serve?
There are apps that build up a nice sized user base on this small convenience aded of getting 2 answers at once REF https://lmarena.ai/ https://techcrunch.com/2025/05/21/lm-arena-the-organization-...
All the major AI companies of course do not want to give you the answers from other AI's so this service needs to be a third party.
But then beyond that there are hard/niche questions where the AI's are wrong often and humans also have a hard time getting it right, but with a larger discussion and multiple minds chewing the problem one can get to a more correct answer often by process of elimination.
I encountered this recently in a niche non-US insurance project and I basically coded together the above as an internal tool. AI suggestions + human collaboration to find the best answer. Of course in this case everyone is getting paid to spend time with this thing so more like AI first Stack Overflow Internal. I have no evidence that an public version would do well when ppl don't get paid to commend and rate.
But also many said that it would be better if one wraps this in an agency so the leads that are generated from the AI accounting questions only go to a few people instead of making it fully public stackexchange like.
So +1 point -1 point for the idea of a public version.
Mainly, it was good at making you feel useful and at honing your own craft - because providing answers forced you to think about other people's questions and problems as if they were little puzzles you could solve in a few minutes. Kept you sharp. It was like a game to play in your spare time. That was the reason to contribute, not the points.
It's why AI based web search isn't behaving like google based search. People clicking on the best results really was a signal for google on what solution was being sought. Generally, I don't know that LLMs are covering this type of feedback loop.
Human beings want to help out other human beings, spread knowledge and might want to get recognition for it. Manually correcting (3 different) automation efforts seems like incredible monotone, unrewarding labour for a race to the bottom. Nobody should spend their time correcting AI models without compensation.
Speaking of evals the other day I found out that most of the people who contributed to Humanities Last Exam https://agi.safe.ai/ got paid >$2k each. So just adding to your point.