I read an interview once with one of the founders of SO. They said the main value Stack Overflow provided wasn't to the person who asked the question; it was to the person who googled it later and found the answer. This is why all the moderation pushes toward deleting duplicate questions and having a single accepted answer. They were primarily trying to make Google searches more effective for the broader internet, not to provide a service for the question-asker or answerer.
Sad now though, since LLMs have eaten this pie.
My personal single biggest source of frustration with SO has been outdated answers that are locking out more modern and correct answers. There are so many things for which there is no permanently right answer over time. It feels like SO started solidifying and failed to do the moderation cleaning and maintenance needed to keep it current and thriving. The over-moderation you described helps people for a short time but then doesn’t help the person who googles much later. I’ve also constantly wished that bad answers would get hidden or cleaned out, and that accepted answers that weren’t very good would get more actively changed to better ones that showed up later. It’s pretty common to see newer and better answers than the accepted one.
Okay, but who's going to arbitrate that? It's not like anyone was going to delete answers with hundreds of upvotes because someone thought it was wrong or outdated. And there are literally about a million questions per moderator, and moderators are not expected to be subject matter experts on anything in particular. Re-asking the question doesn't actually help, either, except sometimes when the question is bad. (It takes serious community effort to make projects like https://stackoverflow.com/questions/45621722 work.)
The Trending sort was added to try to ameliorate this, though.
Anyway, that is a good question you asked, one that they didn’t figure out. But if there are enough people to ask questions and search for answers, then aren’t there enough people to manage the answers? SO already had serious community effort; it just wasn’t properly focused by the UX options they offer. Obviously you need to crowd-source the decisions that can’t scale to mods, while figuring out the incentive system to reduce gaming. I’m not claiming this is easy, in fact I’m absolutely certain this is not easy to do, but SO brought too little too late to a serious problem that fundamentally limited and reduced the utility of the site over time.
Moderation should have been aimed squarely at making the site friendly, and community should be moderating the content entirely, for exactly the reasons you point out - mods aren’t the experts on the content.
One thing the site could have done is tie questions and answers to specific versions of languages, libraries, tools, or applications. Questions asked where the author wasn’t aware of a version dependency could be later assigned one when a new version changes the correctness of an answer that was right for previous versions. This would make room for new answers to the same question, make room for the same question to be asked again against a new version, and it would be amazing if while searching I could filter out answers that are specific to Python 2, and only see answers that are correct for Python 3, for example.
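To make the version idea concrete, here is a minimal sketch of what filtering by version could look like. Everything in it is hypothetical: the `Answer` fields, the `answers_for` helper, and the sample data are made up for illustration, and none of it reflects how SO actually stores answers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (3, 12)


@dataclass(frozen=True)
class Answer:
    text: str
    score: int
    min_version: Version                    # first version the answer is correct for
    max_version: Optional[Version] = None   # None means "still correct as far as we know"


def answers_for(answers: List[Answer], version: Version) -> List[Answer]:
    """Return the answers valid for `version`, highest score first."""
    def valid(a: Answer) -> bool:
        upper_ok = a.max_version is None or version <= a.max_version
        return a.min_version <= version and upper_ok

    return sorted((a for a in answers if valid(a)), key=lambda a: a.score, reverse=True)


# Made-up data: an old, highly voted Python 2 answer and a newer Python 3 one.
answers = [
    Answer("Use d.iteritems()", score=950, min_version=(2, 2), max_version=(2, 7)),
    Answer("Use d.items()", score=120, min_version=(3, 0)),
]

for a in answers_for(answers, (3, 12)):
    print(a.text)
```

With a version range attached, the old answer keeps its votes and stays visible to anyone searching for Python 2, but it no longer outranks the answer that is correct for the version the reader actually uses.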
Some of the answers should be deleted (or just hidden, but kept around to be used as a defense when someone tries to re-add bad or outdated answers). The policy of trying to keep all answers regardless of quality allowed too much unhelpful noise to accumulate.
Simply getting rid of the stupid dupe policy would've helped solve this a lot better than time-weighted voting.
Yeah it's doubly stupid because the likelihood of becoming outdated is one of the reasons they don't allow "recommendation" questions. So they know that it's an issue but just ignore it for programming questions.
Having duplicates of the question is precisely why people use LLMs instead of StackOverflow. The majority of all users lack the vocabulary to properly articulate their problems using the jargon of mathematicians and programmers. Prior to LLMs, my use case for StackOverflow was something like this:
30 minutes trying (and failing) to use the right search terms to articulate the problem (remember, there was no contextual understanding, so if you used a word with two meanings and one of those meanings was more popular, you’d have to omit it using the exclusion operator).
30 minutes reading through the threads I found (half of which will have been closed or answered by users who ignored some condition presented by the OP).
5 minutes on implementation.
2 minutes pounding my head on my desk because it shouldn’t have been that hard.
With an LLM, if the problem has been documented at any point in the last 20 years, I can probably solve it using my initial prompt even as a layman. When you’d actually find an answer on StackOverflow, it was often only because you finally found a different way of phrasing your search so that a relevant result came up. Half the time the OP would describe the exact problem you were having only for the thread to be closed by moderators as a duplicate of another question that lacked one of your conditions.
Yes; so the idea is they fail to find the existing question, and ask it again, and get marked as a duplicate; and then everyone else with the same problem can search, possibly find the new duplicate version, and get automatically redirected to the main version with high quality answers.
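As a rough sketch of that redirect idea (the question IDs, the `duplicate_of` mapping, and the `canonical` helper are all made up for illustration, not SO's real data model): each closed duplicate just points at another question, and a lookup follows the pointers until it reaches one that isn't closed as a duplicate.

```python
from typing import Dict

# Hypothetical data: question id -> id of the question it was closed as a duplicate of.
duplicate_of: Dict[int, int] = {
    101: 42,   # "XYZ doesn't work" -> the main version
    205: 101,  # a later rewording, closed against the first duplicate
}


def canonical(qid: int, duplicate_of: Dict[int, int]) -> int:
    """Follow duplicate links from `qid` to the question holding the answers."""
    seen = set()
    # The `seen` set guards against two questions closed as duplicates of each other.
    while qid in duplicate_of and qid not in seen:
        seen.add(qid)
        qid = duplicate_of[qid]
    return qid


print(canonical(205, duplicate_of))
```

Every differently worded duplicate then acts as another search entry point to the same set of answers.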
Users would fail to find the existing question not because there was an abundance of poorly-worded questions, but because there was a dearth of questions asked using lay terminology that the user was likely to use.
Users were not searching for error codes but making naive preliminary searches like “XYZ doesn’t work” and then branching off from there. Having answers worded in a variety of ways allowed for greater odds that the user would find a question written the way he had worded his search.
Redirecting users to an older answer also just added pointless friction compared to allowing for the answer from the original question to be reposted on the duplicate question, in the exceedingly rare instances
I understand the motive behind wanting to exclude questions that are effectively just: “Do my work for me.” The issue is you have users actively telling you that the culling process didn’t really work the way it was supposed to, and you keep telling them that they are wrong, and that the site actually works well for its intended purpose—even though its intended purpose was to help users find what they were looking for, and they are telling you that they can’t.
Part of StackOverflow’s decline was inevitable and wouldn’t have been helped by any changes the site administrators could have made; a machine can simply answer questions a lot faster than a collection of human volunteers. But there is a reason people were so eager to leave. So now instead of conforming to what users repeatedly told the administrators that they wanted, StackOverflow can conform to being the repository of questions that the administrators wanted, just without any users or revenue besides selling the contributions made by others to the LLMs that users have demonstrated they actually want to use.
I once distilled a real-life problem into mathematical language, exactly like how the Introduction to Algorithms book would pose it, only to have the question immediately closed with the explanation "don't post your CS homework".
(My employer at the time was very sensitive about their IP, and being able to access the Internet from the work computer was already a miracle. I once sat through a whole day of InfoSec and disciplinary meetings for posting completely dummy bug reproduction code on GitHub.)
I'd say 9/10 times I find a direct match for my question on SO it's been closed as offtopic with links to one or more questions that are only superficially similar.
There are other problems that they don't even try to address. If 10 people ask the same question, why does only the first person to ask it get to choose the answer? Then there are lots of "XY" questions where the original asker didn't actually have problem X, so they select an answer for Y, leaving the original X unsolved, and now all the duplicates only have an answer for Y too.
This problem isn't directly solvable (what counts as a "duplicate" is inherently subjective, and therefore mistakes/differences of opinion are inevitable).
I think a deeper problem is that once a question becomes closed (for any reason), it's unlikely that it'll ever be reopened. The factors behind this are social (askers interpret close votes as signals that they should give up), cultural (there's not much training/feedback/guidelines about what "duplicate" means for those with voting privileges), and technical (there's no first-class feature for askers to contest closure, and it takes just as many votes to reopen a question as it does to close it (with the same voter reputation requirement)).
It's not quite that bad: when the OP edits the question, there is a checkbox to assert that the edit resolves the reason for closure. Checking it off puts the question in a queue for reconsideration.
However, there's the social problem (with possibly a technical solution) that the queue is not as discoverable as it ought to be, and provides no real incentive; the queues generally are useful for curators who work well in a mode of "let's clean up problems of type X with site content today", but not for those (like myself) who work well in a mode of e.g. "let's polish the canonical for problem Y and try to search for and link unrecognized duplicates".
Given the imbalance in attention, I agree that reopening a question should have lesser requirements than closing it. But better yet would be if the questions that don't merit reopening weren't opened in the first place. Then the emphasis could be on getting them into shape for the initial opening. I think that's a useful frame shift: it's not that the question was rejected; rather, publishing a question basically always requires a collaborative effort.
The Staging Ground was a huge step forward in this direction, but it didn't get nearly the attention or appreciation (or fine-tuning) it deserved.
The idea was, if there's an answer on the other question that solves your question, your question remains in existence as a signpost pointing to the other one without having to pollute and confuse by having a mixture of similar answers across both with different amounts of votes.
Moreover, the LLM has access to all instances of similar problems, while a human can only read one SO page at a time.
The question of what will replace SO in future models, though, is a valid one. People don't realize what a massive advantage Google has over everyone else in that regard. So many site owners go out of their way to try to block OpenAI's crawlers, while simultaneously trying to attract Google's.
I was part of various forums 15 years ago where I could talk shop about many technical things, and they're all gone without any real substitute.
> People don't realize what a massive advantage Google has over everyone else in that regard. Site owners go out of their way to try to block OpenAI's crawlers, while simultaneously trying to attract Google's.
Not really. Website operators can only block live searches from LLM providers, such as the requests made when someone asks a question on chatgpt.com, and only because of the quirk that OpenAI makes the request from its own servers as a quick hack.
We're quickly moving past that as LLMs just make the request from your device with your browser if it has to (to click "I am not a robot").
As for scraping the internet for training data, those requests are basically impossible to block and don't have anything in common with live answer requests made to answer a prompt.
Whatever. I haven't seen a graph like that since Uber kicked the taxi industry in the yarbles. The taxi cartels had it coming, and so does SO. That sort of decline simply doesn't happen to companies that are doing a good job serving their customers.
(As for forums, are you sure they're gone? All of the ones I've participated in for many years are still online and still pretty healthy, all things considered.)
If they were to recreate the site and frame it as a symptom and issue site, which is what the interview described, that would yield many different choices on how to navigate the site, and it would do a lot better. In particular, what happens when two different issues have the same symptom. Right now, that question is closed as a duplicate. Under a symptom and issue site, it's obvious that both should stay as distinct issues.
This is mostly how I engaged with SO for a long, long time. I think it’s a testament to SO’s curation of answers that I asked hardly any questions for like 5+ years after starting programming.
In reality the opposite is encouraged. Countless times, I've landed on questions with promising titles/search extracts, only to find irrelevant answers because people grabbed onto some detail in the question irrelevant to my case and provided X-Y answers.
This often also causes subsequent useful questions to be marked as dups even though they no longer contain that irrelevant detail. The appeal process is so unfriendly that most would not bother.
By regenerating an answer on command and never caring about the redundancy, yeah.
The DRY advocate within me weeps.