Information is going to be unreliable by default, unless it's in a spec sheet created by a human, and even then you have to look at the incentives.
"The articles on this database are automatically generated by our AI system" https://www.digicomply.com/dietary-supplements-database/pana...
Is the information on that page correct? I'm not sure, but as soon as I noticed it was AI-generated I lost all trust. And I only noticed because they bothered to include the warning.
This is textbook Gell-Mann amnesia: you have to validate information for yourself, within your own model of the world. You need tools to verify the information that matters to you, whether that's research itself or knowing which experts and sources are likely to be trustworthy.
With AI video and audio on the horizon, you're left to determine for yourself whether to trust any given piece of media, and the only thing you'll know for sure is your own experience of events in the real world.
That doesn't mean you need to discard all information online as untrustworthy. It just means we're going to need better tools and webs of trust based on repeated good-faith interactions.
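For illustration only, here's a toy sketch of what trust built from repeated good-faith interactions could look like. Everything in it (the class, the scoring formula, the damping constant) is made up to show the shape of the idea, not any real system:

```python
from collections import defaultdict

# Toy web-of-trust: trust accumulates from repeated good-faith
# interactions and is damaged by bad-faith ones. All names, weights,
# and thresholds here are hypothetical.
class TrustGraph:
    def __init__(self):
        # (truster, source) -> list of outcomes (+1 good faith, -1 bad faith)
        self.history = defaultdict(list)

    def record(self, truster: str, source: str, good_faith: bool):
        self.history[(truster, source)].append(1 if good_faith else -1)

    def score(self, truster: str, source: str) -> float:
        outcomes = self.history[(truster, source)]
        if not outcomes:
            return 0.0  # no history, no trust either way
        # Saturates toward 1 with consistent good behavior; the +5
        # damps the score when there are only a few interactions.
        return sum(outcomes) / (len(outcomes) + 5)

graph = TrustGraph()
for _ in range(20):
    graph.record("me", "hn_user_a", good_faith=True)
graph.record("me", "seo_blog_b", good_faith=False)
print(graph.score("me", "hn_user_a"))   # high: long consistent history
print(graph.score("me", "seo_blog_b"))  # negative: one bad-faith interaction
```

The point of the damping term is that one good interaction shouldn't buy much trust - only repetition should.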
It's likely I can trust information posted by individuals on HN to be of higher quality than the comments section on YouTube or some random newspaper site. I don't need more than a superficial confirmation that information provided here is true - but if it's important, then I'll want corroboration from many sources, with validation by an extant human expert.
There's no downside to trusting information provided by AI just as much as any piece of information provided by a human, if you're reasonable about it. Right now is the worst these systems will ever be, and all sorts of development is going into making them more reliable, factual, and verifiable, with appropriately sourced validation.
Based on my own knowledge of ginseng and a superficial verification of what that site says, it's more or less as correct as any copy produced by a human copywriter would be. It tracks with Wikipedia and numerous other sources.
All that said, however, I think the killer app for AI will be e-butlers that interface with content for us: extracting meaningful information, identifying biases, ulterior motives, and political and commercial influences, providing background research, and handling local indexing, so we can offload much of the uncertainty and work required to sift the content we want from the SEO boilerplate garbage pit that is the internet.
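To make the shape of that concrete, here's a minimal sketch of such a pipeline. Every function, heuristic, and data field is hypothetical - a real e-butler would presumably use models for each stage, not regexes and word lists:

```python
import re
from dataclasses import dataclass, field

# Hypothetical e-butler: toy heuristics standing in for real
# extraction, bias detection, and background research.

@dataclass
class Report:
    summary: str
    flags: list[str] = field(default_factory=list)   # possible framing/commercial influence
    links: list[str] = field(default_factory=list)   # sources extracted for corroboration

LOADED_WORDS = {"miracle", "shocking", "destroyed", "must-have"}  # toy bias proxy

def e_butler(page_text: str) -> Report:
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    # Toy "meaningful information": first sentence stands in for summarization.
    summary = sentences[0] if sentences else ""
    # Toy bias check: loaded language as a proxy for ulterior motives.
    flags = sorted({w for w in LOADED_WORDS if w in page_text.lower()})
    # Pull out URLs so a human (or another tool) can verify the claims.
    links = re.findall(r"https?://\S+", page_text)
    return Report(summary=summary, flags=flags, links=links)

report = e_butler("This miracle supplement changed everything. Study: https://example.org/trial")
print(report)
```

The interesting design question is the last field: the butler shouldn't just answer, it should hand you the threads to pull if the answer matters.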
Except, anthropologically speaking, we still live in a trust-based society. We trust water to be available. We trust the grocery stores to be stocked. We trust that our government institutions will always be there.
All this to say: we have a moral obligation not to let AI spam off the hook with "trust but verify". It is fucked up that people make money abusing the innate trust mechanisms that society depends on to be a society.
I know it's popular to hate Google around here, but yes, they are. It's their core competency. You can argue that they're doing a bad job of it, or get bogged down in arguments about SEO or the morality and economics of AdWords, but outside of our bubble here, there are billions of people who type Facebook into Google to get to the Facebook login screen, and pick the first result. Or Bank of America, or $city property taxes. (Probably not those specifically, because the majority of the world's population speaks languages other than English.)
AI just introduces another layer of mistrust to a system with a lot of perverse incentives.
In other words, even if the information was already unreliable in the past, that doesn't mean it can't get much worse in the future.
At some point, even experts will be overwhelmed by the amount of data to sift through, because generated data is going to be optimized for "looking" correct, not "being" correct.
Either by a "raid" from some organized group seeking to shape discourse, or just accidentally, by someone creating the right conditions via entertainment. With enough digging into names/phrases you can trace it back to the source.
LLMs trained on these sources will inherently carry the same biases. And that's before considering that the people training these models could just obfuscate a particularly biased node and claim innocence.
This has always been true, but I think you're right that there's been a clear division between pre- and post-2022.