However, humans do have some known failure cases that help us detect that. For instance, pressing a human on a couple of details will generally show up all but the very best bullshit artists; there is a limit to how fast humans can make crap up. Some are decent at the con-game aspects, but it isn't too hard to poke through that limit on how fast they can invent details.
Computers can confabulate at full speed for gigabytes at a time.
Personally, I consider any GPT or GPT-like technology unsuitable for any application in which truth is important. Full stop. The technology fundamentally, in its foundation, does not have any concept of truth, and there is no obvious way to add one, either after the fact or in its foundation. (Not saying there isn't one, period, but it certainly isn't the sort of thing you can just throw a couple of interns at and get a good start on.)
"The statistically-most likely conclusion of this sentence" isn't even a poor approximation of truth... it's just plain unrelated. That is not what truth is. At least not with any currently even remotely feasible definition of "statistically most likely" converted into math sufficient to be implementable.
And I don't even mean "truth" from a metaphysical point of view; I mean it in a more engineering sense. I wouldn't set one of these up to do my customer support either. AI Dungeon is about the epitome of the technology, in my opinion, and generalized entertainment from playing with a good text mangler. It really isn't good for much else.
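The "statistically most likely continuation" point can be made concrete with a toy sketch. This is a bigram model fit on a tiny made-up corpus, not how GPT actually works (GPT is a transformer over subword tokens), but the training objective is the same flavor of next-token likelihood, and the toy shows why that objective is unrelated to truth: the model completes a prompt with whatever followed most often in its training text, true or not.

```python
# Toy "statistically most likely continuation" model: a bigram counter.
# It has no concept of truth, only of what word most often followed the
# previous word in its (deliberately skewed) training corpus.
from collections import Counter, defaultdict

corpus = ("the moon is made of rock . "
          "the moon is made of cheese . "
          "the moon is made of cheese .").split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most likely next word after `word`."""
    return follows[word].most_common(1)[0][0]

# "cheese" appears twice after "of", "rock" only once, so the model
# confidently confabulates the majority continuation.
print(most_likely_next("of"))  # -> cheese
```

Scale this up by many orders of magnitude and swap the counting for a transformer, and the objective is still "most likely continuation given the corpus", which is the poster's point.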
This, I think, is the actual problem. Online forums will likely be filled with AI-generated BS in the very near future, if they aren't already.
>"The statistically-most likely conclusion of this sentence" isn't even a poor approximation of truth... it's just plain unrelated. That is not what truth is. At least not with any currently even remotely feasible definition of "statistically most likely" converted into math sufficient to be implementable.
It's not at all clear that this isn't what humans are doing when answering factual questions.
>And I don't even mean "truth" from a metaphysical point of view; I mean it in a more engineering sense. I wouldn't set one of these up to do my customer support either. AI Dungeon is about the epitome of the technology, in my opinion, and generalized entertainment from playing with a good text mangler. It really isn't good for much else.
By the same logic, how can we allow humans to do those jobs either? How many times has some distant call center person told you "No sir, there is definitely no way to fix this problem" when there definitely was, and the person was just ignorant or wrong? We should be more concerned with getting the error rate of these AI systems down to human level or better. They have already reached that level in several other domains, so it's not clear they won't get there soon here as well.
First, since you can't see tone, let me acknowledge this is a fair question, and this answer is in the spirit of exploration and not "you should have known this" or anything like that.
The answer is a spin on what I said in my first post. Human failures have a shape to them. You cite an example that is certainly common, and you and I know what it means. Or at least, what it probabilistically means. It is unfortunate if someone with lesser understanding calls in and gets that answer, but at least they can learn.
If there were a perfect support system, that would be preferable, but for now, this is as good as it gets.
A computer system will spin a much wider variety of confabulated garbage, and it is much harder to tell the difference between GPT text that is correct, GPT text that is almost correct but contains subtle errors, and GPT text that sounds very convincing but is totally wrong. The problem isn't that humans are always right and computers are always wrong; it's that, for me as the person calling in, the bar for telling whether an answer is correct is raised significantly when the answer comes from a GPT-based system.
I think you got it all wrong. Not all GPT-3 tasks are "closed-book".
If you can fit a piece of information in the context, then GPT-3 will take it into consideration. That means you can run a search, put the retrieved documents into the prompt, and then ask your questions. It will reference that text and give you grounded answers. Of course, you still need to vet the sources of information you use; if you feed false information into the context, you will get wrong answers.
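The open-book pattern described above can be sketched in a few lines. The retriever here is a toy keyword-overlap ranker and the documents are made up for illustration; a real system would use a search engine or embedding index, and the assembled prompt would then be sent to an actual GPT-3 endpoint (not shown), so this is just the retrieve-and-prompt scaffolding.

```python
# Minimal sketch of "open-book" prompting: retrieve relevant documents,
# paste them into the prompt as context, then ask the question. The model
# call itself is omitted; a real system would send `prompt` to GPT-3.

def retrieve(query, documents, k=1):
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Put retrieved text into the context so the answer can be grounded in it."""
    context = "\n".join(retrieve(query, documents))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

# Hypothetical support-docs corpus, purely for illustration.
docs = [
    "The warranty period for the X100 router is 24 months.",
    "The X100 router supports firmware updates over USB.",
]
prompt = build_prompt("How long is the X100 warranty period?", docs)
print(prompt)
```

The grounding only goes as far as the retrieved text does: garbage documents in the context produce garbage answers, which is the vetting caveat above.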
Fundamentally, GPT is a technology for building convincing confabulations, and we hope that if we keep pounding on it and making it bigger we can get those confabulations to converge on reality. I do not mean this as an insult, I mean it as a reasonable description of the underlying technology. This is, fundamentally, not a sane way to build most of the systems I see people trying to build with it. AI Dungeon is a good use because the whole point of AI Dungeon is to confabulate at scale. This works with the strengths of GPT-like tech (technically, "transformer-based tech" is probably a closer term but nobody knows what that is).
As far as I can tell, there is no reason to think that the way GPT-3 generates its responses could possibly result in this happening - even the basic ability of correctly inferring corollaries from a collection of facts seems beyond what those methods could deliver, except insofar as the syntax of their expression matches common patterns in the corpus of human language use. And the empirical results so far, while being impressive and thought-provoking in many ways, support this skepticism.
So until that happens, all you've done is put the equivalent of bullshit-spewing humans in more places. People already know not to blindly trust humans; now they'll (re)learn the same about computer-generated text. (It's probably not even clear to everyone which text is computer-generated and which is human-generated, so more likely, the specific places that rely on this will simply come to be seen as untrustworthy. "Create more untrustworthy sources of text" is... underwhelming, honestly.)