It just seems to understand. This is useful, and deeply impressive, but it's not the same thing.
ChatGPT clearly needs reinforcement to learn its limits, in the same way humans are constantly reinforced from childhood that making confident claims about things we don't actually know well often has negative consequences. But until we've seen the result of doing that, I don't think we have any foundation to say whether ChatGPT's current willingness to make up answers means its level of understanding is fundamentally different from that of humans.
- In my most recent tests, it will tell you when the data you've provided doesn't match the task (instead of inventing an answer out of thin air).
- It will also add unprompted comments/notes before or after the result, explaining plausible reasons why certain choices were made or why the answer isn't complete.
You have to take into account that not everyone wants the model to avoid hallucinating. There is a lot of competing pressure:
- Some people would like the model to say "As an AI model trained by OpenAI, I am not qualified to provide an answer because this data is not part of my training set" or something similar, because they want it to speak only when it's sure of the data. (I personally think this use case - using LLMs as search engines/databases of truth - is deeply flawed and not what LLMs are for; but for a large enough GPT-n it would work perfectly fine. There is a model size at which the model would indeed contain the entire uncompressed Internet, after all.)
- Some people want the model to never give such a denial, and to always provide an answer in the required format, even if that requires the model to "bullshit" or improvise a bit. For example, if as a business user I provide the model with a blog article and ask for metadata in a JSON structure, I want the model to NEVER return "As an AI model..." and to ALWAYS return valid JSON, even if the metadata is somewhat shaky or faulty (a minimal sketch of this pattern follows below). Most apps are more tolerant of BS than they are of empty/invalid responses. That's the whole reason behind all of those "don't reply out of character/say you are an AI" prompts you see floating around (which, in my experience, are completely useless and do not affect the result one bit).
So the reinforcement is constantly going in those opposite directions.
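To make the second pressure concrete: here is a minimal sketch (in Python, with a hypothetical call_llm() standing in for whatever completion API is being used; the function names and metadata schema are illustrative assumptions, not from the original comments) of the wrapper a business user ends up writing: prompt for strict JSON, validate the reply, and retry or fall back rather than ever passing an "As an AI model..." refusal on to the application.

```python
import json

# Hypothetical stand-in for whatever completion API you use (e.g. an OpenAI chat call);
# this is an assumption for the sketch, not a real library function.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def build_prompt(article: str) -> str:
    return (
        "Extract metadata from the article below and reply with ONLY a JSON object "
        'with keys "title" (string), "topics" (list of strings) and "summary" (string). '
        "If you are unsure about a field, give your best guess rather than refusing.\n\n"
        "ARTICLE:\n" + article
    )

def extract_metadata(article: str, retries: int = 2) -> dict:
    """Ask for JSON metadata; validate and retry instead of accepting prose refusals."""
    for _ in range(retries + 1):
        raw = call_llm(build_prompt(article))
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # The model replied with prose ("As an AI model...") or broken JSON; try again.
            continue
        if isinstance(data, dict) and {"title", "topics", "summary"} <= data.keys():
            return data
    # Fallback: an empty-but-valid structure is more useful to most apps than an error.
    return {"title": "", "topics": [], "summary": ""}
```

The validation-and-fallback loop is the part doing the real work here; the prompt wording ("reply with ONLY a JSON object") is exactly the kind of instruction mentioned above, and, as noted, how much that wording actually helps is debatable.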
With respect to the competing draws here, I'm not sure they necessarily compete that much. E.g. being able to ask it to speculate but explain its justifications, to provide a best guess, or to just be as creative as it wants would be sufficient. Alternatively, a model that knows how to mark what it knows to be true vs. what is speculation or pure fiction. Of course we can also "just" train different models with different levels/types of reinforced limits. An "anything goes" model and a constrained version that knows what it does know might each be useful for different things.
Those people are reasoning, however rightly or wrongly, based on that wrong information. LLMs are not.
Claiming we can tell there's a distinction that merits saying people are reasoning and LLMs are not is "hallucination" to me: it's making a claim for which there is insufficient evidence to support a reasoned statement.
EDIT: Ironically, on feeding ChatGPT (w/GPT4) my comment and your reply and asking it to "compose a reply on behalf of 'vidarh'" it produced a reply that was far more willing to accept your claim that there is a fundamental difference (while otherwise giving a reasonable reaffirmation of my argument that reinforcement of the boundaries of its knowledge would reduce its "hallucinations")
There is a qualitative difference: humans may be wrong about facts because they think they are true, while ChatGPT is wrong because it does not know what anything means. You cannot fix that, because it's just the way LLMs work.
For example, if asked about a URL for something, a human may remember it wrongly, but will in general say "I don't know, let me check", while ChatGPT will just spew something.
If only that were universally true!
Edit: and I also don't see why it has to be so black-and-white. IMHO there is no problem with saying it understands certain things, and doesn't understand other things. We are talking about general intelligence, not god-like omniscience.
Does Stockfish "understand" chess, or does it just "seem to understand" chess?
For all practical intents and purposes, this doesn't make much difference.
But in AGI, G is the important bit, and neither Stockfish nor ChatGPT have demonstrated general understanding of the world.