undefined | Better HN

0 pointspu_pe1mo ago0 comments

Out of all conceptual mistakes people make about LLMs, one that needs to die very fast is to assume that you can test what it "knows" by asking a question. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.

0 comments

NicuCalcea1mo ago

It's not a conceptual mistake when that's what's being advertised.

The onus is on AI companies to provide the service they promised, for example, a team of PhDs in my pocket [1]. PhDs know things.

1: https://www.bbc.com/news/articles/cy5prvgw0r1o

ndriscoll1mo ago

I've found that to be accurate when asking it questions that require ~PhD level knowledge to answer. e.g. Gemini and ChatGPT both seem to be capable of answering questions I have as I work through a set of notes on algebraic geometry.

Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).

Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.

They also reflect the tone and knowledge of the user and question. Ask it about your cat's astrological sign and you get emojis and short sentences in list form. Ask it why large atoms are unstable and you get paragraphs with larger vocabulary. Use jargon and it becomes more of an expert. etc.

NicuCalcea1mo ago

I don't know about algebraic geometry, but AI is absolutely terrible at communications and social sciences. I know because I can tell when my postgraduate students use it.

1 more reply

losvedir1mo ago

No, you're the one anthropomorphizing here. What's shocking isn't that it "knows" something or not, but that it gets the answer wrong often. There are plenty of questions it will get right nearly every time.

pu_peOP1mo ago

In which way am I anthropomorphizing?

losvedir1mo ago

I guess I mean that you're projecting anthropomorphization. When I see people sharing examples that the model answered wrong, I'm not interpreting that they think it "didn't know" the answer. Rather, they're reproducing the error. Most simple questions the models will get right nearly every time, so showing a failure is useful data.

jamesnorden1mo ago

The classic "holding it wrong".

Maxion1mo ago

The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.

The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car you SHOULD walk. Nothing in the question reveals the reason for why you want to go to the car wash, or even that you want to go there or are asking for directions there.

drawfloat1mo ago

It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...

TZubiri1mo ago

>I want to wash MY car

>you want to go to the car wash is to help your buddy Joe wash HIS car

nope, question is pretty clear, however I will grant that it's only a question that would come up when "testing" the AI rather than a question that might genuinely arise.

ninjagoo1mo ago

> The question is non-sensical.

Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.

In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.

As an aside, it's an interesting thought exercise to wonder how much the first ai winter resulted from going down the strict logic path vs the current probabilistic path.

j / k navigate · click thread line to collapse

0 comments

NicuCalcea1mo ago

It's not a conceptual mistake when that's what's being advertised.

The onus is on AI companies to provide the service they promised, for example, a team of PhDs in my pocket [1]. PhDs know things.

1: https://www.bbc.com/news/articles/cy5prvgw0r1o

ndriscoll1mo ago

Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).

Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.

NicuCalcea1mo ago

I don't know about algebraic geometry, but AI is absolutely terrible at communications and social sciences. I know because I can tell when my postgraduate students use it.

1 more reply

losvedir1mo ago

pu_peOP1mo ago

In which way am I anthropomorphizing?

losvedir1mo ago

jamesnorden1mo ago

The classic "holding it wrong".

Maxion1mo ago

The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.

The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

drawfloat1mo ago

It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...

TZubiri1mo ago

>I want to wash MY car

>you want to go to the car wash is to help your buddy Joe wash HIS car

nope, question is pretty clear, however I will grant that it's only a question that would come up when "testing" the AI rather than a question that might genuinely arise.

ninjagoo1mo ago

> The question is non-sensical.

Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.

As an aside, it's an interesting thought exercise to wonder how much the first ai winter resulted from going down the strict logic path vs the current probabilistic path.

j / k navigate · click thread line to collapse