
accountnum · 1y ago · 0 points

It literally is a riddle, just as the original one was, because it tries to use your expectations of the world against you. The entire point of the original, which a lot of people fell for, was to expose expectations of gender roles leading to a supposed contradiction that didn't exist.

You are now asking a modified question to a model that has seen the unmodified one millions of times. The model has an expectation of the answer, and the modified riddle uses that expectation to trick the model into seeing the question as something it isn't.

That's it. You can transform the problem into a slightly different variant and the model will trivially solve it.


jfengel · 1y ago

Phrased as it is, it deliberately gives away the answer by using the pronoun "he" for the doctor. The original deliberately obfuscates it by avoiding pronouns.

So it doesn't take an understanding of gender roles, just grammar.

accountnum (OP) · 1y ago

My point isn't that the model falls for gender stereotypes, but that it falls for thinking that it needs to solve the unmodified riddle.

Humans fail at the original because they expect doctors to be male and miss crucial information because of that assumption. The model fails at the modification because it assumes that it is the unmodified riddle and misses crucial information because of that assumption.

In both cases, the trick is to subvert assumptions: to provoke the human or LLM into taking a reasoning shortcut that leads them astray.

You can construct arbitrary situations like this one, and the LLM will get it unless you deliberately try to confuse it by basing it on a well-known variation with a different answer.

I mean, genuinely, do you believe that LLMs don't understand grammar? Have you ever interacted with one? Why not test that theory outside of adversarial examples that humans fall for as well?

grey-area · 1y ago

They don't understand basic math or basic logic, so I don't think they understand grammar either.

They do understand/know the most likely words to follow on from a given word, which makes them very good at constructing convincing, plausible sentences in a given language. Those sentences may well be gibberish or provably incorrect, though usually they are not, because most sentences in the dataset make some sort of sense. But sometimes the facade slips, and it is apparent that the GAI has no understanding, no theory of mind, and not even a basic model of relations between concepts (mother/father/son).

It is actually remarkable how much their output resembles human writing, given how it is produced, but there is no model of the world backing the generated text, which is a fatal flaw, as this example demonstrates.

