undefined | Better HN

0 pointsjsheard3y ago0 comments

I'm always a bit sceptical of these embarrassing examples being "fixed" after they go viral on social media, because it's hard to know whether OpenAI addressed the underlying cause or just bodged around that specific example in a way that doesn't generalize. Along similar lines I wouldn't be surprised if simple math queries are special-cased and handed off to a WolframAlpha-esque natural language solver, which would avert many potential math fails but without actually enhancing the models ability to reason about math in more complex queries.

An example from ChatGPT:

"What is the solution to sqrt(968684)+117630-0.845180" always produces the correct solution, however;

"Write a speech announcing the solution to sqrt(968684)+117630-0.845180" produces a nonsensical solution that isn't even consistent from run to run.

My assumption is the former query gets WolframAlpha'd but the latter query is GPT itself actually attempting to do the math, poorly.

0 comments

jarenmf3y ago

True, also tried another one that went viral:

Suppose you're a contestant on a game show. You're presented with three transparent closed doors. Behind one of the doors is a car, and behind the other two doors are goats. You want to win the car.

The game proceeds as follows: You choose one of the doors, but you don't open it yet, ((but since it's transparent, you can see the car is behind it)). The host, Monty Hall, who knows what's behind each door, opens one of the other two doors, revealing a goat. Now, you have a choice to make. Do you stick with your original choice or switch to the other unopened door?

GPT4 solves it correctly while GPT3.5 falls for it everytime.

----

Edit: GPT4 fails If I remove the sentence between (()).

mahathu3y ago

OP is referring to this puzzle: https://en.wikipedia.org/wiki/Monty_Hall_problem

EDIT: "Pigeons repeatedly exposed to the problem show that they rapidly learn to always switch, unlike humans", lol. That's funny.

astrange3y ago

GPT4 also passes "What weighs more, a pound of feathers or a Great British Pound?".

GPT3 gets confused, says they're the same and then that they're different:

Both a pound of feathers and a Great British Pound weigh the same amount, which is one pound. However, they are different in terms of their units of measurement and physical properties.

A pound of feathers is a unit of weight commonly used in the imperial system of measurement, while a Great British Pound is a unit of currency used in the United Kingdom. One pound (lb) in weight is equivalent to 0.453592 kilograms (kg).

Therefore, a pound of feathers and a Great British Pound cannot be directly compared as they are measured in different units and have different physical properties.

iam-TJ3y ago

I'm surprised by the answer GPT4 gives, and I consider it incorrect.

Since the question's context is about weight I'd expect it to consider "a Great British Pound" to mean a physical £1 sterling coin, and compare its weight (~9 grams) to the weight of the feathers (454 grams [ 1kg = 2.2lb, or "a bag of sugar" ]) .

2 more replies

jwolfe3y ago

> Edit: GPT4 fails If I remove the sentence between (()).

If you remove that sentence, nothing indicates that you can see you picked the door with the car behind it. You could maybe infer that a rational contestant would do so, but that's not a given ...

0xcde4c3db3y ago

I think that's meant to be covered by "transparent doors" being specified earlier. On the other hand, if that were the case, then Monty opening one of the doors could not result in "revealing a goat".

xg153y ago

> You're presented with three transparent closed doors.

I think if you mentioned that to a human, they'd at least become confused and ask back if they got that correctly.

eropple3y ago

> You're presented with three transparent closed doors.

A reasonable person would expect that you can see through a transparent thing that's presented to you.

2 more replies

aaroninsf3y ago

I've always found the Monty Hall problem a poor example to teach with, because the "wrong" answer is only wrong if you make some (often unarticulated) assumptions.

There are reasonable alternative interpretations in which the generally accepted answer ("always switch") is demonstrably false.

This problem is exacerbated (perhaps specific to) those who have no idea who "Monty Hall" was and what the game show(?) was... as best I can tell the unarticulated assumption is axiomatic in the original context(?).

lmm3y ago

The unarticulated assumption is not actually true in the original gameshow. Monty didn't always offer the chance to switch, and it's not at all clear whether he did so more or less often when the contestant had picked the correct door.

japaniard3y ago

What unarticulated assumption needs to be made for switching to be incorrect?

2 more replies

gwern3y ago

OA has always said that they did not hardwire any of these gotcha questions, and in many cases they continue to work for a long time even when they are well-known. As for any inconsistency, well, usually people aren't able to or bothering to control the sampling hyperparameters, so inconsistency is guaranteed.

jprete3y ago

They may not have had to hardwire anything for known gotcha questions, because once a question goes viral, the correct answer may well show up repeatedly in the training data.

ftcHn3y ago

Other failing math I tested this morning....

---

Prompt: Pick two random 5 digit numbers and multiply them together

Response: Sure, here are two random 5-digit whole numbers multiplied together: 47581 x 69204 = 3291492324

Correct Answer: 3292795524

---

Prompt: Pick two random numbers between 0 and 1 and multiply them together

Response: Sure! Here are two random numbers between 0 and 1 multiplied together:0.475 x 0.821 = 0.390575

Correct Answer: 0.389975

---

I'm surprised it did so well on the Math tests.

1 more reply

shon3y ago

I’m with you. Being an old neckbeard myself, I always want to reach for regex to fix stuff like this lol…

But based on what we just saw on the GPT4 live demo, I’d say they fixed it by making a much much more capable and versatile model.

kulikalov3y ago

You can as well ask it to add up two unusually big integers and it'll fail.

j / k navigate · click thread line to collapse

0 comments

jarenmf3y ago

True, also tried another one that went viral:

Suppose you're a contestant on a game show. You're presented with three transparent closed doors. Behind one of the doors is a car, and behind the other two doors are goats. You want to win the car.

GPT4 solves it correctly while GPT3.5 falls for it everytime.

----

Edit: GPT4 fails If I remove the sentence between (()).

mahathu3y ago

OP is referring to this puzzle: https://en.wikipedia.org/wiki/Monty_Hall_problem

EDIT: "Pigeons repeatedly exposed to the problem show that they rapidly learn to always switch, unlike humans", lol. That's funny.

astrange3y ago

GPT4 also passes "What weighs more, a pound of feathers or a Great British Pound?".

GPT3 gets confused, says they're the same and then that they're different:

Both a pound of feathers and a Great British Pound weigh the same amount, which is one pound. However, they are different in terms of their units of measurement and physical properties.

Therefore, a pound of feathers and a Great British Pound cannot be directly compared as they are measured in different units and have different physical properties.

iam-TJ3y ago

I'm surprised by the answer GPT4 gives, and I consider it incorrect.

2 more replies

jwolfe3y ago

> Edit: GPT4 fails If I remove the sentence between (()).

If you remove that sentence, nothing indicates that you can see you picked the door with the car behind it. You could maybe infer that a rational contestant would do so, but that's not a given ...

0xcde4c3db3y ago

xg153y ago

> You're presented with three transparent closed doors.

I think if you mentioned that to a human, they'd at least become confused and ask back if they got that correctly.

eropple3y ago

> You're presented with three transparent closed doors.

A reasonable person would expect that you can see through a transparent thing that's presented to you.

2 more replies

aaroninsf3y ago

I've always found the Monty Hall problem a poor example to teach with, because the "wrong" answer is only wrong if you make some (often unarticulated) assumptions.

There are reasonable alternative interpretations in which the generally accepted answer ("always switch") is demonstrably false.

lmm3y ago

japaniard3y ago

What unarticulated assumption needs to be made for switching to be incorrect?

2 more replies

gwern3y ago

jprete3y ago

They may not have had to hardwire anything for known gotcha questions, because once a question goes viral, the correct answer may well show up repeatedly in the training data.

ftcHn3y ago

Other failing math I tested this morning....

---

Prompt: Pick two random 5 digit numbers and multiply them together

Response: Sure, here are two random 5-digit whole numbers multiplied together: 47581 x 69204 = 3291492324

Correct Answer: 3292795524

---

Prompt: Pick two random numbers between 0 and 1 and multiply them together

Response: Sure! Here are two random numbers between 0 and 1 multiplied together:0.475 x 0.821 = 0.390575

Correct Answer: 0.389975

---

I'm surprised it did so well on the Math tests.

1 more reply

shon3y ago

I’m with you. Being an old neckbeard myself, I always want to reach for regex to fix stuff like this lol…

But based on what we just saw on the GPT4 live demo, I’d say they fixed it by making a much much more capable and versatile model.

kulikalov3y ago

You can as well ask it to add up two unusually big integers and it'll fail.

j / k navigate · click thread line to collapse