
harshreality 3y ago

It can do some trivial problems, but not others, and it can't consistently do real math (proving theorems), because it doesn't understand logic. It can only imitate it some of the time for trivial or often-presented proofs when the necessary steps have become embedded in its next-token-generation model.

It's still getting this wrong, for instance:

Prompt: You call heads/tails for 1000 coin tosses. You know the coin being used is weighted so that it comes up heads 52% of the time. Each correct coin toss prediction earns you $5. What should your strategy be, for calling heads/tails, in order to maximize how much money you receive?

ChatGPT: To maximize your money, you should call heads 52% of the time and tails 48% of the time, as those are the probabilities of getting heads and tails, respectively. This will give you an expected payout of $5 * 0.52 = $2.60 per head and $5 * 0.48 = $2.40 per tail, which is the highest expected value given the information about the coin being weighted.
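A quick expected-value check shows why the 52/48 "probability matching" answer above is wrong. This is a minimal Python sketch, assuming each call is made independently of how the coin lands; the function name `expected_winnings` and its parameters are illustrative and not from the thread.

```python
def expected_winnings(p_heads: float, p_call_heads: float,
                      tosses: int = 1000, payout: float = 5.0) -> float:
    """Expected total payout when each toss is called 'heads' with
    probability p_call_heads, independently of the coin's outcome."""
    # A call is correct if we call heads and it lands heads,
    # or we call tails and it lands tails.
    p_correct = p_heads * p_call_heads + (1 - p_heads) * (1 - p_call_heads)
    return tosses * payout * p_correct


if __name__ == "__main__":
    p = 0.52
    always_heads = expected_winnings(p, 1.0)  # call heads on every toss
    matching = expected_winnings(p, p)        # ChatGPT's 52/48 mix
    print(f"always heads:         ${always_heads:.2f}")
    print(f"probability matching: ${matching:.2f}")
```

Because P(correct) = p*q + (1 - p)*(1 - q) is linear in the calling rate q with slope 2p - 1, which is positive when p > 0.5, it is maximized at q = 1: always call heads, for an expected $2,600 versus $2,504 for the 52/48 mix.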

It's getting closer to being able to prove that the square root of 2 is irrational, but it still flubs a step.

Prompt: Prove that the square root of 2 is irrational.

ChatGPT: ... This means that p can be expressed as 2r for some integer r. Substituting this into the equation above gives us:

ChatGPT: 2q^2 = (2r)^2 = 4r^2

ChatGPT: Dividing both sides by 4 gives us:

ChatGPT: q^2 = r^2

ChatGPT: So q must also be even, which means that both p and q have a common factor of 2...

It failed basic algebraic manipulation (dividing 2q^2 = 4r^2 by 4 does not give q^2 = r^2; the correct step is dividing by 2 to get q^2 = 2r^2), but kept going with the right reasoning simply because that's what it's been trained on, not because it's reflecting on the equation it just came up with. That's not math. That's mimicry.
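For comparison, here is how the standard proof by contradiction goes through, using the same symbols as the quoted output (sqrt(2) = p/q in lowest terms, p = 2r). The step "2 divides n^2 implies 2 divides n" holds because the square of an odd number is odd.

```latex
\begin{aligned}
\sqrt{2} = \frac{p}{q},\ \gcd(p, q) = 1
  &\;\Longrightarrow\; p^2 = 2q^2
  \;\Longrightarrow\; 2 \mid p^2
  \;\Longrightarrow\; 2 \mid p
  \;\Longrightarrow\; p = 2r \\
2q^2 = (2r)^2 = 4r^2
  &\;\Longrightarrow\; q^2 = 2r^2 && \text{(divide both sides by } 2\text{)} \\
  &\;\Longrightarrow\; 2 \mid q^2
  \;\Longrightarrow\; 2 \mid q
\end{aligned}
```

So 2 divides both p and q, contradicting gcd(p, q) = 1.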

Even if it's seen something many times in training, its accuracy gets worse with longer trains of "thought", as you'd expect, because a language model like that has ADHD: it can't focus enough on one logical thread, and other "knowledge" leaks in. Ask it to derive the quadratic formula by completing the square, which requires many steps of accurate algebraic manipulation. It goes through some irrelevant algebraic steps, sometimes not even getting the manipulation right, and finally (sometimes, depending on how you ask) manufactures the correct answer out of thin air as the last step. I finally got it to answer correctly by prompting it with the initial equation and telling it that the first step is to divide both sides by a, but it still combined a few steps instead of going methodically one step at a time as prompted.
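For reference, this is the one-step-per-line derivation the prompt was asking for, starting by dividing both sides by a. It assumes a is nonzero; when simplifying sqrt(4a^2) = 2|a|, the sign of a is absorbed by the plus-or-minus.

```latex
\begin{aligned}
ax^2 + bx + c &= 0, \quad a \neq 0 \\
x^2 + \frac{b}{a}x + \frac{c}{a} &= 0
  && \text{(divide both sides by } a\text{)} \\
x^2 + \frac{b}{a}x &= -\frac{c}{a}
  && \text{(subtract } \tfrac{c}{a}\text{)} \\
x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} &= \frac{b^2}{4a^2} - \frac{c}{a}
  && \text{(add } \left(\tfrac{b}{2a}\right)^2 \text{ to both sides)} \\
\left(x + \frac{b}{2a}\right)^2 &= \frac{b^2 - 4ac}{4a^2}
  && \text{(factor left; common denominator right)} \\
x + \frac{b}{2a} &= \pm\frac{\sqrt{b^2 - 4ac}}{2a}
  && \text{(take square roots)} \\
x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
  && \text{(subtract } \tfrac{b}{2a}\text{)}
\end{aligned}
```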


aembleton 3y ago

Seems to have improved already.

Just tried the coin toss prompt and got the following:

Your strategy should be to always call "heads," since it has a higher probability of landing on heads (52%) compared to tails (48%). By consistently choosing heads, you'll earn more money in the long run.

And to prove that the square root of 2 is irrational:

The proof that the square root of 2 is irrational is a well-known proof that uses contradiction. Here's one way it can be shown:

Suppose, for the sake of contradiction, that the square root of 2 is rational, meaning that it can be expressed as a fraction of two integers, a/b (where b≠0). Then, we can square both sides of the equation a/b = √2, which gives us:

a^2/b^2 = 2

Since a/b is in its lowest form (the numerator and denominator have no common factors other than 1), it follows that a^2 and b^2 have no common factors other than 1. But then, 2 must also be a perfect square, which it is not. Therefore, our assumption that the square root of 2 is rational leads to a contradiction, and we must conclude that it is indeed an irrational number.

alexanderchr 3y ago

This "proof" is actually a pretty good manifestation of the current limitations of GPT-3: it mangles two different proofs together in a way that does not make sense.

It starts by setting up for an elementary proof (a/b = sqrt(2) with a, b minimal implies 2|a^2 implies 2|a implies 4|a^2 implies 2|b^2 implies 2|b, which is a contradiction). But instead of following through, it finishes using another proof based on the rational root theorem, and makes up some bullshit on the way to make the argument flow nicely (it does not follow from a^2 / b^2 = 2 with no common factors that 2 is a perfect square).
