
0 points | nopinsight | 3y ago | 0 comments

Although GPT-4 scores excellently in tests involving crystallized intelligence, it still struggles with tests requiring fluid intelligence like competitive programming (Codeforces), Leetcode (hard), and AMC. (Developers and mathematicians are still needed for now).

I think we will probably get (non-physical) AGI when the models can solve these as well. The implications of AGI might be much bigger than the loss of knowledge worker jobs.

Remember what happened to the chimps when a smarter-than-chimpanzee species multiplied and dominated the world.


Scarblac | 3y ago

Of course 99.9% of humans also struggle with competitive programming. It seems to be an overly high bar for AGI if it has to compete with experts from every single field.

That said, GPT has no model of the world. It has no concept of how true the text it is generating is. It's going to be hard for me to think of that as AGI.

sebzim4500 | 3y ago

> That said, GPT has no model of the world.

I don't think this is necessarily true. Here is an example where researchers trained a transformer to generate legal sequences of moves in the board game Othello. Then they demonstrated that the internal state of the model did, in fact, have a representation of the board.

https://arxiv.org/abs/2210.13382

gowld | 3y ago

That's a GPT and it's specific for one dataset of one game. How would someone extend that to all games and all other fields of human endeavor?


nopinsight (OP) | 3y ago

Even the current GPT has models of the domains it was trained on. That is why it can solve unseen problems within those domains. What it lacks is the ability to generalize beyond the domains. (And I did not suggest it was an AGI.)

If an LLM (my hypothetical future LLM) can solve Codeforces problems as well as a strong competitor, what else can it not do as well as competent humans (aside from physical tasks)?

sterlind | 3y ago

it's an overly high bar, but it seems well on its way to competing with experts from every field. it's terrifying.

and I'm not so sure it has no model of the world. a textual model, sure, but considering it can recognize what svgs are pictures of from the coordinates alone, that's not much of a limitation maybe.

PaulDavisThe1st | 3y ago

> well on its way to competing with experts from every field

competing with them at what, precisely?

CuriouslyC | 3y ago

We don't have to worry so much about that. I think the most likely "loss of control" scenario is that the AI becomes a benevolent caretaker, who "loves" us but views us as too dim to properly take care of ourselves, and thus curtails our freedom "for our own good."

We're still a very very long way from machines being more generally capable and efficient than biological systems, so even an oppressive AI will want to keep us around as a partner for tasks that aren't well suited to machines. Since people work better and are less destructive when they aren't angry and oppressed, the machine will almost certainly be smart enough to veil its oppression, and not squeeze too hard. Ironically, an "oppressive" AI might actually treat people better than Republican politicians.

impossiblefork | 3y ago

Things like that probably require some kind of thinking ahead, which models of this kind can't really do on their own; they would need something like beam search.

Language models that utilise beam search can calculate integrals ('Deep learning for symbolic mathematics', Lample, Charton, 2019, https://openreview.net/forum?id=S1eZYeHFDS), but without beam search it doesn't work.

However, beam search makes bad language models. I got linked this paper ('Locally typical sampling' https://arxiv.org/pdf/2202.00666.pdf) when I asked some people why beam search only works for the kind of stuff above. I haven't fully digested it though.
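For anyone who hasn't seen beam search, here is a minimal sketch of the idea: keep the `beam_width` best partial sequences at every step instead of committing to the single most likely next token. The `toy_model` probability table is made up purely for illustration, and none of this is the setup from either linked paper.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]

def beam_search(
    next_probs: Callable[[Sequence], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> List[Tuple[Sequence, float]]:
    """Return the best sequences of length max_len with their log-probabilities."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            # Extend every surviving prefix by every possible next token.
            for token, prob in next_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical next-token distribution, invented for this example."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"x": 0.5, "y": 0.5}
    return {"z": 0.9, "w": 0.1}

greedy_best = beam_search(toy_model, beam_width=1, max_len=2)[0]
beam_best = beam_search(toy_model, beam_width=2, max_len=2)[0]
print(greedy_best[0], beam_best[0])
```

With a beam width of 1 this is just greedy decoding, which commits to "a" and ends up with a sequence of probability 0.3. With a beam width of 2 the search keeps "b" alive and finds ("b", "z") at probability 0.36, which is the "thinking ahead" part.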

adgjlsfhk1 | 3y ago

Its AMC-12 scores aren't awful. It's at roughly the 50th percentile for the AMC, which (given who takes the AMC) probably puts it in the top 5% or so of high school students in math ability. Its AMC 10 score being dramatically lower is pretty bad though...

gowld | 3y ago

> Its AMC-12 scores aren't awful.

A blank test scores 37.5

The best score, 60, is equivalent to 5 correct answers + 20 blank answers; or to 6 correct answers, 4 correct random guesses, and 15 incorrect random guesses (a 20% chance of a correct guess).
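The arithmetic above can be checked with a quick sketch. It assumes the usual AMC 10/12 scoring rule of 6 points per correct answer, 1.5 per blank and 0 per wrong answer on a 25-question test, which is consistent with the 37.5 figure for a blank test; the function names are just for illustration.

```python
CORRECT_POINTS = 6.0   # assumed points for a correct answer
BLANK_POINTS = 1.5     # assumed points for a question left blank
NUM_QUESTIONS = 25     # assumed number of questions on the test

def amc_score(correct: int, blank: int, wrong: int) -> float:
    """Score a test from counts of correct, blank and wrong answers."""
    if correct + blank + wrong != NUM_QUESTIONS:
        raise ValueError("counts must add up to the number of questions")
    return correct * CORRECT_POINTS + blank * BLANK_POINTS

def expected_guess_value(num_choices: int = 5) -> float:
    """Expected points from one uniformly random guess among num_choices options."""
    return CORRECT_POINTS / num_choices

print(amc_score(correct=0, blank=25, wrong=0))   # 37.5, a blank test
print(amc_score(correct=5, blank=20, wrong=0))   # 60.0
print(amc_score(correct=10, blank=0, wrong=15))  # 60.0, i.e. 6 solved + 4 lucky guesses
print(expected_guess_value())                    # 1.2, less than the 1.5 for a blank
```

Since a random guess is worth less on average than a blank, a test taker who guesses on everything it can't solve is penalised relative to one that knows when to skip.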

The 5 easiest questions are relatively simple calculations, once the parsing task is achieved.

(Example: https://artofproblemsolving.com/wiki/index.php/2022_AMC_12A_... ) so the main factor in that score is how good GPT is at refusing to answer a question, or doing a bit better to overcome the guessing penalty.

> Its AMC 10 score being dramatically lower is pretty bad though...

In all versions (scoring 30 and 36), it scored worse than leaving the test blank.

The only explanation I can imagine for that is that it can't understand diagrams.

It's also unclear if the AMC performance is based on the English version or the computer-encoded version from this benchmark set: https://arxiv.org/pdf/2109.00110.pdf https://openai.com/research/formal-math

AMC/AIME and even to some extent USAMO/IMO problems are hard for humans because they are time-limited and closed-book. But they aren't conceptually hard -- they are solved by applying a subset of a known set of theorems a few times to the input data.

The hard part of math, for humans, is ingesting data into their brains, retaining it, and searching it. Humans are bad at memorizing large databases of symbolic data, but that's trivial for a large computer system.

An AI system has a comprehensive library and high-speed search algorithms.

Can someone who pays $20/month please post some sample AMC10/AMC12 Q&A?

scotty79 | 3y ago

I wonder why gpt is so bad at AP English Literature

1attice | 3y ago

wouldn't it be funny if knowledge workers could all be automated, except for English majors?

The Revenge of the Call Centre

atemerev | 3y ago

I am not a species chauvinist. 1) Unless a biotech miracle happens, which is unlikely, we are all going to die anyway; 2) If an AI continues life and research and keeps increasing complexity after humans, what is the difference?
