why_only_15 1y ago

how is this a plateau since gpt-4? this is significantly better

csomar 1y ago

First, this model is yet to be released. This is a momentum "announcement". When O1 was "announced", it was billed as a "breakthrough", but I use Claude/O1 daily and 80% of the time Claude beats it. I also see it as a highly fine-tuned/targeted GPT-4 rather than something that has complex understanding.

So we'll find out whether this model is real or not in 2-3 months. My guess is that it'll turn out to be another flop like O1. They needed to release something big because they are momentum-based and their ability to raise funding is contingent on their AGI claims.

XenophileJKO 1y ago

I thought o1 was a fine-tune of GPT-4o. I don't think o3 is though. Likely using the same techniques on what would have been the "GPT-5" base model.

crazylogger 1y ago

Intelligence has not been LLMs' major limiting factor since GPT-4. The original GPT-4 reports in late 2022 and 2023 already established that it's well beyond an average human in professional fields: https://www.microsoft.com/en-us/research/publication/sparks-.... They failed to outright replace humans at work, but not for lack of intelligence.

We may have progressed from a 99%-accurate chatbot to one that's 99.9%-accurate, and you'd have a hard time telling them apart in normal real-world (dumb) applications. A paradigm shift is needed from the current chatbot interface to a long-lived stream-of-consciousness model (e.g. a brain that constantly reads input and produces thoughts at a 10ms refresh rate; remembers events for years and keeps the context window from exploding; paired with a cerebellum to drive robot motors at even higher refresh rates).

As long as we're stuck at chatbots, LLMs' impact on the real world will be very limited, regardless of how intelligent they become.

peepeepoopoo97 1y ago

O3 is multiple orders of magnitude more expensive to realize a marginal performance gain. You could hire 50 full time PhDs for the cost of using O3. You're witnessing the blowoff top of the scaling hype bubble.

whynotminot 1y ago

What they’ve proven here is that it can be done.

Now they just have to make it cheap.

Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

Are you seriously going to assume that won’t happen here?

YeGoblynQueenne 1y ago

>> Now they just have to make it cheap.

Like they've been making it all this time? Cheaper and cheaper? Less data, less compute, fewer parameters, but the same, or improved performance? Not what we can observe.

>> Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

No, actually: the cheaper compute gets, the more of it they need to use, or their progress stalls.

Jensson 1y ago

> What they’ve proven here is that it can be done.

No they haven't, these results do not generalize, as mentioned in the article:

"Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute"

Meaning they haven't solved AGI, and the task itself does not represent programming well; these models do not perform that well on engineering benchmarks.

peepeepoopoo97 1y ago

Yes, that's exactly what I'm implying, otherwise they would have done it a long time ago, given that the fundamental transformer architecture hasn't changed since 2017. This bubble is like watching first year CS students trying to brute force homework problems.

MVissers 1y ago

I would agree if the cost of AI compute over performance hadn't been dropping by more than 90-99% per year since GPT-3 launched.

This type of compute will be cheaper than Claude 3.5 within 2 years.

It's kinda nuts. Give these models tools to navigate and build on the internet and they'll be building companies and selling services.

fspeech 1y ago

That's a very static view of affairs. Once you have a master AI, at a minimum you can use it to train cheaper, slightly less capable AIs. At the other end, the master AI can train to become even smarter.

Bolwin 1y ago

The high-efficiency version got 75% at just $20/task. When you count the time to fill in the squares, that doesn't sound far off from what a skilled human would charge.

kenjackson 1y ago

People act as if GPT-4 came out 10 years ago.

Jensson 1y ago

> how is this a plateau since gpt-4? this is significantly better

Significantly better at what? A benchmark? That isn't necessarily progress. Many report preferring gpt-4 to the newer o1 models with hidden text. Hidden text makes the model more reliable, but more reliable is bad if it is reliably wrong at something since then you can't ask it over and over to find what you want.

I don't feel it is significantly smarter; it is more like having the same dumb person spend more time thinking than the model getting smarter.
