> GPT‑5 is a unified system . . .
OK
> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).
So it's not really a unified system, then; it's just supposed to appear as if it were.
This looks like they're not training one big model end to end, but have instead gone off to develop specialized sub-models and are trying to paper over the seams with yet another model. That's the kind of thing you resort to only once end-to-end training has become too expensive for you.
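To make the architectural complaint concrete, here's a minimal sketch of the kind of routing the announcement describes. Everything here is hypothetical: the function names, the heuristics, and the thresholds are made up for illustration; OpenAI's actual router is presumably a learned model, not hand-written rules.

```python
# Hypothetical sketch of prompt routing between two model tiers.
# The rules below are illustrative only, not OpenAI's implementation.

def route(prompt: str, needs_tools: bool = False) -> str:
    """Pick a model tier for a prompt using toy heuristics."""
    # Explicit intent: the announcement's own example phrase.
    if "think hard" in prompt.lower():
        return "reasoning"
    # Tool use and long/complex prompts go to the deeper model.
    if needs_tools or len(prompt.split()) > 200:
        return "reasoning"
    # Everything else hits the fast model.
    return "fast"

assert route("Think hard about this proof") == "reasoning"
assert route("what's the capital of France?") == "fast"
assert route("summarize this", needs_tools=True) == "reasoning"
```

The point of the sketch is just that the "system" is a dispatcher in front of separate models, which is exactly what the comment above objects to.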
If OpenAI really is hitting a wall on overall scaling, then the AI bubble will burst sooner than many are expecting.
People evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than ones from before 2022. There is some weak evidence of an opposite effect, causes unknown.
It's easy to make "model collapse" happen in lab conditions - but in real world circumstances, it fails to materialize.
The corollary to the bitter lesson strikes again: any hand-crafted system will outperform any general system at the same budget, by a wide margin.
The bitter lesson's whole point, though, is that in the long run the opposite wins out, which is why this direction from OpenAI is a suspicious indicator.
[1] https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
A broad generalization like "there are two systems of thinking: fast, and slow" doesn't necessarily fall into this category. The transformer itself (plus the choice of positional encoding etc.) contains inductive biases about modeling sequences. The router is presumably still learned with a fairly generic architecture.
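To illustrate what "learned with a fairly generic architecture" could mean, here's a toy learned router: a plain linear classifier over a couple of prompt features. The features and weights are invented for this sketch; a real router would learn its parameters from data rather than use hand-set values like these.

```python
# Toy "learned router": a generic linear model scoring whether a prompt
# needs the deeper reasoning model. Weights are made up for illustration.
import math

def features(prompt: str) -> list[float]:
    toks = prompt.split()
    return [
        len(toks) / 100.0,                              # crude length/complexity signal
        1.0 if "think hard" in prompt.lower() else 0.0, # explicit-intent flag
    ]

def route(prompt: str, w=(1.2, 4.0), b=-1.0) -> str:
    # Logistic score over the feature vector; in a trained router,
    # w and b would come from gradient descent on routing labels.
    z = sum(wi * xi for wi, xi in zip(w, features(prompt))) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return "reasoning" if p > 0.5 else "fast"

assert route("think hard about this") == "reasoning"
assert route("hi") == "fast"
```

The architecture here encodes almost no task-specific structure, which is the sense in which a learned router stays "generic" even though the fast/slow split itself is a design decision.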
Is it, though? To me it seems like performance gains are slowing down, and the additional computation in AI comes mostly from insane amounts of money being thrown at it.
GPT-5 System Card [pdf] - https://news.ycombinator.com/item?id=44827046
It feels less and less likely that AGI is even possible with the data we have available. The one unknown is what usable quantum computers, if we ever manage to build them, would do to AI; I'm curious about that.
From the system card:
"In the near future, we plan to integrate these capabilities into a single model."