I don't really know anything about business, but something else I've wondered is this: if LLM scaling/progress really is exponential, and the juice is worth the squeeze, why is OpenAI investing significantly in everything that's not GPT-5? Wouldn't exponential growth imply that the opportunity cost of investing in something like Sora makes little sense?
A huge one for me is that Altman cries "safety" while pushing out everyone who actually cares about safety. Why? He desperately wants governments to build them a moat, yesterday if possible. He's not worried about the risks of AGI, he's afraid his company won't get there first because they're not making progress any more. They're rushing to productize what they have because they lost their only competitive advantage (model quality) and don't see a path towards getting it back.
I also don't think the only way to improve LLMs is by improving zero-shot inference. Have you ever written code zero-shot that compiled and worked on the first try? Programming is a multistep process, and agents and planning will probably be the next step for LLMs.
Cheap inference helps a lot here, since you can hand the AI a task for the night: go to sleep, then review the results in the morning. The AI is brute-forcing the solution by trying many different paths, but that's roughly how most programming works anyway. You try things until there are no errors, the code compiles, and the tests pass.
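That overnight loop can be sketched as a generate-and-check cycle. A minimal sketch, where `generate` stands in for an LLM proposing a candidate and `passes` stands in for "compiles and passes the tests" (both names are illustrative, not any real API):

```python
import random

def brute_force(generate, passes, max_attempts=1000, seed=0):
    """Try candidate solutions until one passes the check.

    generate: callable that proposes a candidate (stand-in for a model call).
    passes:   callable that accepts/rejects it (stand-in for build + tests).
    Returns (candidate, attempts) or (None, max_attempts) on failure.
    """
    rng = random.Random(seed)  # seeded for a reproducible run
    for attempt in range(1, max_attempts + 1):
        candidate = generate(rng)
        if passes(candidate):
            return candidate, attempt
    return None, max_attempts

# Toy stand-in: "guess a number divisible by 7" plays the role of
# "produce code that compiles and passes the tests".
solution, tries = brute_force(
    generate=lambda rng: rng.randint(1, 100),
    passes=lambda n: n % 7 == 0,
)
print(solution, tries)
```

The point is only the shape of the loop: with inference cheap enough, many bad attempts per good one is an acceptable cost, just as it is for a human iterating against a compiler.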
I think this is really interesting, but I wonder if there really is enough data there to make a qualitative difference. I'm sure there's enough to make a better model, but I'm hesitant to think it would be more than an improved chatbot. What people are really waiting for is a qualitative shift, not just an improved GPT.
> It's a multistep process and probably agents and planning will be a next step for LLM.
I agree, we definitely need a new understanding here. Right now, with the architecture we have, agents just don't seem to work. In my experience, if the LLM doesn't figure it out within a few shots, trying over and over with different tools/functions doesn't help.
If they start scraping, training, and generating images and video, then they have lots more data to work with.
Now that would be super funny.
Civilization VII tech tree
AI singularity tech
Prerequisites: in order to research this, your world needs to have at least 100 billion college educated inhabitants.
:-)))
If I had these concerns as OpenAI, I'd be pushing hard to regulate and restrict generative image/video models, to push the end of the "low background data" era as far into the future as possible. I feel like the last thing I'd be doing is productizing those models myself!
They are! And I'm guessing maybe their perspective is if they can identify their own generative content, they can make a choice to ignore it and not cannibalize.
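If a provider really could recognize its own outputs, the "ignore it" choice would just be a filter over the training corpus. A minimal sketch, where `looks_synthetic` is an entirely hypothetical detector (e.g. a watermark check); nothing here reflects a published OpenAI pipeline:

```python
def filter_training_corpus(documents, looks_synthetic):
    """Drop documents flagged as model-generated before training.

    looks_synthetic: hypothetical classifier, e.g. a watermark detector
    that recognizes the provider's own generated content.
    """
    return [doc for doc in documents if not looks_synthetic(doc)]

# Toy usage: pretend documents tagged "[AI]" carry a watermark.
corpus = ["human essay", "[AI] generated post", "forum comment"]
clean = filter_training_corpus(corpus, lambda d: d.startswith("[AI]"))
print(clean)  # ['human essay', 'forum comment']
```

Of course this only avoids cannibalizing your own outputs; content generated by competitors' models would sail straight through unless the detector generalizes to them too.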
I don't actually think it's a bad idea, but OpenAI didn't think that far ahead. They are pushing for regulation, but that's mainly to screw over competing models, not to give themselves more data runway. Every capitalist is a temporarily embarrassed feudal aristocrat, after all.
Furthermore, even if OpenAI had a perfect AI/human distinguisher oracle and could train solely on human output, that wouldn't get us superhuman reasoning or generalization performance. The training process they use is to have the machine mimic the textual output of humans. How exactly do you get a superhuman AGI[0] without having text generated by a superhuman AGI to train on?
[0] Note: I'm discounting "can write text faster than a human" as AGI here. printf in a tight loop already does that better than GPT-4o.
> And if you want to reach the goal as quickly as possible, you will try things in parallel
This is sort of the exploration-exploitation problem, right? But I think you'd agree that a company full of people who firmly believe that GPT-(n+1) will literally be AGI, and that we're on an exponential curve, will be fully in exploitation mode. In my mind, exploring methods of generating videos is not a path towards their stated goal of AGI. Instead, it's an avenue to _earn money now_. OpenAI is in a slightly awkward position: their main product (ChatGPT) is not super useful right now, and is facing increasingly viable competition.
You can spend 100% on the next generation or you can spend a small percentage to productize the previous generation to unlock revenue that can be spent on the next generation.
The latter will result in more investment into the next generation.