nodja 3mo ago

> Genuine question: How is it possible for OpenAI to NOT successfully pre-train a model?

The same way everyone else fails at it.

Change some hyperparameters to fit the new hardware (more params), maybe implement the latest improvements from papers after validating them in a smaller model run. Start training the big boy, loss looks good; 2 months and millions of dollars later the loss plateaus. Do the whole SFT/RL shebang, run benchmarks.

It's not much better than the previous model, very tiny improvements, oops.


yalok 3mo ago

Add to that multiple iterations of having to restart pretraining from some earlier checkpoint when the loss plateaus too early or starts increasing due to some bugs…
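To make the "restart from an earlier checkpoint" step concrete, here is a minimal sketch of how a run might flag a stalled or rising loss and pick a step to roll back to. Everything in it is an assumption for illustration: the function name `find_rollback_step`, the moving-average comparison, and the `window` and `min_improvement` thresholds are hypothetical, not anything a particular lab is known to use.

```python
from statistics import mean


def find_rollback_step(losses, window=3, min_improvement=0.01):
    """Return the index of the last step before the loss stalled or rose.

    Compares the mean loss of two adjacent windows. If the newer window
    improves on the older one by less than `min_improvement` (including
    the case where the loss went up), the last step of the older window
    is returned as a rollback candidate. Returns None if the loss keeps
    improving over the whole history.
    """
    for end in range(2 * window, len(losses) + 1):
        older = mean(losses[end - 2 * window:end - window])
        newer = mean(losses[end - window:end])
        if older - newer < min_improvement:
            # Last step that still belonged to the older window.
            return end - window - 1
    return None


# Hypothetical loss curve: steady progress, then a flat stretch.
history = [4.0, 3.5, 3.0, 2.5, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4]
rollback_step = find_rollback_step(history)
```

In a real run the returned index would map to the nearest saved checkpoint, and the thresholds would need tuning, since a noisy loss can look like a plateau over short windows.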

thefourthchime 3mo ago

Isn't that what GPT-4.5 was?

wrsh07 3mo ago

That was a large model that, iiuc, was too expensive to serve profitably.

Many people thought it was an improvement, though.
