>Sutskever, recently ex-OpenAI, one of the first to believe in scaling, now says it is plateauing.
Blind scaling, sure (for whatever reason)*, but this is the same Sutskever who believes in ASI within a decade off the back of what we have today.
* Not like anyone is telling us any details. After all, OpenAI and Microsoft are still trying to build a $100B data center.
In my opinion, there's a difference between scaling not working and scaling becoming increasingly infeasible. GPT-4 is something like 100x the compute of GPT-3 (same with 2→3).
All the drips we've had about GPT-5 point to ~10x the compute of GPT-4. Not small, but very modest in comparison.
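To make the comparison concrete, here's the back-of-the-envelope arithmetic with the multipliers above taken at face value (the ~100x per generation and rumored ~10x for GPT-5 are claims from this thread, not official figures):

```python
# Compound per-generation compute multipliers onto a base compute budget.
# The multipliers are the rumored/claimed figures from the thread, not official numbers.
def total_compute(base: float, multipliers: list[float]) -> float:
    """Multiply a base compute budget by each generation-over-generation factor."""
    total = base
    for m in multipliers:
        total *= m
    return total

# Normalize GPT-2's training compute to 1 (arbitrary unit).
gpt3 = total_compute(1, [100])            # ~100x GPT-2 (claimed)
gpt4 = total_compute(1, [100, 100])       # another ~100x on top (claimed)
gpt5 = total_compute(1, [100, 100, 10])   # rumored ~10x GPT-4

print(gpt3, gpt4, gpt5)  # → 100 10000 100000
```

The point being: keeping the 100x-per-generation trend going for GPT-5 would mean ~1,000,000x GPT-2's compute instead of ~100,000x, which is where "increasingly infeasible" bites.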
>FWIW, GPT-2 and GPT-3 were about a year apart (2019 "Language Models are Unsupervised Multitask Learners" to 2020 "Language Models are Few-Shot Learners").
Ah sorry, I meant 3 and 4.
>Dario Amodei recently said that with current gen models pre-training itself only takes a few months (then followed by post-training, etc). These are not year+ training runs.
You don't have to be training models the entire time. GPT-4 was done training in August 2022 according to OpenAI and wouldn't be released for another 8 months. Why? Who knows.