We're also seeing lots of optimizations in new models (RoPE/RoPER embeddings, Swish/GeLU activations, Flash Attention, etc.), but I think some of the most interesting gains we'll be seeing soon will come from inference-optimized training (-70% parameters for +100% compute) [1] combined with sparsity pruning (-50% size with almost no loss in accuracy) [2] and quantization [3], which together should lead to significantly smaller models that still perform well.
[1] https://www.harmdevries.com/post/model-size-vs-compute-overh...
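To put rough numbers on how those savings might stack, here is a back-of-the-envelope sketch. The -70% and -50% figures are the ones quoted above; the 16-bit to 4-bit quantization step is purely my assumption (no bit widths were given), and the function name is made up for illustration. It also assumes the three savings multiply cleanly and that pruned weights cost nothing to store, which real sparse formats do not quite manage.

```python
def relative_size(param_cut=0.70, prune_cut=0.50, bits_before=16, bits_after=4):
    """Fraction of the original model size left after all three steps.

    param_cut and prune_cut come from the figures quoted in the comment;
    bits_before and bits_after are assumed values, not from any cited source.
    """
    params_left = 1.0 - param_cut                  # inference-optimized training
    weights_left = params_left * (1.0 - prune_cut) # sparsity pruning
    return weights_left * bits_after / bits_before # quantization

# 0.30 * 0.50 * (4 / 16) is about 3.75% of the original size
print(f"{relative_size():.2%}")
```

Under those idealized assumptions the model ends up at a few percent of its original footprint; in practice the stacking is unlikely to be that clean.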
they're also not going to find another 2, 4, 8, 16 ... internets' worth of content to parasitise
Not all types of AI need external training data; you can train on how effectively a goal is achieved
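A toy sketch of what "training on how effectively a goal is achieved" can look like: nothing here reads a dataset, the only signal is a score for how close the current guess is to the goal. The goal function, the names, and the simple hill-climbing loop are all made up for illustration; real systems (self-play, reinforcement learning) are far more involved.

```python
import random

def goal_score(x):
    # Hypothetical goal: get x as close to 3.0 as possible (higher is better).
    return -(x - 3.0) ** 2

def improve_by_feedback(steps=2000, seed=0):
    # Simple hill climbing: propose a small random change and keep it
    # only if the goal score improves. No external training data is used.
    rng = random.Random(seed)
    x = 0.0
    best = goal_score(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        score = goal_score(candidate)
        if score > best:
            x, best = candidate, score
    return x

learned = improve_by_feedback()
print(f"learned x = {learned:.3f}")
```

The point is only that the feedback loop itself supplies the learning signal; whether that scales to anything like general capability is a separate question.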
how can the rate be maintained?
exponential chip scaling is over, and they've parasitised, sorry, trained on the entirety of accessible human knowledge
the rate may drop to zero
the exponent may even go negative once LLMs start ingesting their own hallucinations
I see this as a new development in language: it used to be restricted to meat neural nets and books, and now it can also be consumed and created by LLMs. A new self-replication path has opened for language. Language is an evolutionary system; it's alive. Without language, humans are mere shadows of what they can be. Language turns a baby into a modern adult, and a randomly initialised neural net into ChatGPT.
The magic was always in the language, not in the neural network. We should care more about the size and quality of the training dataset than about the model. Any model would do; all model tweaks are more or less the same. But the data, that is the origin of all the abilities. And we cannot own abilities, so it should be fair game to learn abilities and facts even from copyrighted data. Novel and creative training examples should not be reproduced by LLMs, but mere facts and skills should be general enough not to be owned by anyone.
Chip scaling still seems to be going pretty fast, and we may discover new ways to make better use of the chips we currently have, like better methods of quantisation, or we could just use more of them. Either could get us just far enough to reach the self-improvement threshold
So we could end up hitting a wall with chip scaling or something, but I don’t think it’s that likely