We've had upgrades to hardware, mostly led by NVIDIA, that made it possible.
New LLMs don't even rely that much on that older architecture; right now it's mostly about compute and the quality of data.
I remember seeing some graphs showing that the whole "learning" phenomenon we see with neural nets is mostly about compute and quality of data, with the model and optimizations just being the cherry on the cake.