I’m sorry … but what?! As someone who worked on early
LLM tech at Google hundreds of millions were poured into it. Do you think that paper just spontaneously came from pure theory?
Most of the core idea of transformers was invented in the early 1990s, including what at the time were termed Fast Weight Programmers and are formally equivalent to linearised self-attention: https://people.idsia.ch/~juergen/fast-weight-programmer-1991... . Google just had the hardware to actually run and experiment with large transformers.
It would be very funny if at some point someone stood up to Schmidhuber and told him they already created something he did, but yet another 30 years earlier.