You still need the GPU parallelism though.
Even with PCIe 5.0 and 16 lanes, you only get about 64 GB/s of bandwidth. If you're trying to run a model too big for your GPU, then for every token it has to stream the entire model across the bus again. With a 70B parameter model at 8-bit quantization, you're looking at just under 1 token/sec purely from transferring parts of the model in constantly. Making the actual computation faster won't change that.
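A back-of-the-envelope sketch of that ceiling, assuming the only inputs are the 64 GB/s link and all ~70 GB of weights crossing it once per token (compute treated as free):

    # Token-rate ceiling when every token requires streaming the
    # full model over the PCIe link.
    pcie5_x16_bw_gb_s = 64        # PCIe 5.0 x16, ~64 GB/s per direction
    model_size_gb = 70 * 1        # 70B params at 8-bit -> ~70 GB
    seconds_per_token = model_size_gb / pcie5_x16_bw_gb_s
    print(f"{1 / seconds_per_token:.2f} tokens/sec")  # ~0.91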
Once the "math is done," it quite likely would have paid off better than most investments for the top people to have spent a few short years working with grossly underpowered hardware, coming up with amazing results there before scaling up, rather than using grossly overpowered hardware before there was even a deep understanding of the underlying processes.
When you think about it, what we have seen from the latest ultra-high-powered "thinking" machines is truly impressive. But if you are trying to fool somebody into believing it's a real person, it's still not "quite" there.
Maybe a good benchmark would be to take a regular PC and, without any reliance on AI, pull out all the stops and put all the effort into the fakery itself. No holds barred, any trick you can think of. See what the electronics is capable of that way. There are some smart engineers; this would only take a few years, and it looks like it would have been a lot more affordable.
Then, with the same hardware, if an AI alternative is not as convincing, something has got to be wrong.
It's good to find out this type of thing before you go overboard.
Regardless of speed or power, I never could have gotten an 8-bit computer to match the output of a 32-bit floating-point algorithm by using floating point myself. Integers all the way, and place the decimal point where it belongs when you're done.
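That's classic fixed-point arithmetic. A minimal sketch in Python (the decimal SCALE of 100 is just an assumption for readability; a real 8-bit routine would more likely scale by a power of two and shift):

    # Fixed-point: do all the math in integers, place the decimal last.
    SCALE = 100                  # two decimal digits of precision

    def to_fixed(x):             # 3.14 -> 314
        return round(x * SCALE)

    def fixed_mul(a, b):         # multiply two scaled ints, rescale once
        return (a * b) // SCALE

    c = fixed_mul(to_fixed(3.14), to_fixed(2.50))   # 785
    print(f"{c // SCALE}.{c % SCALE:02d}")          # 7.85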
Once it's really figured out, how do you think it would feel to have been the one paying the electric bills up until now?
The momentum behind LLMs and allied technology may last as long as they keep improving, even by a few percentage points, and keep shattering newly created human benchmarks every few months.
That is a good analogy when you think about how much more energy-efficient transistors are than vacuum tubes.
Vacuum tube computers were a thing for a while, but they were more a product of desperation than of systematic intellectual progress.
OTOH, you could look at the present accomplishments as throwing more vacuum tubes at a problem that cannot be adequately addressed that way.
What turned out to be a solid-state solution was a completely different approach from the ground up.
To the extent that a more power-saving technique on the same hardware is only a matter of a different software approach, that is something that realistically could have been accomplished before so much energy was expended.
Though I've always thought application-specific circuits would be what really helps ML and AI, and those would end up not being the same hardware at all.
If enough power is truly being wasted that it's starting to rear its ugly head, somebody should be able to figure out how to fix it before it gets out of hand.
Ironically enough, given my experience with vacuum tubes, I've felt that some serious technology was lost when that research momentum was so rapidly abandoned in favor of "solid-state everything" at any cost.
Maybe it is a good idea to abandon the energy-intensive approaches as soon as anything completely different and the least bit promising shows even a glimmer of potential that a gifted visionary can barely make out.
It's iterative; there are plenty of cul-de-sacs and failures. You can't really optimise until you have something that works, and it's a messy, inefficient process.
You're looking at this with hindsight.
I never thought it would be ideal any other way, so I guess so.
When I first considered neural nets from state-of-the-art vendors to assist with some non-linguistic situations over 30 years ago, they weren't quite ready for prime time, and I could accept that.
I just don't always have the kind of generic situations that would benefit me, so it's clearly my problems that have the deficiencies ;\
What's being done now, with all the resources being thrown at it, is highly impressive and gaining all the time, no doubt about it. It's nice to know there are people who can afford it.
I truly look forward to more progress, and this may be the big, previously unreached milestone I've been watching for.
Still not good enough for what I need so far, though. And I can accept that as easily as ever.
That's why I put forward my estimation that not all of those 30+ years have been spent without agonizing over something ;)