Computers have been “fast enough” for a very long time now. I recently retired a Mac not because it was too slow but because the OS is no longer getting security patches. While CPUs no longer double in single-threaded speed every couple of years, cores have become more numerous, and extracting performance requires writing code that distributes work well across ever larger core pools.
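As a minimal sketch of what “distributing work across cores” looks like in practice (the work function and inputs here are made up for illustration), using only the Python standard library:

```python
# Sketch: spread a CPU-bound task across all available cores.
# busy_work and the chunk sizes are illustrative stand-ins.
from concurrent.futures import ProcessPoolExecutor
import os

def busy_work(n: int) -> int:
    # Stand-in for a CPU-bound computation.
    return sum(i * i for i in range(n))

def run_parallel(chunks):
    # One worker process per core; the executor hands chunks
    # out to whichever worker is free.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(busy_work, chunks))

if __name__ == "__main__":
    print(run_parallel([10_000] * 8))
```

The hard part, of course, is not the pool itself but carving real programs into chunks that are independent enough to hand out this way.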
Mainframes are also like that: while a PDP-11 would be interrupted every time a user at a terminal pressed a key, IBM systems offloaded that work to the terminals, which kept one or more screens in memory and sent the data to another computer, a terminal controller, which would then, and only then, disturb the all-important mainframe with the mundane needs of its users.
You also have things like the IBM Cell processor from the PS3 days: a PowerPC 'main' processor with seven "Synergistic Processing Elements" (SPEs) that work could be offloaded to. The SPEs were kinda like the current idea of big/little cores a la ARM, except SPEs were way dumber and much harder to program.
Of course, specialized math, cryptographic, and compression processors have been around forever. And you can even look at something like SCSI, where virtually all of the intelligence for working the drive was offloaded to the drive controller.
Lots of ways to implement this idea.
The downside would be that we’d have to think about binary compatibility between different platforms from different vendors. Anyway, it’d be really interesting to see what we could do.