But how will language performance evolve as the nature of the hardware our programs run on evolves? IMHO this is not an easy question to answer right now.
C compilers don’t produce 100% optimal assembly language in all cases, but typically the assumptions they make are light. The executable code they output is somewhat predictable and often close enough to hand-optimised assembly in efficiency that we ignore the difference. But this whole approach to programming was originally designed for single-threaded execution on a CPU with a chunk of RAM for storage.
What happens if we never find a way to get a single core to run much faster but processors come with ever more cores and introduce other new features for parallel execution? What happens if we evolve towards ever more distributed systems, but farming out big jobs to a set of specialised components in the cloud at a much lower level than we do today? What happens if systems start coming with other kinds of memory that have different characteristics to RAM as standard, from content-addressable memory we already have today to who-knows-what as quantum technology evolves?
If we change the rules then maybe a different style of programming will end up being more efficient. It’s true that today’s functional programming languages that control mutation and other side effects usually don’t compile down to machine code as efficiently as a well-written C program can. The heavier runtimes to manage responsibilities like garbage collection and the reliance on purely functional data structures that we don’t yet know how to convert to efficient imperative code under the hood are bottlenecks. But on the other hand, those languages can make much stronger assumptions than a lower-level language like C in other ways, and maybe those assumptions will allow compilers to safely allocate different behaviour to new hardware in ways that weren’t possible before, and maybe dividing a big job into 500 quantum foobar jobs that each run half as fast as a single-threaded foobaz job still ends up doing the job 200x faster overall.