Well, you're right, the M1's unified memory is technically not that different, but it's a start. And I don't like the mix of components either. They seem to be repeating earlier trends, like when FPUs were integrated on-chip. Eventually we'll probably end up with some kind of standardized SIMD unit, like with Intel's integrated GPUs.
But I don't want all that. I just want a flat 2D array of identical cores, each with its own local memory. Then just run OpenGL or Vulkan or Metal or TensorFlow or whatever the new hotness is in software. All Turing-complete computation is equivalent in principle, so I feel that working in DSLs is generally a waste of time.
Arm cores are relatively simple, so scaling an M1 to, say, more than 64 cores is probably straightforward, at least on the hardware side. People complain that chips like that are hard to program, but it's only because we're stuck in C-style languages. Code written in GNU Octave or MATLAB or any other vector language is trivial to parallelize, because whole-array operations don't specify an order between elements. Languages that lean functional, like Julia, would also have no trouble with such chips.
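To make the vector-language point concrete, here's a rough sketch in Python (not Octave, just because it's easy to run anywhere). The names `saxpy_chunk` and `parallel_saxpy` and the `cores` parameter are made up for illustration, and the threads only stand in for cores: under CPython's GIL this won't actually run faster. The point is just that an elementwise operation splits into independent slices with no coordination between them.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    """Compute a*x + y for one slice; no element depends on any other."""
    return [a * x + y for x, y in zip(xs, ys)]


def parallel_saxpy(a, xs, ys, cores=4):
    """Split the vectors into contiguous slices, one per (hypothetical) core."""
    n = len(xs)
    step = max(1, -(-n // cores))  # ceiling division, at least 1
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # Threads are a stand-in for cores (assumption for illustration only).
        # pool.map returns results in the same order as the input slices.
        parts = pool.map(
            lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]), bounds
        )
        return [v for part in parts for v in part]


xs = list(range(10))
ys = [1] * 10
print(parallel_saxpy(2, xs, ys, cores=4))
```

In a vector language the whole thing is just `a*x + y`, and the slicing is something the runtime could do for you, which is kind of my point.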
Once we aren't compute-bound, a whole host of computer science problems become tractable. But we can't get there with current technology. At least not without a lot of pain and suffering. What we're going through now isn't normal, and it reminds me a lot of the crisis that desktop software reached in the mid-90s with languages like Java, just before web development went mainstream.