What are your thoughts on the way RISC-V handled the compressed instruction subset?
Put it side-by-side with Thumb and it also looks pretty similar (Thumb has a multiply instruction IIRC).
Put it side-by-side with short x86 instructions accounting for the outdated ones and the list is pretty similar (down to having 8 registers).
All in all, when old and new instruction sets are taking the same approach, you can be reasonably sure it's not the absolute worst choice.
If there's a criticism, it's that the two bits reserved in every 32-bit instruction mean the total instruction range is MUCH smaller overall until you switch to 48-bit instructions, which are then much bigger.
Higher-level languages rely heavily on inlining to reduce their abstraction penalty. Profiles which were taken from the Linux kernel and (checks notes...) Dhrystone are not representative of code from higher-level languages.
3/4 of the available prefix instruction space was consumed by the 16-bit extension. There have been a couple of proposals showing that even better density could be achieved using only 1/2 the space instead of 3/4, but they were struck down in order to maintain backwards compatibility.
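For anyone who wants to check the 3/4 figure, here is a rough sketch of the length rule. It assumes the standard scheme: a 16-bit parcel whose two low bits are not 11 is a compressed instruction, and a parcel whose low bits are 11 but whose bits [4:2] are not 111 starts a 32-bit instruction. The function names are made up for illustration, and everything longer than 32 bits is lumped together as None.

```python
from fractions import Fraction


def insn_length(parcel):
    """Return the instruction length in bytes given its first 16-bit parcel.

    Returns None for encodings longer than 32 bits (not modelled here).
    """
    if parcel & 0b11 != 0b11:
        return 2  # any low-bit pattern other than 11 is a 16-bit instruction
    if parcel & 0b11100 != 0b11100:
        return 4  # low bits 11, bits [4:2] not 111: standard 32-bit instruction
    return None  # space set aside for longer encodings


def share_of_encoding_space(length):
    """Fraction of all 2**16 first-parcel values that decode to `length`."""
    hits = sum(1 for parcel in range(1 << 16) if insn_length(parcel) == length)
    return Fraction(hits, 1 << 16)
```

Under these assumptions share_of_encoding_space(2) comes out to 3/4 and share_of_encoding_space(4) to 7/32, which is the split being argued about here.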
Small revisions to a function that increase the number of live variables to more than the set that are covered by the C extension mean that references to THAT VARIABLE ONLY have to use a full size instruction. There is nothing sudden or dramatic.
Note that a number of C instructions can in fact use all 32 registers. This includes stack pointer-relative loads and stores, load immediate ({-32..+31}), load upper immediate (4096 * {-32..+31}), add immediate and add immediate word ({-32..+31}), shift left logical immediate, register to register add, and register move.
It's certainly possible that another compressed encoding might do better using fewer opcodes, and I've seen the suggestions. The main thing wrong with the standard one in my opinion is that it gives too much prominence to floating point code, having been developed to optimise for SPEC including SPECFP (no, not the Linux kernel or Dhrystone ... I have no idea where you got that from).
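To make the ranges above concrete, here is a small sketch. It assumes the usual layout where the restricted instructions carry a 3-bit register field that maps onto x8..x15, and where the immediates quoted above are 6-bit two's-complement values. The helper names are hypothetical, and encodings that the spec reserves for specific immediate values are not modelled.

```python
def compressed_reg(field3):
    """Map a 3-bit register field to the full register number (x8..x15)."""
    if not 0 <= field3 <= 7:
        raise ValueError("register field must fit in 3 bits")
    return 8 + field3


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


# Every value a 6-bit immediate field can hold: -32 .. +31.
imm6_values = sorted(sign_extend(v, 6) for v in range(64))

# The load-upper-immediate form scales the same field by 4096.
lui_values = [4096 * imm for imm in imm6_values]
```

So the "only 8 registers" restriction applies to the forms using the 3-bit field; the instructions listed above carry a full 5-bit register number instead.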
But anyway it does well, and the opcode space used is not excessive. If anything it's TOO SMALL. Thumb2 gets marginally better code size while using 7/8ths of the opcode space for the 16-bit instructions instead of RISC-V's 3/4.
The RISC-V Compressed Spec v1.9 documented the benchmarks which were used for the optimization. RV32 was optimized with Dhrystone, Coremark, and SPEC-2006. RV64GC was optimized using SPEC-2006 and the Linux kernel.
JavaScript is limited to 64-bit floats, as are Lua and a couple of other languages.
Sure, you can optimize to 31/32-bit ints, but not always and not before the JIT warms up.
With this extension, RISC-V can be competitive with ARM Cortex-M.
On the other hand, the compressed instruction encoding is useless for general-purpose computers intended as personal computers or as servers, because it limits the achievable performance to much lower levels than for ARMv8-A or Intel/AMD.
The cost in area and power of a decoder for variable-length instructions increases faster with the number of simultaneously-decoded instructions than the cost of a decoder for fixed-length instructions.
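A toy illustration of where that extra cost comes from (a software sketch, not a hardware cost model). It assumes only 16-bit and 32-bit instructions, distinguished by the two low bits of the first parcel, and the function names are invented for this example. With fixed-length instructions every start position is known up front; with mixed lengths each start depends on the length of the instruction before it.

```python
def parcel_count(parcel):
    """Number of 16-bit parcels the instruction starting at `parcel` occupies."""
    return 1 if parcel & 0b11 != 0b11 else 2


def fixed_starts(num_parcels):
    """Start positions for fixed 32-bit instructions: known without decoding."""
    return list(range(0, num_parcels, 2))


def variable_starts(parcels):
    """Start positions for mixed 16/32-bit instructions.

    Each position is only known once the previous instruction's length is,
    so this loop is a serial dependency chain.
    """
    starts = []
    pos = 0
    while pos < len(parcels):
        starts.append(pos)
        pos += parcel_count(parcels[pos])
    return starts


# A 16-bit instruction, a 32-bit instruction (two parcels), a 16-bit one.
window = [0x0001, 0x0013, 0x0000, 0x0001]
```

Here variable_starts(window) gives [0, 1, 3]. The parcel at index 2 is the upper half of a 32-bit instruction, yet its low bits alone would pass for a compressed one. A wide decoder has to either resolve that chain or decode speculatively at every parcel and throw away the wrong guesses.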
This makes the compressed instruction encoding incompatible with high-performance RISC-V CPUs.
For the lower performance required in microcontrollers, the compressed encoding is certainly needed for adequate code density.
The goals of minimum code size and of maximum execution speed are contradictory and the right compromise is different for an embedded computer and for a personal computer.
That is why ARM has different ISAs for the two domains and why RISC-V designs must also use different sets of extensions, depending on the application intended for them.