Rotate and popcount are very specialised instructions. The vast majority of software doesn't use them at all, or uses them so infrequently that a software implementation is fine.
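To make "a software implementation is fine" concrete, here is a minimal sketch of both operations in portable C. The function names `rotl32` and `popcount32` are made up for illustration and do not come from any particular library. Compilers commonly recognise the rotate idiom and emit a single instruction where one exists; otherwise each function is just a handful of shifts, masks, and adds.

```c
#include <stdint.h>

/* Rotate x left by n bits. Masking both shift counts keeps them in the
   range 0..31, so n == 0 (and n >= 32) is well defined. */
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Count the set bits in x using the classic SWAR (bit-parallel) method. */
static inline unsigned popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

For code that rotates or counts bits only occasionally, the cost of these few extra instructions is lost in the noise.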
You are confusing embedded applications, which have huge flexibility with RISC-V, and standard operating systems with packaged software.
For the next few years (5?) standard operating systems have to support exactly two choices:
- RV64GC
- RVA22
RVA22 includes the Zba/Zbb/Zbs bit manipulation instructions, cache management, and some other stuff, and it defines vectors and scalar crypto as its standard optional extensions. You can't pick and choose among the mandatory parts -- you have to support them all.
If you are making an embedded appliance on the other hand you can pick and choose exactly what extensions you need (a huge number of combinations, as you say), specify a core with exactly those extensions, build a chip around that with the other IP blocks you need, and tell your compiler which extensions you have. You compile all your software yourself, whether bare metal, using an RTOS, or a minimal Linux such as Buildroot or Yocto. There is zero confusion because you know what you have and you have what you need -- no more and no less.
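As a rough sketch of what "tell your compiler which extensions you have" looks like from the source side: GCC and Clang define a test macro such as `__riscv_zbb` when the corresponding extension is enabled, for example via `-march=rv64gc_zbb`. The function name `popcount64` is made up for illustration, and the code assumes a GCC- or Clang-compatible compiler for the builtin. In practice the builtin alone already falls back to software when Zbb is absent, so the explicit `#if` here only makes the compile-time choice visible.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit value. Which path is compiled is decided
   at build time by the extensions given to the compiler. */
static inline unsigned popcount64(uint64_t x) {
#if defined(__riscv_zbb)
    /* Zbb was enabled at compile time, so the compiler can use cpop. */
    return (unsigned)__builtin_popcountll(x);
#else
    /* No Zbb: portable SWAR fallback using shifts, masks and a multiply. */
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return (unsigned)((x * 0x0101010101010101ull) >> 56);
#endif
}
```

Because the whole image is built for one known core, there is no runtime detection and no second code path shipped.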
No one who knows what they are talking about is talking about fusing five-instruction sequences. That's a total red herring.