If I recall correctly, a significant issue is (was?) that the ESP is based on the relatively obscure Xtensa architecture - which is poorly supported (if at all) by the regular open-source toolchains. This means you have to use forks provided by Espressif, rather than just using the standard ones provided by your OS.
It's still open-source, so it's a lot better than having to use a proprietary compiler or IDE, but it's a lot more involved than just a regular bundle of C libraries you can use with your normal tooling.
Yup. The original ESP32, ESP32-S2, and ESP32-S3 use Xtensa, but the ESP32-C2, -C3, -C6, and -H2 use RISC-V.
Unfortunately they don't all have the same feature set, so you'll often still see the Xtensa variants in the wild, as they are simply a better product overall.