I admit I'm not too familiar with the old MSVC inline assembly system, but the way GCC/Clang do it certainly allows emitting code identical to what you get with intrinsics, although naive constraints (say, pinning operands to specific registers or adding a "memory" clobber you don't need) can hurt you. However, the main reason I personally use inline asm is not performance, but access to instructions which are not provided as intrinsics or which require special register handling. For example, I recently wrote some code that did syscalls directly (because it was patching memory, so the normal syscall functions might not be accessible); I could have linked a separate .S file, but inline assembly made the output look much nicer. Or, while I haven't written this myself, it can be used to emit nops inline at the call site which zero-cost tracing facilities will then patch, something which separate assembly files cannot do.
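To make the syscall case concrete, here is a minimal sketch of what I mean. It is not the code I actually wrote: raw_write is a made-up name, and it assumes x86-64 Linux with GCC or Clang, where write is syscall number 1 and the syscall instruction clobbers rcx and r11.

```c
#include <stddef.h>

/* Hypothetical helper: issue write(2) directly, bypassing libc.
   Assumes x86-64 Linux; other targets use different registers and numbers. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                 /* result comes back in rax */
        : "a"(1L),                  /* syscall number: write is 1 on x86-64 */
          "D"((long)fd),            /* arg 1 in rdi */
          "S"(buf),                 /* arg 2 in rsi */
          "d"(len)                  /* arg 3 in rdx */
        : "rcx", "r11", "memory"    /* the kernel overwrites rcx and r11;
                                       the "memory" clobber tells the compiler
                                       that buf is read by the asm */
    );
    return ret;                     /* byte count, or a negative errno value */
}
```

The constraints do the register shuffling that a separate .S file would need a calling-convention wrapper for, and the compiler can inline the whole thing. Note that the raw interface returns a negative errno rather than setting errno, so the caller has to check for that itself.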
As for effort from compiler authors, I'd like to hear more about why it is supposedly so hard. (MSVC is also the compiler that has had to pick and choose which C++11 through C++17 features to implement over the years, while its competitors have complete C++11 and C++14 implementations. I guess that's somewhat ad hominem against the team, but ...)