That is true, but there are a couple of downsides. First, you still need to build once for each architecture, either as separate executables or as separate object files, and provide some dispatch mechanism that picks the right one based on what hardware is available.
Second, if the intrinsics aren't built in to the target hardware, there may be faster alternatives than GCC's emulated version.
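To make the first point concrete, here is a minimal sketch of one possible dispatch mechanism. It assumes GCC (or Clang) on x86 and uses the `target` function attribute together with `__builtin_cpu_supports`. The function names (`sum_generic`, `sum_avx2`, `pick_sum`, `sum_ints`) are made up for illustration, and AVX2 is just an example feature; on other compilers or architectures the code falls back to the generic version.

```c
#include <stddef.h>

/* Baseline version, compiled for the default target. */
static int sum_generic(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler is allowed to emit AVX2 code here. */
__attribute__((target("avx2")))
static int sum_avx2(const int *v, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef int (*sum_fn)(const int *, size_t);

/* Pick an implementation based on what the running CPU reports. */
static sum_fn pick_sum(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Public entry point: resolves the implementation on first call.
   Note: this lazy initialisation is not thread-safe as written. */
int sum_ints(const int *v, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = pick_sum();
    return impl(v, n);
}
```

Callers only ever use `sum_ints`; the choice between variants happens once, at run time. Either way you are still compiling the routine once per architecture variant, which is the cost described above.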