I used to just fire up VTune and inspect the hot loops. In my experience, if you care about this you're typically only targeting hardware with the latest instruction sets anyway. I'd only bother sprinkling intrinsics all over the place if I were working on low-level libraries.
For most consumer software you want to be able to fall back to some lowest-common-denominator hardware anyway, otherwise the people using it run into issues. It's the same reason that Debian, Conda, etc. only go up to really old instruction sets.
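
For what it's worth, one common way to get that fallback without hand-written intrinsics is runtime dispatch: build the same loop twice, once for the baseline and once with a newer feature enabled, and pick one at startup or call time. Below is a rough sketch in Rust, assuming AVX2 as the "newer" feature. The function names (`sum_baseline`, `sum_avx2`, `sum`) are made up for illustration, and whether the compiler actually vectorizes the loop is up to the optimizer.

```rust
// Baseline version: built for the default target, so it runs on any CPU
// the binary supports.
fn sum_baseline(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// Same loop, but the compiler is allowed to emit AVX2 instructions here.
// Only compiled on x86/x86_64; calling it on a CPU without AVX2 is
// undefined behavior, hence `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// Public entry point: checks the CPU at runtime and falls back to the
// baseline version when AVX2 is missing or the target is not x86.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // Sound because we just verified AVX2 support.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_baseline(xs)
}
```

Both paths compute the same wrapping sum, so callers just use `sum(&data)` and never need to know which version ran. You'd still want to profile the hot loop to see whether the AVX2 build is worth shipping.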