GCC's function attributes (target and optimize) give you per-function control over both target and optimization options (unfortunately not with Fortran). I'd have to look up exactly what's available, but some per-loop control is possible with OpenMP pragmas too, e.g. the simd directive, which GCC's -fopenmp-simd enables if you don't want the threading.
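For what it's worth, here is a rough sketch in C of what I mean. The function names are made up for illustration, and the AVX2 variant assumes an x86 build (hence the preprocessor guard and the runtime check before calling it). Also note the GCC manual describes the optimize attribute as meant for debugging rather than production use, so treat that part with some caution.

```c
#include <stddef.h>

/* Per-function optimization option: ask for -O3 on this function only,
   whatever level the rest of the file is compiled at. */
__attribute__((optimize("O3")))
void scale_o3(float *x, size_t n, float a)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

/* Baseline version, built with the file's default target options. */
void add_generic(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* Per-function target option: the compiler may use AVX2 here even if
   the file itself is built without -mavx2. */
__attribute__((target("avx2")))
void add_avx2(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
#endif

/* Hypothetical dispatcher: only call the AVX2 version if the CPU has it. */
void add(float *dst, const float *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        add_avx2(dst, src, n);
        return;
    }
#endif
    add_generic(dst, src, n);
}

/* Per-loop control: an OpenMP simd directive on a single loop.
   With -fopenmp-simd GCC honours it without the OpenMP threading runtime;
   without that flag (or -fopenmp) the pragma is simply ignored. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Something like gcc -O2 -fopenmp-simd -c on that file should be enough to try it; adding -fopt-info-vec is a handy way to see which loops actually got vectorized.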