More broadly compatible routines will still work on newer CPUs; they just won't yield the best performance.
It would still be nice if such central routines could simply be compiled to the REP-prefixed instructions and deliver (near-)optimal performance, so we could stop worrying about that particular part.