Software implementations can and do differ (even dynamically) based on the hardware though - e.g. glibc's sin(x) function, what C code will end up using (if not other languages relying on the C stdlib), uses FMA instructions on my CPU, and thus the exact same binary on the exact same OS with the exact same glibc should behave differently on a very old CPU without FMA where it should have a different implementation (as generally things using FMA cannot be exactly ported to hardware without it without a gigantic drop in performance which'd be extremely unacceptable).
Yeah, the lack of FMA in some contexts is a serious bummer. It would be great if every popular CPU platform would figure out a way to get FMA implemented, and if programming languages would figure out better ways to help programmers use it explicitly without making their code too ugly.
at this point, Intel, AMD, Arm, and RiscV all do and have for a while. The only one that is at all relevant that doesn't is Apple m-series under rosetta.