TL;DR: The same (very small) executables gave different results when run on Intel and AMD processors because the approximate reciprocal-square-root (rsqrt) and reciprocal (rcp) instructions produced slightly different outputs on the two systems.
[1]: https://github.com/jeff-arnold/math_routines/blob/main/rsqrt....