That's not exactly true in real hardware, or at least it wasn't until ~10 years ago. With the x87 FPU, internal precision was 80 bits, while the x86 registers were at most 64 bits. So, depending on the way the program would transfer data between the CPU and FPU your could get different results. It is very likely that different compilers and different optimization decisions could change the way these operations were implemented, so you would get slight differences between different versions of the software.
There are/were also several global FP flags that could get changed by other programs running on the same CPU/FPU that could impact the result of calculations. So, if you want 100% reproducible FP, you would have to either audit all software running on the same machine to ensure it doesn't touch those flags, or set the flags yourself for every FP calculation in your your program.