Your acting like the read/write time to the hardware/FPGA is negligible which it isn't.
When you access hardware on say a PCI bus (which you would in this scenario). Your call to the PCI bus does not take place WHEN you call for it to take place. You call the Kernel, which calls the scheduler, which calls the hardware manager, which calls the driver, which finally processes your request.
Now your request is processed all this is dumped and something else runs while the processor waits to hear back from the PCI bus with the response because this takes ages in processor time.
Finally an interrupt arrives, is made sense of, the appropriate driver is called, then it gives your information back to your process and you back on your merry little way.
:.:.:
The problem is when you called OS to start this long chain of events the real world didn't stop. Your real time module is still counting the 32,600,000 pulses per second.
This is where you'll get errors. Because often its easy to think things in a computer happen instantly, or so blindingly fast you don't care what happens, order or speed.
The situation you described is what originally gave me 0.1% error. Eventually I switched to a more aggressive tact of polling asynchronously in a separate thread and when a process called for the time responding with the latest received time.
This got me down to 0.07% error. Still not acceptable.
:.:.:
Its nice to be starry eyed and think there is no reason to give up security. But sometimes secure software can't do what you need it to do. The more crap you put between you and the metal the more time it'll take to execute.
This is logically provable.
If you take process A and B. Both are the optimal way to do something. There is no faster way to do this task. Therefore one can assume A and B's execution time are equal.
Yet B operates in a secure sand-boxed environment with a time sharing OS. Therefore B's true execution is B+C+D.
We know A = B, but for A = (B+C+D) C and D must be Zero, which they can never be in the real world.