For this, though, you need to go beyond NTP into PTP, which is still usually based on GPS time and atomic clocks.
[0] https://www.septentrio.com/en/learn-more/insights/how-gps-br...
It might be difficult to find measurable events with enough resolution that we can predict them accurately enough? Like the start of a transit or alignment event? Maybe something like predicting the time at which a laser pulse will return from a lunar reflector -- if we can make the prediction accurately enough, then we can re-establish time back to the current fixed scale.
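As a back-of-the-envelope check on how demanding this would be: the sketch below assumes a mean Earth-Moon distance of ~384,400 km and ignores everything real lunar laser ranging has to model (orbital motion, libration, atmospheric delay), so it's only a feel for the numbers.

```python
# Rough feel for lunar-laser-ranging timing, assuming a mean
# Earth-Moon distance of ~384,400 km. Real ranging models orbit,
# libration, station position, atmosphere, etc.
C = 299_792_458.0          # speed of light, m/s
EARTH_MOON_M = 384_400e3   # mean distance in metres (assumption)

def round_trip_s(distance_m: float = EARTH_MOON_M) -> float:
    """Two-way light time to a lunar retroreflector."""
    return 2 * distance_m / C

def range_needed_for_timing(target_s: float) -> float:
    """One-way distance knowledge (metres) needed to predict the
    return to within `target_s` seconds."""
    return target_s * C / 2

rt = round_trip_s()                   # ~2.56 s round trip
m_per_ns = range_needed_for_timing(1e-9)  # ~0.15 m per nanosecond
```

So recovering time at the nanosecond level this way would need the Earth-Moon range known to roughly 15 cm, which happens to be about what modern lunar laser ranging achieves.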
I think I'm addressing an event that won't ever happen (all precise and accurate time sources lost/perturbed), and if it does, re-syncing in this way probably won't matter. But you know...
https://static.googleusercontent.com/media/research.google.c...
The ultimate goal is usually to have a bunch of computers all around the world run synchronised to one clock, within some very small error bound. This enables fancy things like [0].
Usually, this is achieved by having some master clock(s) for each datacenter, which distribute time to other servers using something like NTP or PTP. These clocks, like any other clock, need two things to be useful: an oscillator to provide ticks, and a reference by which to set the clock.
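The NTP side of that distribution boils down to the standard four-timestamp exchange; this is the textbook offset/delay calculation, not any particular server's implementation.

```python
def ntp_offset_delay(t0: float, t1: float, t2: float, t3: float):
    """Standard NTP four-timestamp calculation.

    t0: client sends request   (client clock)
    t1: server receives it     (server clock)
    t2: server sends reply     (server clock)
    t3: client receives reply  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # estimated client clock error
    delay = (t3 - t0) - (t2 - t1)         # round-trip network delay
    return offset, delay

# Example: client clock 5 ms fast, symmetric 10 ms one-way path.
off, dly = ntp_offset_delay(100.005, 100.010, 100.011, 100.026)
# off ~ -0.005 (client is 5 ms fast), dly ~ 0.020
```

The offset estimate is only exact when the path is symmetric, which is one reason datacenters reach for PTP with hardware timestamping when they need tighter bounds.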
In standard off-the-shelf hardware, like the Intel E810 network card, you'll have an OCXO (oven-controlled crystal oscillator), like [1], paired with a GPS module. The OCXO provides the ticks; the GPS module provides a timestamp to set the clock with and a pulse marking when to set it.
As long as you have GPS reception, even this hardware is extremely accurate. The GPS module provides a new timestamp, potentially accurate to within single-digit nanoseconds ([2] datasheet), every second. These timestamps can be used to adjust the oscillator and/or how its ticks are interpreted, such that you maintain accuracy between the timestamps from GPS.
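One common way to use that once-per-second pulse is to count oscillator ticks between PPS edges and steer out the measured frequency error. A toy sketch of the idea (my own, not the E810 or u-blox driver logic), assuming a 10 MHz oscillator:

```python
NOMINAL_HZ = 10_000_000  # assume a 10 MHz OCXO

def ppm_error(ticks_between_pps: int, nominal_hz: int = NOMINAL_HZ) -> float:
    """Fractional frequency error in parts per million, measured by
    counting oscillator ticks across one GPS PPS interval (1 s)."""
    return (ticks_between_pps - nominal_hz) / nominal_hz * 1e6

def corrected_interval_s(ticks: int, measured_hz: float) -> float:
    """Reinterpret a raw tick count using the measured rate, so time
    stays accurate between GPS timestamps."""
    return ticks / measured_hz

# Oscillator counted 3 extra ticks in one PPS interval -> +0.3 ppm fast
err = ppm_error(10_000_003)
```

In practice the servo filters many PPS intervals rather than trusting a single one, and either steers the oscillator itself or just adjusts how its ticks are converted to time.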
The problem comes when you lose GPS. Once this happens, you become dependent on the accuracy of the oscillator. An OCXO like [1] can hold to within 1µs over 4 hours without any corrections, but if you need better than that (either more time below 1µs, or better than 1µs over the same window), you need a better oscillator.
The best oscillators are atomic. [3], for example, can maintain better than 200ns accuracy over 24h.
So for a datacenter application, I think the main reason for an atomic clock is simply retaining extreme accuracy through an outage. For quite reasonable accuracy, a more affordable OCXO works perfectly well.
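The holdover numbers quoted above (1µs over 4h for the OCXO, 200ns over 24h for the atomic standard) imply average fractional frequency offsets you can sanity-check yourself. This is my own back-of-envelope, assuming simple linear drift; real holdover error also has aging and temperature terms.

```python
def avg_frac_freq(error_s: float, holdover_s: float) -> float:
    """Average fractional frequency offset implied by accumulating
    `error_s` of time error over `holdover_s`, assuming linear drift."""
    return error_s / holdover_s

ocxo = avg_frac_freq(1e-6, 4 * 3600)       # ~6.9e-11
atomic = avg_frac_freq(200e-9, 24 * 3600)  # ~2.3e-12

def holdover_for_budget(budget_s: float, frac_freq: float) -> float:
    """Seconds until a time-error budget is exhausted at that rate."""
    return budget_s / frac_freq

# At the atomic-implied rate, a 1 µs budget lasts about 5 days
days = holdover_for_budget(1e-6, atomic) / 86400
```

That factor of ~30 in implied frequency stability is the whole value proposition: the same 1µs budget that the OCXO burns through in 4 hours lasts days on the atomic standard.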
[0]: https://docs.cloud.google.com/spanner/docs/true-time-externa...
[1]: https://www.microchip.com/en-us/product/OX-221
[2]: https://www.u-blox.com/en/product/zed-f9t-module
[3]: https://www.microchip.com/en-us/products/clock-and-timing/co...
As you say, the goal is to keep the system clocks on the server fleet tightly aligned, to enable things like TrueTime. But also to have sufficient redundancy and long enough holdover in the absence of GNSS (usually due to hardware or firmware failure on the GNSS receivers) that the likelihood of violating the SLA on global time uncertainty is vanishingly small.
The "global" part is what pushes towards higher-end frequency standards: they want to be able to freewheel for O(days) while maintaining low global uncertainty. Drifting a little from external timescales in that scenario is fine, as long as all their machines drift together as an ensemble.
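The reason the uncertainty bound is what matters can be illustrated with a TrueTime-style interval, where the error bound grows with time since the last sync at an assumed worst-case drift rate. Names and numbers here are mine for illustration, not Google's API:

```python
DRIFT_PPM = 200.0  # assumed worst-case oscillator drift bound (ppm)

def now_interval(wall_at_sync: float, sync_uncertainty_s: float,
                 elapsed_since_sync: float):
    """TrueTime-style bound: true time lies in [earliest, latest].
    Uncertainty grows linearly since the last sync, at the assumed
    worst-case drift rate."""
    eps = sync_uncertainty_s + elapsed_since_sync * DRIFT_PPM * 1e-6
    estimate = wall_at_sync + elapsed_since_sync
    return estimate - eps, estimate + eps

# 10 s after a sync with 1 µs uncertainty, the bound has grown to ~2 ms
earliest, latest = now_interval(1000.0, 1e-6, 10.0)
```

Spanner's commit-wait then amounts to waiting until `earliest` exceeds the commit timestamp, which is why keeping the fleet-wide uncertainty small (and every machine's drift inside the assumed bound) matters more than agreeing exactly with UTC.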
The deployment I know of was originally rubidium frequency standards disciplined by GNSS, but later that got upgraded to cesium standards to increase accuracy and holdover performance. Likely using an "industrial grade" cesium standard that's fairly readily available, very good but not in the same league as the stuff NIST operates.
I mean, fleets come in all sizes; but if you put one atomic reference in each AZ of each datacenter, there's a fleet. Maybe the references aren't great at distributing time, so you add a few NTP distributors per datacenter too and your fleet is a little bigger. Google's got 42 regions in GCP, so they've got a case for hundreds of machines for time (plus they've invested in Spanner, which has some pretty strict needs); other clouds are likely similar.