https://platform.openai.com/docs/guides/prompt-caching
It's fairly simple, actually. Each machine stores the KV cache in fixed-size blocks of 128 tokens.
Those blocks are indexed in a prefix-tree-like structure, probably with some sort of LRU eviction policy.
When you ask a machine to generate, it starts from the longest matching prefix in its cache and only computes the remaining tokens.
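A minimal sketch of that block-level cache, assuming the design above (all names invented; a tiny block size and a dict keyed on token-block prefixes stand in for a real radix tree over KV tensors):

```python
from collections import OrderedDict

BLOCK = 4  # tokens per block; the real system reportedly uses 128

class PrefixCache:
    def __init__(self, capacity=1024):
        # Maps a tuple of prompt tokens (at a block boundary) to a cached
        # "KV" stub. OrderedDict insertion order doubles as the LRU order.
        self.blocks = OrderedDict()
        self.capacity = capacity

    def _block_prefixes(self, tokens):
        # Only full blocks are cacheable; the partial tail is recomputed.
        full = len(tokens) - len(tokens) % BLOCK
        for i in range(BLOCK, full + 1, BLOCK):
            yield tuple(tokens[:i])

    def insert(self, tokens):
        for prefix in self._block_prefixes(tokens):
            self.blocks[prefix] = f"kv[{len(prefix)}]"  # stand-in for KV tensors
            self.blocks.move_to_end(prefix)             # mark most-recently used
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)             # evict least-recently used

    def longest_match(self, tokens):
        # Generation resumes from the longest cached prefix; only the
        # remainder of the prompt needs a fresh forward pass.
        best = 0
        for prefix in self._block_prefixes(tokens):
            if prefix in self.blocks:
                self.blocks.move_to_end(prefix)
                best = len(prefix)
        return best

cache = PrefixCache()
cache.insert([1, 2, 3, 4, 5, 6, 7, 8])
# Shares only the first block with the cached prompt:
print(cache.longest_match([1, 2, 3, 4, 5, 6, 9, 9, 9]))  # → 4
```

So a prompt that diverges mid-block falls back to the last block boundary it shares with something cached, which is why cache hits come in 128-token increments.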
They route requests between racks using a hash of the prompt prefix, so requests sharing a prefix land on the same machine.
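A sketch of that routing, under the assumption that the router hashes some fixed-length prompt prefix (machine names and the prefix length are invented for illustration):

```python
import hashlib

MACHINES = ["rack-a", "rack-b", "rack-c"]  # hypothetical pool
PREFIX_TOKENS = 8  # how much of the prompt the router hashes (an assumption)

def route(prompt_tokens):
    # Hash the leading tokens so identical prefixes map deterministically
    # to the same machine, concentrating cache hits there.
    key = ",".join(map(str, prompt_tokens[:PREFIX_TOKENS])).encode()
    digest = hashlib.sha256(key).digest()
    return MACHINES[int.from_bytes(digest[:8], "big") % len(MACHINES)]

system_prompt = list(range(20))
# Two requests with the same system prompt but different user turns
# are routed to the same machine:
print(route(system_prompt + [101, 102]) == route(system_prompt + [201]))  # → True
```

Deterministic prefix hashing means the shared part of the traffic keeps warming the same machine's cache instead of being spread across the fleet.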
Therefore the system prompt, being frequently used and sitting at the very start of the context, will almost always be a hit in the prefix cache.