Imagine you are predicting the next token and two candidates are very close in probability. Kernel execution is not deterministic because floating-point addition is non-associative: the order in which partial sums get reduced can change the result slightly, and that can be enough to flip which of the two wins. The token that gets picked feeds into every later token in the prediction stream, so it's very consequential which one it is.
This isn't hypothetical. It happens all the time with LLMs; it isn't some improbable freak accident.
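
To make the mechanism concrete, here is a toy sketch in plain Python, not a real GPU kernel. It adds the same three numbers in two different orders and then does a greedy pick between two tokens. The names (`reduce_in_order`, `greedy_pick`, `LOGIT_B`) and the numbers are made up for illustration; real logits come from much longer reductions, but the effect is the same kind.

```python
def reduce_in_order(terms):
    """Add terms strictly left to right, i.e. one fixed reduction order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def greedy_pick(logits):
    """Return the name of the token with the highest logit."""
    return max(logits, key=logits.get)


# Hypothetical contributions to the logit of token "A".
# Both lists hold exactly the same numbers, only the order differs.
order_1 = [1e8, 1e-8, -1e8]   # small term is rounded while added to 1e8
order_2 = [1e8, -1e8, 1e-8]   # big terms cancel first, small term survives

logit_a_1 = reduce_in_order(order_1)
logit_a_2 = reduce_in_order(order_2)

# Hypothetical logit of the competing token "B", sitting between the two results.
LOGIT_B = 1.2e-8

pick_1 = greedy_pick({"A": logit_a_1, "B": LOGIT_B})
pick_2 = greedy_pick({"A": logit_a_2, "B": LOGIT_B})

print("logit A, order 1:", logit_a_1)
print("logit A, order 2:", logit_a_2)
print("picked with order 1:", pick_1)
print("picked with order 2:", pick_2)
```

With these inputs the first order picks "A" and the second picks "B", even though the math on paper is identical. The loop is written out by hand on purpose: newer Python versions use compensated summation inside the built-in `sum()` for floats, which would hide the effect.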