This was interesting to me, so I dug a bit further. This gives a bit more context behind why:
https://community.openai.com/t/observing-discrepancy-in-comp...Quote below:
Even with a greedy decoding strategy, small discrepancies regarding floating point operations lead to divergent generations. In simpler terms: when the top-two tokens have very similar log-probs, there’s a non-zero probability of choosing the least probable one due to the finite number of digits that you’re using for multiplying probs and storing them.
It should also be noted that, as the decoding occurs in an autoregressive way, once you have picked a different token the whole generated sequence will diverge, as this choice affects to the probability of generating every subsequent token.