"But why aren’t LLM inference engines deterministic? One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first."
From https://thinkingmachines.ai/blog/defeating-nondeterminism-in...