Better HN

redox99 · 2y ago

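LLMs are deterministic: given the same input, the model computes the same probability distribution over the next token. However, to make them more "creative", the next token can be sampled from that distribution, with a parameter called temperature controlling how much randomness is added. If you set the temperature to 0, the most likely token is always chosen and the output will be deterministic.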
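A minimal Python sketch of temperature sampling as described above. It assumes the common softmax-over-(logits / temperature) scheme and treats temperature 0 as plain argmax (greedy decoding), because dividing by zero is undefined. `sample_next_token` and the logits are hypothetical, not any provider's API.

```python
import math
import random

def sample_next_token(logits, temperature, rng=random):
    # Temperature 0 is treated as greedy decoding: always take the most likely token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise divide the logits by the temperature and apply softmax;
    # higher temperatures flatten the distribution (more randomness).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.choices does not require the weights to be normalized.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # hypothetical scores for a 3-token vocabulary
print([sample_next_token(logits, 0) for _ in range(5)])    # [0, 0, 0, 0, 0]
print([sample_next_token(logits, 1.0) for _ in range(5)])  # varies from run to run
```

In this idealized sketch, temperature 0 removes all randomness from the sampler; the rest of the thread is about why real deployments still are not deterministic.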

Having said that, GPT-4 is not deterministic even at temperature 0, either because of a bug in OpenAI's implementation or because of some load balancing among its alleged mixture of experts.


101011 · 2y ago

This was interesting to me, so I dug a bit further. This gives a bit more context on why: https://community.openai.com/t/observing-discrepancy-in-comp...

Quote below:

Even with a greedy decoding strategy, small discrepancies regarding floating point operations lead to divergent generations. In simpler terms: when the top-two tokens have very similar log-probs, there’s a non-zero probability of choosing the least probable one due to the finite number of digits that you’re using for multiplying probs and storing them.

It should also be noted that, as the decoding occurs in an autoregressive way, once you have picked a different token the whole generated sequence will diverge, as this choice affects to the probability of generating every subsequent token.
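A toy Python sketch of the two effects in the quote, under loudly stated assumptions: the logits are hypothetical, "finite number of digits" is modelled as rounding to float32 (standing in for whatever limited precision a real system uses), ties go to the lower index, and `next_token` is a made-up deterministic stand-in for a model. The same greedy decoder picks a different first token depending only on precision, and every later token then differs.

```python
import struct

def to_float32(x: float) -> float:
    # Round a Python float (an IEEE 754 double) to the nearest float32 value.
    return struct.unpack("f", struct.pack("f", x))[0]

def greedy_pick(logits):
    # Greedy decoding: index of the largest logit; ties go to the lowest index.
    return max(range(len(logits)), key=lambda i: logits[i])

def next_token(prev: int) -> int:
    # Hypothetical toy "model": the next token depends only on the previous one.
    return (prev * 7 + 3) % 10

def generate(first: int, length: int):
    # Autoregressive loop: every new token is conditioned on what came before.
    seq = [first]
    while len(seq) < length:
        seq.append(next_token(seq[-1]))
    return seq

candidates = [1, 2]                # hypothetical top-two tokens at the first step
logits = [1.00000001, 1.00000002]  # nearly tied scores for those two tokens

pick64 = candidates[greedy_pick(logits)]                           # float64: token 2 wins
pick32 = candidates[greedy_pick([to_float32(x) for x in logits])]  # float32: both round to 1.0, tie -> token 1

print(generate(pick64, 5))  # [2, 7, 2, 7, 2]
print(generate(pick32, 5))  # [1, 0, 3, 4, 1]
```

With this particular toy map the two sequences never reconverge, since (7x + 3) mod 10 is one-to-one on the digits 0 to 9.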

scarmig · 2y ago

But why are there discrepancies in the floating point arithmetic? They have errors when approximating the reals, but floating point operations are all well-defined: even if 0.1 + 0.2 != 0.3, it's still always true that 0.1 + 0.2 == 0.1 + 0.2. I figure the issue must be something related to concurrency in a fleet of GPUs during inference, but even then it's not clear to me where the nondeterminism would creep in. Maybe different experts simultaneously work on an inference and the first to respond wins? Switching to models with different quantization depending on load?
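The point that floating point is inexact but still repeatable can be checked directly in Python: rounding makes the result differ from 0.3, but repeating the same operation on the same inputs gives the same bits.

```python
a = 0.1 + 0.2
print(a)               # 0.30000000000000004
print(a == 0.3)        # False: rounding error relative to the real number 0.3
print(a == 0.1 + 0.2)  # True: the same operation on the same inputs is repeatable
```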

imagainstit · 2y ago

Floating point math is not associative: (a + b) + c != a + (b + c)

So accumulating the same sums in different orders can produce different results, and different accumulation orders are common in parallel math operations.
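A small Python sketch of both points: regrouping three additions, and a hypothetical chunked reduction standing in for how a parallel kernel might split a sum across workers (real GPU kernels differ in detail). The values are chosen to make the rounding obvious; with ordinary activations the differences are much smaller, but they can still be enough to flip a near-tie between two tokens.

```python
from functools import reduce
import operator

# Non-associativity: the same three numbers, grouped differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

def sequential_sum(xs):
    # Strict left-to-right accumulation. (Python 3.12+'s built-in sum()
    # uses compensated summation for floats, so it is avoided here.)
    return reduce(operator.add, xs, 0.0)

def chunked_sum(xs, n_chunks):
    # Mimic a parallel reduction: each hypothetical "worker" sums its own
    # chunk, then the partial sums are combined.
    size = -(-len(xs) // n_chunks)  # ceiling division
    partials = [sequential_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return sequential_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

Each call is individually repeatable; the nondeterminism only appears if the split (here `n_chunks`) can change from one run to the next, for example with batch size or hardware.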


WanderPanda · 2y ago

If I were OpenAI, I would be so horribly uncomfortable about this that making it deterministic would be one of my top priorities. How can they sleep at night?!

b112 · 2y ago

On a big pile of money?!

swores · 2y ago

If ChatGPT is a) usually used with a setting that makes it non-deterministic and b) for whatever reason, also non-deterministic when that setting isn't used... then why did you comment as if the person calling it a non-deterministic LLM was incorrect? They didn't claim that all LLMs are, or must be, non-deterministic, just that this one is, which is a problem.

moonchrome · 2y ago

Even the GPT-3.5 Turbo API is non-deterministic at temperature 0.

kordlessagain · 2y ago

Ensembles be ensembling.
