The drafts have to be evaluated by either a human or an LLM. Doing that for every request does not scale when you have millions of users.
>Just do the comparison on the user’s machine if the LLM provider is that cheap.
This is not possible. Users don't have the resources to run these gigantic models, and LLM inference is not cheap. OpenAI and Google aren't turning a profit on free ChatGPT or Bard.
>P.S. Also aren’t LLMs deterministic if you set their “temperature” to zero? Are there drafts if the temperature is zero? If not, then that’s the same as removing the randomness no?
It's not a problem of randomness. A temperature of 0 doesn't reduce hallucinations. LLMs internally know when they are hallucinating or taking a wild guess. Randomness influences how that guess manifests each time, but the decision to guess was already made.
https://arxiv.org/abs/2304.13734
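
To make the temperature point concrete, here is a rough toy sketch of how temperature is usually applied at sampling time. It is not any provider's actual implementation. The token list, the logit values, and the function names are all made up for illustration. The point is that temperature only changes how a token gets picked from the model's scores; it never changes the scores themselves.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical next-token scores for a question the model has no real answer to.
# The scores are nearly flat: the model is guessing whichever token it emits.
TOKENS = ["1947", "1952", "1961"]
LOGITS = [1.2, 1.1, 1.0]

rng = random.Random(0)
greedy = [TOKENS[sample_token(LOGITS, 0, rng)] for _ in range(5)]
sampled = [TOKENS[sample_token(LOGITS, 1.0, rng)] for _ in range(5)]

print(greedy)   # the same token every time
print(sampled)  # a mix of tokens, depending on the seed
```

With temperature 0 you get the same guess on every run; with temperature 1 you get different guesses. In both cases the scores going in are identical and nearly flat, so it is still a guess either way.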