The result is actually richer than 'predicted output' - it's a probability distribution over all possible output.
-- This is, uh, false. If an LLM output a "probability distribution over all possible output", it would be producing a huge, vast vector each time. It doesn't. ChatGPT, GPT-3 etc. produce a string output, that's it. You can say the string is sampled from a probability distribution over output space, but just about anything that produces output does that.
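The distinction the reply is drawing can be made concrete with a toy sketch (nothing below is a real model; the vocabulary and scoring function are made up): internally there is a per-step distribution over the vocabulary, but what the caller ever receives is one sampled string, not the distribution itself.

```python
import math
import random

# Made-up toy vocabulary; a real LLM's vocabulary has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "mat", "."]

def fake_logits(context):
    # Stand-in for a forward pass: deterministic toy scores, not a real model.
    return [len(w) + 0.1 * i - 0.01 * len(context) for i, w in enumerate(VOCAB)]

def softmax(logits):
    # Convert raw scores into a probability distribution over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw one token index according to the distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

def generate(steps=5, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(steps):
        probs = softmax(fake_logits(out))      # distribution exists internally...
        out.append(VOCAB[sample(probs, rng)])  # ...but only one token is emitted
    return " ".join(out)  # the caller gets a string, never the huge vector
```

The per-step distribution is real (it's how decoding works), but it is an internal quantity; the claim that the *output* is "a probability distribution over all possible output" conflates the two.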
Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.
-- Uh, you missed where I said "in-context predicted output". The Transformer architecture is where the LLM magic happens; it's what allows "X, but in pig Latin" etc.
It's hard to convey that these systems are neither "fancy autocomplete" nor AGI/something magical, but an interesting and sometimes deceptive middle ground.