
steve1977 28d ago

> "Predict the next word" is a terrible summary of what these machines do, though; they certainly do more than that

What would that be?

grey-area 28d ago

They generate text based on quite a large context, including hidden prompts we don’t see, and their weights are distorted heavily by training. So I think there’s a lot more going on than a simple probability of word x coming next, which makes ‘predict next word’ a reductive summary IMO.
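To make the context point concrete, here is a toy sketch, not how any real model is implemented. The next-word distribution is a softmax over scores that depend on every token in the context, including a hidden prompt the user never sees. The vocabulary, the `ASSOC` table and the function name are all made up for illustration.

```python
import math

# Hypothetical toy "weights": how strongly a context word pulls toward a next word.
ASSOC = {
    "pirate": {"arr": 2.0, "hello": -1.0},
    "polite": {"hello": 2.0, "arr": -1.0},
    "greet": {"hello": 1.0, "arr": 1.0},
}
VOCAB = ["hello", "arr", "the"]


def next_word_probs(context):
    """Return a next-word distribution conditioned on ALL context tokens."""
    # Each candidate word's score is the sum of contributions from every context token.
    scores = [
        sum(ASSOC.get(tok, {}).get(word, 0.0) for tok in context)
        for word in VOCAB
    ]
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {word: e / total for word, e in zip(VOCAB, exps)}


user_text = ["greet"]
p_plain = next_word_probs(user_text)                 # what the user typed
p_hidden = next_word_probs(["pirate"] + user_text)   # same text plus a hidden prompt
```

With the same visible input, `p_plain` splits evenly between "hello" and "arr", while `p_hidden` puts most of its mass on "arr". It is still a probability over the next word, but one that is conditioned on far more than the last word typed.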

I do not personally feel it resembles thinking or reasoning, though, and I really object to that framing because it is misleading many people.

karamanolev 28d ago

> their weights are distorted heavily by training

What does that even mean? Their weights are essentially created by training. There aren't some magic golden weights that are then distorted.

grey-area 27d ago

I may be using the wrong terms; my impression was:

1. Weights in the model are created by ingesting the corpus.

2. Techniques like reinforcement learning, alignment, etc. are used to adjust those weights before model release.

3. The model is used and more context is injected, which then affects which words it will choose, though it is still heavily biased by the corpus and training.

That could be way off base, though; I'd welcome correction on that.
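The three steps above can be sketched with a deliberately tiny bigram model. This is an illustration under heavy assumptions, not a description of how any production model is trained: `pretrain`, `adjust`, `predict` and the `penalties` dict are hypothetical names, and the penalty subtraction is only a crude stand-in for reinforcement learning or alignment.

```python
import math
from collections import defaultdict


def pretrain(corpus):
    """Step 1: derive weights from the corpus (here: log bigram counts)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {p: {n: math.log(c) for n, c in nxts.items()} for p, nxts in counts.items()}


def adjust(weights, penalties):
    """Step 2: nudge the weights that already exist (stand-in for RL/alignment)."""
    return {
        p: {n: w - penalties.get(n, 0.0) for n, w in nxts.items()}
        for p, nxts in weights.items()
    }


def predict(weights, context):
    """Step 3: context supplied at use time decides which weights apply."""
    # A bigram model only looks at the last word; a real model uses the whole context.
    row = weights.get(context[-1], {})
    if not row:
        return {}
    top = max(row.values())
    exps = {n: math.exp(w - top) for n, w in row.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


corpus = ["the cat sat", "the cat ran", "the dog sat", "the cat sat"]
base = pretrain(corpus)
tuned = adjust(base, {"sat": 2.0})       # discourage "sat" after the fact
before = predict(base, ["the", "cat"])   # "sat" is more likely than "ran"
after = predict(tuned, ["the", "cat"])   # "ran" is now more likely than "sat"
```

In this sketch the second step does not start from some separate set of weights. It only shifts the ones that the first step produced from the corpus.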

The point I was trying to make, though, was that they do more than predict the next word based on just one set of data. Their weights can encode entire passages of source material from the training data (https://arxiv.org/abs/2505.12546), including books and programs. This is why they are so effective at generating code snippets.

Also, text injected at the last stage during use carries far less weight than most people assume (e.g. https://georggrab.net/content/opus46retrieval.html) and is not read and understood, IMO.

There are a lot of inputs nowadays and a lot of stages to training. So while I don't think they are intelligent, I think it is reductive to call them next-token predictors or similar. Not sure what the best name for them is, but they are neither next-word predictors nor intelligent agents.


joquarky 27d ago

Alignment scrubs the underlying raw output to be socially acceptable. It's an artificial superego.


