Better HN
Arrows of Time for Large Language Models
(arxiv.org)
6 points
tianlong
2y ago
3 comments
nyoncore
2y ago
Isn't it obvious that, since LLMs are trained to predict the next word, they will do better at that than at predicting the previous one?
frotaur
2y ago
The paper mentions that the LLMs predicting the previous token are indeed pre-trained in exactly that way, so the difference is not obvious: both directions get the same kind of training, and the forward models still come out ahead.
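To make the point above concrete, here is a minimal sketch (not from the paper; toy data and names are my own) of why backward pre-training is symmetric to forward pre-training: reversing each training sequence turns "predict the previous token" into ordinary next-token prediction on the reversed data.

```python
# Toy sequence standing in for a tokenized training example.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Forward LM training pairs: (context, next token).
forward_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Backward LM: reverse the sequence, then build the same kind of pairs.
# The "next token" in reversed order is the previous token in the original order,
# so the backward model trains with the exact same objective as the forward one.
reversed_tokens = tokens[::-1]
backward_pairs = [(reversed_tokens[:i], reversed_tokens[i])
                  for i in range(1, len(reversed_tokens))]

print(forward_pairs[0])   # (['the'], 'cat')
print(backward_pairs[0])  # (['mat'], 'the') -> predicts the token preceding "mat"
```

Since the two setups are mirror images of each other, any systematic forward/backward gap has to come from the data itself, not from the training procedure.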
tianlong
OP
2y ago
Is there a link with entropy creation?