Better HN
Arrows of Time for Large Language Models
(arxiv.org)
6 points
tianlong
2y ago
3 comments
nyoncore
2y ago
Isn't it obvious that, since LLMs are trained to predict the next word, they will do better at that than at predicting the previous one?
frotaur
2y ago
The paper mentions that the LLMs predicting the previous token are indeed pre-trained in exactly that way, so the difference is not obvious: both directions get the same kind of training, and the forward models still come out ahead.
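To make the point above concrete, here is a minimal sketch (not from the paper; toy data and names are my own) of why backward pre-training is symmetric to forward pre-training: reversing each training sequence turns "predict the previous token" into ordinary next-token prediction on the reversed data.

```python
# Toy sequence standing in for a tokenized training example.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Forward LM training pairs: (context, next token).
forward_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Backward LM: reverse the sequence, then build the same kind of pairs.
# The "next token" in reversed order is the previous token in the original order,
# so the backward model trains with the exact same objective as the forward one.
reversed_tokens = tokens[::-1]
backward_pairs = [(reversed_tokens[:i], reversed_tokens[i])
                  for i in range(1, len(reversed_tokens))]

print(forward_pairs[0])   # (['the'], 'cat')
print(backward_pairs[0])  # (['mat'], 'the') -> predicts the token preceding "mat"
```

Since the two setups are mirror images of each other, any systematic forward/backward gap has to come from the data itself, not from the training procedure.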
tianlong
OP
2y ago
Is there a link with entropy creation?