0 points · cubefox · 1y ago

A popular theory in neuroscience is that this is what the brain does:

https://slatestarcodex.com/2017/09/05/book-review-surfing-un...

It's called predictive coding. By trying to predict sensory stimuli, the brain creates a simplified model of the world, including common sense physics. Yann LeCun says that this is a major key to AGI. Another one is effective planning.

But while current predictive models (autoregressive LLMs) work well on text, they don't work well on video data, because of the large outcome space. In an LLM, text prediction boils down to a probability distribution over a vocabulary of tens of thousands of possible next tokens, while there are many orders of magnitude more possible "next frames" in a video. Diffusion models work better on video data, but they are not inherently predictive like causal LLMs. Apparently this new Doom model made some progress on that front, though.
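
To put rough numbers on the outcome-space gap, here is a back-of-the-envelope sketch. The vocabulary size (50,000) and the frame format (64x64 RGB, 8 bits per channel) are assumptions picked for illustration, not figures from any particular model, and the helper name `log10_outcomes` is made up for this example. It only counts raw possible outputs; almost all of those frames are noise that never occurs in real video.

```python
import math

def log10_outcomes(symbols_per_position: int, positions: int) -> float:
    """Return log10 of symbols_per_position ** positions (the count of distinct outputs)."""
    return positions * math.log10(symbols_per_position)

# Assumed values, for illustration only.
VOCAB_SIZE = 50_000                  # hypothetical LLM vocabulary size
FRAME_W, FRAME_H, CHANNELS = 64, 64, 3
LEVELS = 256                         # 8 bits per colour channel

# One next token: a single choice out of the vocabulary.
token_log10 = log10_outcomes(VOCAB_SIZE, 1)

# One next frame: every channel of every pixel is an independent choice.
frame_log10 = log10_outcomes(LEVELS, FRAME_W * FRAME_H * CHANNELS)

print(f"next token: about 10^{token_log10:.1f} possible outcomes")
print(f"next frame: about 10^{frame_log10:.0f} possible outcomes")
```

Under these assumptions the token distribution has under 10^5 entries, while even a tiny frame has a raw outcome count whose exponent alone runs to tens of thousands, so an explicit probability table over "next frames" is out of the question.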

ccozan · 1y ago

However, this is due to how we actually digitize video. From a human point of view, looking around my room reduces the load to the _objects_ in the room, and everything else is just noise (the color of the wall could be a single item to remember, while in the digital world it means remembering all the pixels).
