
0 points · WithinReason · 1y ago · 0 comments

Diffusion works significantly better for images than sequential pixel generation, so there is a good chance it would work better for language as well.

Sequential generation was state of the art for images in 2016, and it's basically how current LLMs work:

https://arxiv.org/abs/1601.06759
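
To make "sequential generation" concrete, here is a minimal sketch of the idea: each element is sampled one at a time, conditioned on everything generated so far. The names `sample_sequentially` and `toy_next_dist` are made up for illustration, and the toy distribution is a hypothetical stand-in for a trained network, not anything from the linked paper.

```python
import random


def sample_sequentially(next_dist, length, seed=0):
    """Generate `length` tokens one at a time, each conditioned on the prefix."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        # next_dist returns a dict mapping token -> probability given the prefix.
        dist = next_dist(tokens)
        choices, weights = zip(*dist.items())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return tokens


def toy_next_dist(prefix):
    """Hypothetical stand-in for a model: tends to repeat the previous token."""
    if not prefix:
        return {0: 0.5, 1: 0.5}
    last = prefix[-1]
    return {last: 0.9, 1 - last: 0.1}


sample = sample_sequentially(toy_next_dist, 8)
print(sample)
```

The point is only the loop structure: step N cannot start until step N-1 has been sampled, whether the elements are pixels or text tokens. A diffusion sampler would instead refine all positions together over a number of denoising steps.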


kleiba · 1y ago

Neural LMs used to be based on recurrent architectures until the Transformer came along. That architecture is not recurrent.

I am not sure that a diffusion approach is all that suitable for generating language. Words are much more discrete than pixels.

WithinReason (OP) · 1y ago

I meant sequential generation; I didn't mean using an RNN.

Diffusion doesn't work on pixels directly either; it works on a latent representation.

kleiba · 1y ago

All NNs work on latent representations.


famouswaffles · 1y ago

The most popular autoregressive method in the image generation space is to predict image patches/tokens rather than pixels, though that still scales worse than diffusion.

A fairly new but promising approach for autoregression that seems to scale as well as diffusion is predicting the next image scale/resolution rather than the next image patch.

https://arxiv.org/abs/2404.02905

magicalhippo · 1y ago

I had similar thoughts to you.

However, diffusion models suck at details, like how many fingers are on a hand, and in language the words and characters matter: both which ones they are and where they are.

So while I'm sure diffusion could produce walls of text that look convincingly like a blog post at a glance, say, I'm not sure it would hold up to anyone actually reading it.
