The difference is just that it makes the compositing easier. If you don't have a pre-existing image that would match the shadows and angles, you can hallucinate a new koala that does. Neat trick.
But I bet if I threw the poor marsupial at a basketball net it would look really different from the original clipart of it climbing some tree in a slow, relaxed manner. See what I mean?
Maybe Dall-E 2 can make it strike a new pose. The limb positions could be altered. But the facial expression?
And if the basketball background has wind blowing leaves in one direction, the koala's fur won't match; it will look like the fur from the training set. The puddle won't reflect it. Etc.
This thing doesn't understand what a koala is the way a 3-year-old does. It understands that the text "koala" is associated with that tagged collection of pixel blobs and can conjure up similar blobs onto new backgrounds - but it can't paint me a new type of koala that it hasn't seen before. It just looks that way.
If you read the article, it gives examples that do exactly this. For example, adding a flamingo shows the flamingo reflected in a pool. Adding a corgi at different locations in a photo of an art gallery shows it rendered in the style of the painting when it's added to a painting, then in photorealistic style when it's placed on the ground.
A lot of the time it doesn't super matter, but sometimes it does.
I might be misinterpreting your use of "compositing" here (and my own technical knowledge is fairly shallow), but I don't think there's any compositing of elements generally in AI image generation. (Unless Dall-E 2 changes this; I haven't read the paper yet.)
> Given an image x, we can obtain its CLIP image embedding z_i and then use our decoder to “invert” z_i, producing new images that we call variations of our input. ... It is also possible to combine two images for variations. To do so, we perform spherical interpolation of their CLIP embeddings z_i and z_j to obtain intermediate z_θ = slerp(z_i, z_j, θ), and produce variations of z_θ by passing it through the decoder.
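For what it's worth, the slerp step in that quote is just standard spherical interpolation between the two embedding vectors; the decoder is the paper's diffusion decoder, which I'm obviously not reproducing here. A minimal sketch (the 512-dim size and the L2 normalization are my assumptions, not something from the paper):

    import numpy as np

    def slerp(z_i: np.ndarray, z_j: np.ndarray, theta: float) -> np.ndarray:
        """Spherical interpolation between two embedding vectors.

        theta in [0, 1]: 0 returns (normalized) z_i, 1 returns z_j.
        """
        # Work on the unit hypersphere, since CLIP embeddings are
        # typically compared after L2 normalization.
        a = z_i / np.linalg.norm(z_i)
        b = z_j / np.linalg.norm(z_j)
        # Angle between the two embeddings.
        omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
        if np.isclose(omega, 0.0):
            return a  # vectors (nearly) identical; nothing to interpolate
        s = np.sin(omega)
        return (np.sin((1.0 - theta) * omega) / s) * a \
             + (np.sin(theta * omega) / s) * b

    # Hypothetical usage: z_i and z_j would be CLIP image embeddings of two
    # source images; z_mid would then be fed to the diffusion decoder to
    # sample a blend of the two.
    z_i = np.random.randn(512)
    z_j = np.random.randn(512)
    z_mid = slerp(z_i, z_j, 0.5)

So the "combining" happens in embedding space, not by pasting pixels together, which is why I don't think "compositing" is quite the right mental model.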
From the limitations section:
> We find that the reconstructions mix up objects and attributes.