- Provide an existing image
- Provide a text prompt ("flamingo")
- Select from X variations the new image that looks best to you
- It does the equivalent of a Google image search on your "flamingo" prompt
- It picks the most blend-able ones as a basis for a new synthetic flamingo
- It superimposes the result on your image
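To be clear about the mental model I'm describing (which, as replies below point out, is not how the system actually works), the steps could be sketched like this. Everything here is a stand-in: images are plain strings and every function name is hypothetical, purely for illustration.

```python
# Illustrative sketch of the mental model above -- NOT the real pipeline.
# Images are represented as strings; all helpers are hypothetical.

def search_images(prompt):
    # Stand-in for "the equivalent of a Google image search" on the prompt.
    return [f"{prompt}_photo_{i}" for i in range(10)]

def blend(candidates):
    # Pick the most "blend-able" candidates and merge them into one
    # synthetic result (here, just the first three).
    basis = candidates[:3]
    return "synthetic(" + "+".join(basis) + ")"

def superimpose(base_image, layer):
    # Composite the synthetic result onto the user's original image.
    return f"{base_image} + {layer}"

def edit(base_image, prompt):
    # Full (imagined) flow: search, blend, superimpose.
    return superimpose(base_image, blend(search_images(prompt)))
```

For example, `edit("museum.jpg", "flamingo")` would describe a museum photo with a synthetic flamingo layered on top. The point of the thread below is that the real model does something much more integrated than this.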
Very cool, don't get me wrong. But now I want to tweak this new floating flamingo I picked further, or have that corgi in the museum sink into the little couch a bit, since it would have weight in the real world. Can't. You'd have to start over with the prompt, or maybe use this as the new base image.
The example with furniture placement in an empty room is also very interesting. You could describe the kind of couch you want and where you want it and it will throw you decent options.
But say I want the purple one in the middle of the room that it gave me as an option, just rotated a little bit. It would generate a completely new purple couch. Maybe it would even look pretty similar, but not exactly the same.
See what I mean?
If you pay attention to all the corgi examples, the sofa texture changes in each of them, and it synthesizes shadows in the right orientation - that's what it's trained to do. The first one actually does give you the impression of weight. And if you look at "A bowl of soup that looks like a monster knitted out of wool", the bowl is clearly weighing down on the surface. I bet if the picture had a fluffier sofa you would indeed see the corgi making an indent in it, as the model will have learned that from its training set.
Of course there will be limits to how much you can edit, but then nothing stops you from pulling that into Photoshop for extra fine adjustments of your own. This is far from a 'cool trick' and many of those images would take hours for a human to reproduce, especially with complex textures like the Teddy Bear ones. And note how they also have consistent specular reflections in all the glass materials.
My issue is that it appears impossible to explain what the AI is doing at all. If you could, you'd be able to actually control the output. Talking about how the model is trained is interesting, but it's not an answer.
Of course there is a superimposing step; that just means it adds its layer on top of the photo you provide. That's all it means, and that's literally what it is doing. That's all I tried to say, heh.
> If you pay attention to all the corgi examples, the sofa texture changes in each of them
Yes, exactly!
> This is far from a 'cool trick' and many of those images would take hours for a human to reproduce
OK, fair enough. I'll try to be more clear:
It is very cool, not a trick, and the results are fantastic if you get exactly what you wanted. An amazing time saver. And if not? Right now it's totally hit or miss.
It would also take hours for a human to reproduce a Vermeer, and this model no doubt has Vermeers in its training set and would style-transfer one onto a corgi instantly. Certainly faster than Vermeer himself could do it.
But Vermeer could explain how he came up with the style: his techniques, his choices, etc.
It reads like the advance here is that it will usually synthesize something that looks great, but not always the thing that you want. With no recourse.