This is inherent to what I think is Apple’s model of depth of field: Apple takes a picture that is fairly sharp everywhere and generates an ordinary RGB image plus a depth map (an estimated distance for each pixel). Then it applies some sort of blur that depends on depth.
This is a decent approximation if the scene contains opaque, pixel-sized or larger objects, so that each pixel's content actually has a well-defined depth. But hair tends to be much thinner than a pixel, and a pixel containing both hair and background can't be correctly represented by a single depth value.
This was an issue in older (circa 2000?) Z-buffered rendering — if you naively render hair and then render an object behind the person based on the Z data from the hair rendering, you get very wrong-looking hair. It turns out that just having a GPU that can handle a zillion vertices doesn’t mean that rendering each hair independently gives good results!
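To make the failure mode concrete, here is a minimal sketch of the "RGB image plus depth map, then depth-dependent blur" model described above. The function name and the linear blur-radius model are my assumptions for illustration, not Apple's actual pipeline; the key point is that each pixel gets exactly one depth, so a pixel that mixes hair and background is blurred as a single unit:

```python
import numpy as np

def synthetic_bokeh(rgb, depth, focus_depth, max_radius=4):
    """Naive depth-dependent blur: each output pixel averages a square
    window whose radius grows with |depth - focus_depth|.

    rgb:   (H, W, 3) float array
    depth: (H, W) float array, one depth estimate per pixel -- this
           single value per pixel is exactly what sub-pixel hair breaks.
    """
    h, w, _ = rgb.shape
    out = np.empty_like(rgb, dtype=float)
    for y in range(h):
        for x in range(w):
            # Blur radius proportional to how far this pixel is from the
            # focal plane (an assumed model, for illustration only).
            r = int(round(max_radius * abs(depth[y, x] - focus_depth)))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = rgb[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

Note that a pixel covering both a strand of hair and the distant background has only one entry in `depth`, so it is either blurred entirely or kept entirely sharp; there is no way for this model to treat the hair and the background within that pixel differently.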
The author Lvmin Zhang is the same person behind ControlNet.
I'd be curious to see how well this plays with inpainting. Apparently img2img is also on the author's to-do list.
1 - The way the dog at the end gets a reflection off the floor is pretty nice.
2 - I wonder how this compares in terms of latency/complexity with a ComfyUI pipeline that just does a typical edge-detection/masking layer to achieve the transparency effect. However, I don't think that method would work with the glass example as shown.
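For contrast, the masking approach amounts to something like the sketch below: classify each pixel as foreground or background and emit a hard 0/1 alpha. This is my own minimal illustration (the function name, the known-background-color assumption, and the tolerance are all assumptions, not a ComfyUI node), but it shows why glass fails: a binary mask can't express the fractional alpha that a partially transparent surface needs, whereas a model that generates the alpha channel directly can.

```python
import numpy as np

def mask_to_rgba(rgb, bg_color, tol=0.1):
    """Naive masking-based transparency: pixels close to a known
    background color become fully transparent, everything else fully
    opaque. Illustrative sketch only.

    rgb:      (H, W, 3) float array
    bg_color: background color to key out (assumed known)
    """
    dist = np.linalg.norm(rgb - np.asarray(bg_color, dtype=float), axis=-1)
    # Hard 0/1 alpha: no way to represent "70% see-through" glass.
    alpha = (dist > tol).astype(float)
    return np.dstack([rgb, alpha])
```

Every pixel ends up either fully opaque or fully transparent, so a wine glass over a textured background either keeps the background baked into its pixels or loses them entirely.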