This is inherent to what I think is Apple’s model of depth of field: Apple takes a picture that is fairly sharp everywhere and generates an ordinary RGB image plus a depth map (an estimated distance for each pixel). Then it applies some sort of blur that depends on depth.
This is a decent approximation if the scene contains opaque, pixel-sized or larger objects, so that each pixel's content actually has a well-defined depth. But hair tends to be much thinner than a pixel, and a pixel containing both hair and background can't be correctly represented by a single depth value.
This was an issue in older (circa 2000?) Z-buffered rendering — if you naively render hair and then render an object behind the person based on the Z data from the hair rendering, you get very wrong-looking hair. It turns out that just having a GPU that can handle a zillion vertices doesn’t mean that rendering each hair independently gives good results!
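To make the failure mode concrete, here is a minimal sketch of the "RGB image plus depth map, then depth-dependent blur" model described above. The function name and the linear blur-radius model are my assumptions for illustration, not Apple's actual pipeline; the key point is that each pixel gets exactly one depth, so a pixel that mixes hair and background is blurred as a single unit:

```python
import numpy as np

def synthetic_bokeh(rgb, depth, focus_depth, max_radius=4):
    """Naive depth-dependent blur: each output pixel averages a square
    window whose radius grows with |depth - focus_depth|.

    rgb:   (H, W, 3) float array
    depth: (H, W) float array, one depth estimate per pixel -- this
           single value per pixel is exactly what sub-pixel hair breaks.
    """
    h, w, _ = rgb.shape
    out = np.empty_like(rgb, dtype=float)
    for y in range(h):
        for x in range(w):
            # Blur radius proportional to how far this pixel is from the
            # focal plane (an assumed model, for illustration only).
            r = int(round(max_radius * abs(depth[y, x] - focus_depth)))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = rgb[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

Note that a pixel covering both a strand of hair and the distant background has only one entry in `depth`, so it is either blurred entirely or kept entirely sharp; there is no way for this model to treat the hair and the background within that pixel differently.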
The author Lvmin Zhang is the same person behind ControlNet.
I'd be curious to see how well this plays with inpainting. Apparently img2img is also on the author's to-do list.
1 - The way the dog at the end gets a reflection off the floor is pretty nice.
2 - I wonder how this compares in terms of latency/complexity with a ComfyUI pipeline that just does a typical edge-detection/masking layer to achieve the transparency effect. However, I don't think that method would work with the glass example as shown.
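For contrast, the masking approach amounts to something like the sketch below: classify each pixel as foreground or background and emit a hard 0/1 alpha. This is my own minimal illustration (the function name, the known-background-color assumption, and the tolerance are all assumptions, not a ComfyUI node), but it shows why glass fails: a binary mask can't express the fractional alpha that a partially transparent surface needs, whereas a model that generates the alpha channel directly can.

```python
import numpy as np

def mask_to_rgba(rgb, bg_color, tol=0.1):
    """Naive masking-based transparency: pixels close to a known
    background color become fully transparent, everything else fully
    opaque. Illustrative sketch only.

    rgb:      (H, W, 3) float array
    bg_color: background color to key out (assumed known)
    """
    dist = np.linalg.norm(rgb - np.asarray(bg_color, dtype=float), axis=-1)
    # Hard 0/1 alpha: no way to represent "70% see-through" glass.
    alpha = (dist > tol).astype(float)
    return np.dstack([rgb, alpha])
```

Every pixel ends up either fully opaque or fully transparent, so a wine glass over a textured background either keeps the background baked into its pixels or loses them entirely.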