I ran a few chaotic experiments with StyleCLIP a few months ago, which worked very well with smooth interpolation: https://minimaxir.com/2021/04/styleclip/
Now it seems to actually learn the topology lines of the human face [0], as 3D artists would learn them [1] when they study anatomy. It also uses quad grids and even places the edge loops and poles in similar places.
[0] https://nvlabs-fi-cdn.nvidia.com/_web/alias-free-gan/img/ali... [1] https://i.pinimg.com/originals/6b/9a/0c/6b9a0c2d108b2be75bf7...
The comparisons are illuminating: StyleGAN2's mapping of texture to specific pixel locations looks very similar to poorly implemented video-game textures. Perhaps future GAN improvements could come from tricks used in non-AI graphics development.
Still has the telltale sign of mismatched ears and/or earrings. This seems like the most reliable way to recognize them. Well, that and the nondescript background.
I wonder what dataset you could even use to tell a GAN about human internals. 3D renders of a skull with various layers removed?
> In a further test we created two example cinemagraphs that mimic small-scale head movement and facial animation in FFHQ. The geometric head motion was generated as a random latent space walk along hand-picked directions from GANSpace [24] and SeFa [50]. The changes in expression were realized by applying the “global directions” method of StyleCLIP [45], using the prompts “angry face”, “laughing face”, “kissing face”, “sad face”, “singing face”, and “surprised face”. The differences between StyleGAN2 and Alias-Free GAN are again very prominent, with the former displaying jarring sticking of facial hair and skin texture, even under subtle movements.
Second, Hollywood doesn't care about that problem. They will take the best application of the technique, and they don't care if they have to apply a few manual touchups on the result. As long as there is one way of using the system to do the sort of thing they showed in the sample, it won't matter to them that they can't embed a full video game into the neural network itself. They only care about the happy path of the tech.
Someone's probably already starting the company now to use this in special effects, or putting someone on research in an existing company.
I click the website. I search "model". I see two results. Oh no, that means no download link to model.
I go to the github. Maybe model download link is there. I see zero code: https://github.com/NVlabs/alias-free-gan
Zero code. Zero model.
You, and everyone like you, who are gushing with praise and hypnotized by pretty images and a nice-looking pdf, are doing damage by saying that this is correct and normal.
The thing that's useful to me, first and foremost, is a model. Code alone isn't useful.
Code, however, is the recipe to create the model. It might take 400 hours on a V100, and it might not actually result in the model being created, but it slightly helps me.
There is no code here.
Do you think that the pdf is helpful? Yeah, maybe. But I'm starting to suspect that the pdf is in fact a tech demo for nVidia, not a scientific contribution whose purpose is to be helpful to people like me.
Okay? Model first. Code second. Paper third.
Every time a tech demo like this comes out, I'd like you to check that those things exist, in that order. If they don't, it's not reproducible science. It's a tech demo.
I need to write something about this somewhere, because a large number of people seem to be caught in this spell. You're definitely not alone, and I'm sorry for sounding like I was singling you out. I just loaded up the comment section, saw your comment, thought "Oh, awesome!" clicked through, and went "Oh no..."
> I go to the github. Maybe model download link is there. I see zero code
Paper was released today. Chill. They said they will release the code in September (I'm guessing late September). The paper is also a pre-print. They're probably aiming for CVPR and don't want to get scooped.
> Model first. Code second. Paper third.
That's how you produce ML code and documentation, but that is not how you release it. I guarantee you that they are still tuning and improving the model. They were still updating ADA until pretty recently (the last commit to the code in the PyTorch version was 4 months ago).
I originally wasn't in CS, and when I first came over I wasn't in ML. We never had code. The fact that ML publishes code AND checkpoints is a godsend. I love it. It makes work so much easier and helps the community advance faster. I love this, but just chill. The paper isn't peer-reviewed. It is a pre-print. They're showing people what they've done in the last 6 months. It's part publicity stunt, part flex, part staking a claim, but it is also part sharing with the community. Even without the code we learn a lot, because they attached a paper to it. So chill.
It's also important that people understand that even when code is provided, it can be commercially useless. From the NVAE license, as an example[1]:
> The Work and any derivative works thereof only may be used or intended for use non-commercially.
It's a great example of the difference between open source (which it is) and free software (which it is not). So we're back to square one, where it's probably best to clean-room the implementation from the paper, which still does little to help you reproduce the model.
Their central improvement is that they limit the high frequencies generated by ReLU, via an upsample-ReLU-filter-downsample sequence.
Their theoretical section explains quite well why high frequencies can be proven mathematically to cause issues. And their practical implementation using filters to cut those off is very straightforward.
If someone tells you "the microphone recording had 50 Hz noise, so I used a filter to remove it", that's pretty much enough for someone with experience in the field to replicate the result. This is the AI equivalent. They uncovered a simple, basic issue that everyone else overlooked, but once you know it, it seems obvious in retrospect.
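The upsample-ReLU-filter-downsample idea is simple enough to sketch in a few lines. Here is a minimal 1-D NumPy illustration; this is not the paper's actual implementation (the real filters are Kaiser-windowed sincs with carefully tuned cutoffs, applied in 2-D inside the generator), and the names `lowpass` and `alias_free_relu` are made up for the example:

```python
import numpy as np

def lowpass(x, cutoff=0.25, taps=9):
    # Windowed-sinc low-pass filter; an illustrative stand-in for the
    # paper's Kaiser-windowed filters. cutoff is in cycles/sample.
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
    h /= h.sum()
    return np.convolve(x, h, mode="same")

def alias_free_relu(x, up=2):
    # 1-D sketch of upsample -> ReLU -> low-pass -> downsample.
    # Upsample by zero-insertion, then low-pass to interpolate.
    hi = np.zeros(len(x) * up)
    hi[::up] = x * up  # scale to preserve signal amplitude
    hi = lowpass(hi)
    # Apply the pointwise nonlinearity at the higher sampling rate.
    hi = np.maximum(hi, 0.0)
    # Low-pass away the high frequencies the nonlinearity created,
    # then drop back to the original sampling rate.
    hi = lowpass(hi)
    return hi[::up]
```

The point is that ReLU, applied naively, creates frequency content above what the current sampling grid can represent; doing it at a higher rate and filtering before downsampling keeps that content from aliasing back into the signal.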
Edit: the Debian Deep Learning Team's Machine Learning Policy explains why.
But I do appreciate the artefacts of StyleGAN2 as an artistic choice, too.
If you ask StyleGAN to generate a specific image, that's possible, but then you are no longer looking at how well these models generate images.