I ran a few chaotic experiments with StyleCLIP a few months ago, which worked very well with smooth interpolation: https://minimaxir.com/2021/04/styleclip/
Now it seems to actually learn the topology lines of the human face [0], as 3D artists would learn them [1] when they study anatomy. It also uses quad grids and even places the edge loops and poles in similar places.
[0] https://nvlabs-fi-cdn.nvidia.com/_web/alias-free-gan/img/ali... [1] https://i.pinimg.com/originals/6b/9a/0c/6b9a0c2d108b2be75bf7...
The comparisons are illuminating: StyleGAN2's mapping of texture to specific pixel locations looks very similar to poorly implemented video-game textures. Perhaps future GAN improvements could come from tricks used in non-AI graphics development.
Still has the telltale sign of mismatched ears and/or earrings. This seems like the most reliable way to recognize them. Well, that and the nondescript background.
I wonder what dataset you could even use to tell a GAN about human internals. 3D renders of a skull with various layers removed?
> In a further test we created two example cinemagraphs that mimic small-scale head movement and facial animation in FFHQ. The geometric head motion was generated as a random latent space walk along hand-picked directions from GANSpace [24] and SeFa [50]. The changes in expression were realized by applying the “global directions” method of StyleCLIP [45], using the prompts “angry face”, “laughing face”, “kissing face”, “sad face”, “singing face”, and “surprised face”. The differences between StyleGAN2 and Alias-Free GAN are again very prominent, with the former displaying jarring sticking of facial hair and skin texture, even under subtle movements.
Second, Hollywood doesn't care about that problem. They will take the best application of the technique, and they don't care if they have to apply a few manual touchups on the result. As long as there is one way of using the system to do the sort of thing they showed in the sample, it won't matter to them that they can't embed a full video game into the neural network itself. They only care about the happy path of the tech.
Someone's probably already starting the company now to use this in special effects, or putting someone on research in an existing company.
I click the website. I search "model". I see two results. Oh no, that means no download link to model.
I go to the github. Maybe model download link is there. I see zero code: https://github.com/NVlabs/alias-free-gan
Zero code. Zero model.
You, and everyone like you, who are gushing with praise and hypnotized by pretty images and a nice-looking pdf, are doing damage by saying that this is correct and normal.
The thing that's useful to me, first and foremost, is a model. Code alone isn't useful.
Code, however, is the recipe to create the model. It might take 400 hours on a V100, and it might not actually result in the model being created, but it slightly helps me.
There is no code here.
Do you think that the pdf is helpful? Yeah, maybe. But I'm starting to suspect that the pdf is in fact a tech demo for nVidia, not a scientific contribution whose purpose is to be helpful to people like me.
Okay? Model first. Code second. Paper third.
Every time a tech demo like this comes out, I'd like you to check that those things exist, in that order. If they don't, it's not reproducible science. It's a tech demo.
I need to write something about this somewhere, because a large number of people seem to be caught in this spell. You're definitely not alone, and I'm sorry for sounding like I was singling you out. I just loaded up the comment section, saw your comment, thought "Oh, awesome!" clicked through, and went "Oh no..."
> I go to the github. Maybe model download link is there. I see zero code
Paper was released today. Chill. They said they will release the code in September (I'm guessing late September). The paper is also a pre-print. They're probably aiming for CVPR and don't want to get scooped.
> Model first. Code second. Paper third.
That's how you produce ML code and documentation, but that is not how you release it. I guarantee you that they are still tuning and improving the model. They were still updating ADA until pretty recently (the last commit to the code in the PyTorch version was 4 months ago).
I originally wasn't in CS, and when I first came over I wasn't in ML. We never had code. The fact that ML publishes code AND checkpoints is a godsend. I love it. It makes work so much easier and helps the community advance faster. I love this, but just chill. The paper isn't peer-reviewed. It is a pre-print. They're showing people what they've done in the last 6 months. It's part publicity stunt, part flex, part staking a claim, but it is also part sharing with the community. Even without the code we learn a lot, because they attached a paper to it. So chill.
It's also important that people understand that even when code is provided, it can be commercially useless. From the NVAE license, as an example[1]:
> The Work and any derivative works thereof only may be used or intended for use non-commercially.
It's a great example of the difference between open source (which it is) and free software (which it is not). So we're back to square one, where it's probably best to clean-room the implementation from the paper, which still does little to help you reproduce the model.
Their central improvement is that they limit the high frequencies generated by ReLU, via an upsample-ReLU-filter-downsample sequence.
Their theoretical section explains quite well why high frequencies can be proven mathematically to cause issues. And their practical implementation using filters to cut those off is very straightforward.
If someone tells you "the microphone recording had 50 Hz noise, so I used a filter to remove it", that's pretty much enough for someone with experience in the field to replicate the result. This is the AI equivalent. They uncovered a simple, basic issue that everyone else overlooked, but once you know it, it seems obvious in retrospect.
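The upsample-ReLU-filter-downsample idea is simple enough to sketch in a few lines. Here is a minimal 1-D NumPy illustration; this is not the paper's actual implementation (the real filters are Kaiser-windowed sincs with carefully tuned cutoffs, applied in 2-D inside the generator), and the names `lowpass` and `alias_free_relu` are made up for the example:

```python
import numpy as np

def lowpass(x, cutoff=0.25, taps=9):
    # Windowed-sinc low-pass filter; an illustrative stand-in for the
    # paper's Kaiser-windowed filters. cutoff is in cycles/sample.
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
    h /= h.sum()
    return np.convolve(x, h, mode="same")

def alias_free_relu(x, up=2):
    # 1-D sketch of upsample -> ReLU -> low-pass -> downsample.
    # Upsample by zero-insertion, then low-pass to interpolate.
    hi = np.zeros(len(x) * up)
    hi[::up] = x * up  # scale to preserve signal amplitude
    hi = lowpass(hi)
    # Apply the pointwise nonlinearity at the higher sampling rate.
    hi = np.maximum(hi, 0.0)
    # Low-pass away the high frequencies the nonlinearity created,
    # then drop back to the original sampling rate.
    hi = lowpass(hi)
    return hi[::up]
```

The point is that ReLU, applied naively, creates frequency content above what the current sampling grid can represent; doing it at a higher rate and filtering before downsampling keeps that content from aliasing back into the signal.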
Edit: the Debian Deep Learning Team's Machine Learning Policy explains why.
But I do appreciate the artefacts of StyleGAN2 as an artistic choice, too.
If you ask StyleGAN to generate a specific image, that's possible, but then you are no longer looking at how well these models generate images.