I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted[1]
As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity so fascinating. On one hand, I know that mathematically, AI can only hallucinate images that fit within the distribution of things it has seen. But from an artist's perspective, the model's ability to blend two existing styles into something so distinctly new is incredible (not to mention commercially useful!)
Anyways, happy to answer any questions, thoughts, or concerns!
---
Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.
> I think the effort required for such an undertaking is vastly underestimated by readers.
Haha for sure. Hosting a real-time ML model for people to do sub 1-second inferences at HN-load scale is definitely nontrivial.
same here. what's naive about it?
not to badmouth the undertaking, but wtf is this doing on HN?
My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?
My (super limited) understanding of GANs is that they slowly discover these features over time simply from observation in the data set, and not from any labels.
So how could you make, e.g., a slider for head position, style, pose, etc.? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?
You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"
And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?
You can think of it like coordinates on a many-dimensional vector grid.
We craft the functions that will illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.
And yes, when the model is retrained, we have to discover them again!
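The reply above doesn't share code, but one common way to "craft a function that illuminates a set of points" is to estimate an attribute direction in latent space and slide along it. A minimal NumPy sketch; the 512-dim latent size, the labeling, and all names here are assumptions for illustration, not taken from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 512-dim latent space, as in StyleGAN-family models.
LATENT_DIM = 512

def find_direction(latents, labels):
    """Estimate a direction for a binary attribute (e.g. 'hair up' vs.
    'hair down') as the difference of the class means of labeled samples."""
    pos = latents[labels == 1].mean(axis=0)
    neg = latents[labels == 0].mean(axis=0)
    d = pos - neg
    return d / np.linalg.norm(d)

# Toy data: 100 latent codes with a made-up attribute label.
z = rng.normal(size=(100, LATENT_DIM))
labels = (z[:, 0] > 0).astype(int)  # pretend dimension 0 encodes the attribute

direction = find_direction(z, labels)

# "Same character, different pose": keep the base code, slide a small
# distance along the attribute direction, then feed `edited` to the generator.
base = rng.normal(size=LATENT_DIM)
edited = base + 2.0 * direction
```

This is why the directions have to be rediscovered after retraining: nothing pins a given attribute to the same region of latent space across training runs.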
A couple questions:
1) I didn't really understand how you went about identifying which vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling through them one by one to see the outcome?
2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?
2) Depends on the quality you are seeking. If you only want one run of a similar, off-the-shelf model, somewhere around $1,000 is enough. But at the number of iterations you have to run to build your own and improve the results, you probably need about $100,000.
To tackle this problem, we built our own supercomputer from parts we bought off eBay, though I can't say I recommend that route, because it now lives in our living room.
Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or maybe did you train the model for two weeks? That point just wasn't exactly clear for me.
Development took on-and-off roughly 2 years to achieve the quality you see today.
We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!
As for a V2 generation, sorry, because the models are different, you'll have to discover a similar image again, if you want a V2 version!
There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!
I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.
In humans, things like the pupil can be the giveaway.
https://www.newscientist.com/article/2289815-ai-can-detect-a...
Like this one by fast.ai!
Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)
I think I could use this for a project.
>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.
Can you explain what you mean by "mental" representation? Does your system have a mind?
Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?
On each step, high-level parameters are combined with predefined weights to produce a more low-level output.
It seems a similar transformation is going on here, except that the weights and the structure are learned on their own.
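That "high-level parameters combined with weights" step can be sketched as a toy two-layer NumPy network; the layer sizes and nonlinearity are made up for illustration, not taken from the comment:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, W, b):
    # One step: combine the incoming (higher-level) representation with
    # learned weights, then apply a nonlinearity.
    return np.tanh(x @ W + b)

# Hypothetical sizes: a 16-dim "high-level" code expands toward a
# 64-dim "low-level" output over two steps.
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 64)), np.zeros(64)

code = rng.normal(size=16)                 # high-level parameters
out = layer(layer(code, W1, b1), W2, b2)   # progressively lower-level output
```

In a real generator the weights W1, W2 would be learned by gradient descent instead of sampled randomly, which is the "learned on their own" part.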
https://www.gwern.net/Danbooru2020
Though now we have made our own :)
Waifu Labs v2, referenced in this post (generate amazing custom anime face images): https://waifulabs.com (write-up is the above link: https://waifulabs.com/blog/ai-creativity)
This Anime Does Not Exist (AI-generated anime-style artwork): https://thisanimedoesnotexist.ai (write-up https://www.gwern.net/Faces#extended-stylegan2-danbooru2019-... and https://nearcyan.com/this-anime-does-not-exist)
This Waifu Does Not Exist (AI-generated anime-style faces): https://thiswaifudoesnotexist.net (write-up: https://www.gwern.net/Faces#twdne)
There's also a lot of literature on e.g. automatic manga colorization, auto-translation, image super-resolution, anime frame interpolation, and much more. Worth checking out some places like https://old.reddit.com/r/AnimeResearch/ if you're interested!
I think the speed that GANs have come into the world has really shaken people up, and it's hard to process what this all means and what it will result in. Especially the ones which generate based on real people.
But the feeling this gives me is: what happens to the future of art? Sure, this example is nowhere even close to replacing real artists, but it's already generating images better than I can draw after a year of practice. It does give me a feeling of "what is the point". Which might be an irrational feeling, but I'm sure others feel the same.
Though, the conclusion I've come to is that hand-drawn art will always be meaningful to humans, because it is born of the human experience.
An interesting example is the invention of photography, which at its time, was very good at doing the thing artists were doing back then (capturing likenesses)
But photography didn't replace art: instead, artists now use photographs to be more expressive, convincing, and make better art. In tandem, the widespread adoption of photography meant that more average folks could get their likenesses taken!
Personally, my skills as an artist have improved quite a bit since launching this product, purely because observing it offers some fascinating insights into how anime is created!
I hope that as an industry, we'll find better ways to create, and what we know to be the "best" art today will be even better in the future!
Comparing photography to hand drawn art is silly. They are two different mediums.
Your company could be the first to capture the market. I guess if you can sleep with the consequences of your work, who cares? I'm not judging, because if it's not you, it will be someone else.
Personally, I think we as a society need to step back, press pause, and really consider the consequences of this technology, and even existing technologies.
If you become rich, could you set up a charity for all the future starving artists, if that future comes to pass? I don't want to live in a world where there's no room for human creativity.
Not an artist, just a concerned human.
This is sort of the same thing on steroids. You can copy/remix previous art by feeding them into a ML model in training mode, and it will be massively utilized the same way ctrl-c ctrl-v is used, but it's a part of the toolset of art creation, not replacing it.
Then you need attributions for all of that previous art, at least going by the text of the law.
There's a similar situation ongoing with fiction writing, by way of NovelAI. (And some competitors, but NovelAI is head and shoulders ahead of the pack. Thankfully; they seem to be the nicest of the lot.)
I'm a fairly prolific (fan-)fiction writer, and also AI enthusiast, so of course I jumped on that bandwagon as soon as I could. What I've found is...
- AI cannot write stories on its own. It just can't, full stop. Some people try, including me, but the results are nonsensical without significant tweaking. I expect that to change eventually, but not without a conceptual breakthrough or two.
- AI is immensely useful as a prosthetic imagination.
What I use it for isn't to write the story for me. It's to, in case I ever get stuck at some point, offer me suggestions for how the story can continue -- suggestions that I can accept or deny. Even if I deny it, it's useful as a way of illuminating my own ideas for the story. There's got to be a reason I don't like that continuation, and that is often enough to think of something I do like.
In other words, it's mostly eliminated writer's block.
It's also handy for expanding my vocabulary. English is my third language, and while I like to think I'm good enough for daily life -- I've lived in Ireland for over a decade, after all -- there's a big difference between 'good enough for daily life' and 'good enough to write good fiction'. Prior to using NovelAI, my writing was... dry. Conceptually heavy SF doesn't necessarily require high-end wordcrafting, but it helps.
The AI, especially when told to emulate Sheridan Le Fanu or any of the other great authors, is better than me at this. And since I can ask it to jump in at any point, it's become the most attentive, capable cowriter I've ever had. Perhaps noticing this, NovelAI now calls their default AI tuning 'Co-Writer'.
It's still likely to write something I can't immediately use, but that just means I need to absorb its ideas and make them my own. Repeat a hundred times per day, and I end up learning much, much faster than I ever did when I was writing on my own.
To summarize, I don't use AI to write my stories for me. I use it to get better at writing.
I think it should be possible to do the same for other forms of art.
It was not so long ago that computers beat humans at chess, yet people still play.
Yes, people still play, but they no longer create.
With the exception of Adversarial attacks on particular algorithms, no human is creating new Chess theory, discovering new openings, for example.
As a game, challenge, competition, social activity, chess is alive and well.
As a creative endeavour, or vehicle for discovery, Chess is solved. It is no longer an art of its own.
We're part way through this transition now with Go as well. New opening theory, new joseki, new strategies are being played by robots, and at the highest professional levels we are playing catch-up to understand.
For art it feels a bit different, since it's not competitive and more a practical thing. Perhaps art will shift from placing individual strokes on an image to giving creative direction for an AI to resolve into an image, or to enabling more people to create labor-intensive works like animation.
> Given a training set, this technique learns to generate new data with the same statistics as the training set.
There isn't a creative process here nor any creative introspection going on. While the technical results are impressive, this article does not address creativity even superficially, and just slaps the label on. There isn't any AI either. It's machine learning, i.e., statistical models and algorithms.
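To make the quoted definition concrete, here is its most literal form: fit simple statistics to the data, then sample from them. This is a deliberately crude stand-in for a GAN, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)

# "Generate new data with the same statistics as the training set",
# taken literally: fit a Gaussian to the data, then sample from it.
train = rng.normal(loc=3.0, scale=2.0, size=10_000)
mu, sigma = train.mean(), train.std()

generated = rng.normal(loc=mu, scale=sigma, size=10_000)
# The samples are new individual values, but their statistics
# (mean, spread) mirror the training set's.
```

A GAN replaces the two fitted numbers with millions of learned parameters, but the objective is the same shape: match the training distribution.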
"It cannot be creative because it's only bits/cogs/linear algebra/etc/etc." Well describe to me the way it is different to the processes of the human brain? "There is some magic sauce in the human brain we do not yet understand!", well then how do you know that this magic sauce does not exist within the statistical models inside the computer?
I find it very irritating that such shallow reasoning prevails amongst intelligent people.
For example, humans don't need to see millions of examples of waifu before they can draw their own.
Also, humans can draw in different styles, including novel styles that look nothing like styles they have seen before. Statistical models like GANs can only draw in styles similar to the ones in their training sets.
Statistical modelling can only represent the data in a training dataset and is incapable of novelty. Humans are capable of novelty.
>> Well describe to me the way it is different to the processes of the human brain?
We haven't created the human brain and it's very unlikely it uses a technology we understand, like linear algebra.
This is no less shallow reasoning. The question of whether the academic field of statistical modelling already contains the necessary ideas to produce strong AI is not decided, and won't be unless/until somebody makes a strong AI. People have different intuitions about what the answer will be and until it can be determined empirically I suggest treating them as what they are: intuitions.
Our ability to decide between following the rules and breaking the rules, when suitable. A computer could also break the rules, but in most cases it wouldn't make sense or look good, while a human can judge when to break a rule. Sure, we all learn by copying, but after a while we start getting a feel for when to break the rules, and that's when unique art appears. Computers seem not to have learned this yet (or rather, haven't been taught it yet).
Using the tool that this submission offers, all the results will look similar and can be traced back to the training set you give it. Do something similar with a human (over similar amount of time that the machine got, in terms of human time) and eventually the results will look way different than the training set, as what we see with artists in real life.
For something like the original post, we do know what these things are. They're statistical models. Full stop. They show no indication of what we see in creative and intelligent behavior, that is the ability to self-adapt to both internal and external initiatives. This GAN in the post has no ability to step outside of the statistics in the training set unless the model is updated to prod it to do so. The model can be changed, but it is a forceful change. If you show me tens of thousands of images, I am not, at an emergent, top-level, system level, etc., bounded to the statistics of that image set. Is this GAN asked something or given a goal aside from an implicit "draw something like what we've given you"? Even if I do draw something like or akin to the given image set, I have full creative control over the image (assuming some drawing skill).
If the human brain (and really body) can be modeled via a statistical model (which is not yet known but is surmised as you imply), that doesn't necessarily explain high-level behaviors. More is different. You call it magic sauce, but others call it emergence. Our understanding of emergent behavior and complex systems at large is still in work.
In my view, metaphorical thinking, of which analogical thinking is a subset, is a likely kernel of human intelligence. While these statistical models are copying, which is similar in a way to analogy building, it's not quite there. The reason the things it generates look like other things is that it searched a parameter space for matching statistics. However, it cannot even explain that this is why it generated what it did. We explain for it. These things are no more artificially intelligent than things like thermodynamics are naturally intelligent.
Lastly, as I pointed out in my original comment, if this is indeed creative as someone like you implies, the article fails to make a convincing argument and bounces around a lot of buzzwords.
> I find it very irritating that such shallow reasoning prevails amongst intelligent people.
I was offended, but I suppose I agree. ;)
People keep forgetting that you can really only fit to data you have. Extrapolation exists as a concept, but the intuitive knowledge needed to create something new and successful is hard to pin down even for humans at this point (how much of the business press is full of gimmicky blog posts about how to be successful, full of contradictory anecdata, opinions, and advice?). I don't even know how AI researchers would go about tackling that. I am not an AI person, but as a plain old scientist I know extrapolation without intuition is almost always a fraught effort.
Sure it can, but then it's not considered anime anymore. I think this sentiment is confusing genre with training set constraints. A new model is not required for some artist to do something different from anime or even an anime artist to do something different. Humans can self-adjust all with approximately the same base model (whatever that is).
I always thought those were synonyms.
The very definition of intelligence has been under heavy reconsideration in recent decades, with the emergence of better knowledge of animal cognition, for example.
I just don't see anything intelligent yet. People have somehow gotten confused into treating the success of machine learning as intelligence. We have a lot of statistical models of things that are very successful, but those aren't considered intelligent. For some reason, machine learning telling you something about data has suddenly been treated as AI. Machine learning can do some impressive things, but I think it's short-sighted to equate AI and machine learning.
Intelligence is really a tough thing. Watching a video of even a single-celled organism reveals a sort of intelligence and behavior far beyond anything I've seen from machine learning. So why is it intelligent? Or is it intelligent? I'm not entirely sure, but my point is that machine learning is orders of magnitude away from describing (i.e., modeling) even the simplest self-directed and self-adapting behavior that we see in the real world.
/s?
Or, through decades of AI research, we're just now starting to better understand what actual creativity really "is"?
Ah, make that three things the public shouldn't see being made: sausage, legislation, and waifus.
From urban dictionary:
"Waifu" is used to refer to a fictional girl or woman (usually in Anime, Manga, or video-games) that you have sexual attraction to, and you would even marry.
Huh. Imagine a future where people can compile written scripts into Hollywood-quality movies.
Thanks so much! It's done by our fantastic animator[1]!
GANs are quite interesting and we didn't see many approachable explainer videos targeted at lay people, so we decided to make one ourselves!
One thing I was confused by: the video says the discriminator "AI" is trained to detect true vs. generated results, with the hope the generator becomes good enough to fool the discriminator. But why is the discriminator useful, then? Couldn't you just tell generator "AI" whether the result it produced was true or not?
I think the answer is... you don't want just a perfect recreation of the training data you gave to the generator; instead you want the generator to produce variations of that training data, so there's a "how would you know if it's 'a true result' / good enough?" problem. So the discriminator is useful because it's not a direct comparison, but rather a "this looks approximately good enough" comparison of the true vs. generated result.
This all makes me wonder: what sort of data set needs to be fed to the discriminator to train it? Is it some sort of "true image" and "true image w/bad alterations (e.g. lines, scratches, etc.) to it" data set?
Indeed, it contributes to the variations problem.
also: If the discriminator starts off perfect, then the generator can't learn to be better.
Sort of like a human learning to play chess: If you start off with top-tier opponents that crush you, then you don't have a gradient to learn from. Instead, you need players at your own level to grow your skills.
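The chess analogy can be made precise for the original minimax GAN loss: the generator's gradient through the discriminator's logit vanishes exactly when the discriminator confidently rejects fakes. A small NumPy sketch; the notation is mine, not from the video:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# With the original minimax loss, the generator minimizes
# log(1 - D(G(z))).  Writing s for the discriminator's logit on a fake,
# the gradient of that loss w.r.t. s has magnitude sigmoid(s): exactly
# how "fooled" the discriminator currently is.
def generator_grad_magnitude(logit):
    return sigmoid(logit)

# Discriminator at the generator's level: fakes are borderline
# (logit ~ 0), so there is a healthy gradient to learn from.
matched = generator_grad_magnitude(0.0)      # 0.5

# Near-perfect discriminator: fakes are confidently rejected
# (logit << 0) and the gradient all but vanishes -- the "crushed by a
# top-tier opponent" situation from the comment above.
crushed = generator_grad_magnitude(-10.0)
```

This is one reason practitioners often swap in a non-saturating generator loss, which keeps a usable gradient even when the discriminator is far ahead.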
Do you have a team page? How many of you are there? Do you work with gwern and nearcyan? Are you going to raise for this? (You should totally scale this!)
Great work, and keep it up!
Don't get me wrong, I have nothing against this, but I think we should start discussing the morality of AI-generated content, even if it doesn't train on existing artworks/code.
1. Simplifications of reality (the actual artist training method would be traditional studies from life and photo reference, followed by gradual reduction and symbolization into a style)
2. Symbolic meaning. Things like the style of eyes, clothing, etc are all meant to signal personality. This is stuff that current AI techniques don't really touch upon in any direct sense.
Since the ML method is built on interpolating off final results, it's going to lack in these qualities and produce something that is consistently an "average impression". Akin to asking the algorithm to generate mythical heroes by mashing up the various stories: you get a hero that is somehow the average of Icarus, Heracles and Achilles, which would be less of a character than the originals.
Just a thought, I don't really know anything about ML.
I wonder if the OP's intuition regarding the sparseness of the latent space, and the relatively small area occupied by the 'useful' manifold? embedded within it provide us any clues as to what symbol grounding might look like for some neuro-symbolic infrastructure that sits atop that latent space.
I.e. how should we be trying to represent concepts like 'male' and 'female' within that space?
Is it important to have these concepts represented as a low dimensional manifold?
Is it important that this manifold be easily described by some simple geometric form like a convex polytope?
Is it important that nuances and variations on the concept be separable within the bounds of the concept-specific manifold?
What other properties might be important?
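One way to probe questions like these empirically is to test whether a concept is linearly separable in the latent space; if a single hyperplane suffices, a perceptron will find it. A toy NumPy sketch with an entirely synthetic latent space (every name, size, and margin here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy latent space: 64 dims; suppose a concept (say 'male' vs. 'female')
# is encoded linearly along one hidden direction w_true.
dim = 64
w_true = rng.normal(size=dim)
z = rng.normal(size=(500, dim))

# Keep only points with a clear margin, so the concept is cleanly
# separable (mimicking confidently-labeled examples).
score = z @ w_true
keep = np.abs(score) > 0.5 * np.linalg.norm(w_true)
z, labels = z[keep], (score[keep] > 0).astype(int)

# Test the "simple geometric form" hypothesis: can one hyperplane
# (the simplest convex boundary) separate the concept?  A few epochs
# of the perceptron rule suffice if it can.
w = np.zeros(dim)
for _ in range(20):
    for zi, yi in zip(z, labels):
        pred = int(zi @ w > 0)
        w += (yi - pred) * zi  # perceptron update

accuracy = np.mean((z @ w > 0).astype(int) == labels)
```

If accuracy stays low on real latent codes, the concept's manifold is presumably curved or disconnected, which would argue against the simple-polytope picture.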
For TADNE, Arfafax ran Danbooru2019 and a few million TADNE samples through CLIP to get the image embeddings, and clustered them; when the two sets of clusters were graphed using tsne, you could see that the TADNE StyleGAN2-ext did a lot of mode-dropping in that many smaller outlying clusters of characters/franchises/topics simply did not appear in TADNE samples. The TADNE looked like a big galaxy, while Danbooru2019 looked more like it was surrounded by archipelagos. TADNE was extensively trained on them and was a very large model, but the GAN dynamics & StyleGAN architecture mean it didn't do a good job absorbing rarer/more idiosyncratic Danbooru2019 image-clusters.
I expect newer generative models which avoid GAN losses and which use more flexible (but expensive!) architectures, like DALL-E, would perform much better in terms of mode-dropping, so you'd see a lot more unique characters/images out of them. (I'm very excited about them. As good as TADNE or Waifu Labs v2 may be, I think they are still far behind what could be done with just existing data/arch/compute.)
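The cluster-coverage comparison described above can be sketched without CLIP or t-SNE: embed real and generated samples, assign both to the real data's cluster centers, and look for clusters the generator never reaches. Everything below is synthetic stand-in data, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for CLIP embeddings: real data has 5 well-separated clusters,
# while the "generator" only covers 3 of them (mode dropping).
centers = rng.normal(scale=10.0, size=(5, 32))
real = np.vstack([c + rng.normal(size=(200, 32)) for c in centers])
fake = np.vstack([c + rng.normal(size=(200, 32)) for c in centers[:3]])

def nearest_center(x, centers):
    # Assign each embedding to its nearest cluster center.
    return np.argmin(np.linalg.norm(x[:, None] - centers[None], axis=2), axis=1)

# Which clusters does each set of samples ever land in?
real_cov = set(nearest_center(real, centers))
fake_cov = set(nearest_center(fake, centers))
dropped = real_cov - fake_cov  # clusters with no generated samples
```

In the TADNE comparison those `dropped` clusters are the "archipelagos": small character/franchise clusters present in Danbooru2019 but absent from the GAN's samples.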
Obviously there will be plenty of illustrators doing custom work that these can't (yet) replicate.
Also good for those countless anime avatar'd Twitter users.
FYI, uBlock Origin complains about the registration link, because it is on "Peter Lowe's Ad and tracking server list".
If you're OK with being tracked, you can permanently allow that domain.
Novel meaning user-provided, not generated by the model or in the training set.
More things for, like, adults?
step 2: NFT all the things
step 3: profit
step 4: GOTO step 1
step 5: automate steps 1 to 4
It's an extremely hard research problem, because darker skin tones account for only about 0.3% of all anime art produced in the world.
We have employed an absolutely exhaustive array of art and data science tricks to give the model the ability to draw darker skin tones, though they are underrepresented. The results that you see today are the culmination of many months of careful tuning!
It's definitely not perfect, but from a data science perspective, this situation can't be fully rectified until the art world makes a shift.
Personally, I hope that more art representing dark skin tones will be created in the world!
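The write-up doesn't say which of the "art and data science tricks" were used, but a standard first move for a ~0.3%-rare attribute is to reweight sampling when forming training batches. A hedged NumPy sketch with synthetic labels:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic dataset: ~0.3% of examples carry the rare attribute
# (mirroring the stated share of darker skin tones in anime art).
labels = (rng.random(100_000) < 0.003).astype(int)

# Give each class equal total probability mass, so rare examples are
# oversampled when drawing a batch.
weights = np.where(labels == 1,
                   1.0 / max(labels.sum(), 1),
                   1.0 / (labels == 0).sum())
weights /= weights.sum()

batch = rng.choice(len(labels), size=512, p=weights)
rare_share = labels[batch].mean()  # roughly 50% instead of ~0.3%
```

Oversampling alone tends to cause overfitting on the few rare examples, which is presumably why the team describes months of additional tuning.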
It does not do well generating instances with features that are not well represented in the training dataset.
Compare this to human creativity. I suspect that fulfilling GPs request would be almost trivial for a human professional artist.
To be clear this is an amazing achievement, a creative use of the technology, and a positive contribution to the world. Pointing out limitations (i.e. areas with potential for future innovation) does not diminish it.
Ethnicity is sometimes incorporated; that is, some distinctions would be necessary if there were a documentary manga about an Olympic match played by teams from multiple parts of the world. In that case, American players might be given smaller eyes or extra facial wrinkles, African players might be colored darker than other characters, Chinese players could be drawn with slightly different chin shapes, etc.
But the default is unspecified or an averaged, most simplified shapes and forms that the author uses in their own cognition.