If you look at cubism, the whole idea is to capture multiple sides of a three-dimensional object at once. A lot of art is not a "style" but rather a projection from 3D (or 4D) space to 2D space.
If you wanted to paint a "dog" in the style of Picasso your network would need to understand the geometry of a dog.
Training on a bunch of 2D before-and-after examples is underspecified.
It's important to understand that it is a mapping from 3D -> 2D ... NOT 2D -> 2D.
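A toy numpy sketch of that point (all numbers made up for illustration): perspective projection discards depth, so distinct 3D points collapse onto the same 2D pixel, which is why purely 2D before/after pairs underspecify the problem.

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole perspective projection of Nx3 points onto a 2D image plane."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([focal * x / z, focal * y / z], axis=1)

# Two different 3D points on the same viewing ray...
near = np.array([[1.0, 2.0, 4.0]])
far = np.array([[2.0, 4.0, 8.0]])   # same direction, twice the distance

# ...land on exactly the same 2D pixel: (0.25, 0.5)
print(project(near), project(far))
```

Undoing that collapse is what a cubism-aware model would have to learn.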
Another example is "Nude Descending a Staircase" by Duchamp: https://en.wikipedia.org/wiki/Nude_Descending_a_Staircase,_N...
It is a painting describing motion. To apply style transfer would be completely stupid because the point of the image is to project 4D->2D ... not to have wavy black and brown lines.
To the point of your concern: for various (and likely numerous) reasons, this does not always seem to occur.
I'm not as optimistic as you that the current statistics-driven approaches could ever reach the kind of deep analytic modeling required for a style transfer system to look at a Picasso and infer that there's a 3D -> 2D mapping at play. And it's a very interesting thought, because (to me) it seems to demonstrate how far we are from actual AI that could make that kind of inventive conceptual leap.
The mapping between feelings and images is built up through experience. Certain images are fundamental to human experience, wired into the brain by evolution (a mother smiling, scary monsters). Others are learned (ever been hit by a car? I bet that every time you see that exact model and color of car you'll feel an emotion).
Here's a thought experiment:
What if we fed the deep-learning "painter" tons of 3D animation? Each point in time would be a full 3D scene, and each would be labelled with emotions: "scary", "happy", "angry".
I bet the algorithm could generate original art and learn new artistic styles by maximizing response to certain permutations of feelings.
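A minimal numpy sketch of that "maximize response to a feeling" idea, with a random linear model standing in for the trained emotion scorer (every name and size below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "emotion scorer": in the thought experiment this would be a deep
# net trained on labelled 3D scenes; here it's just a fixed random linear map.
N_PIXELS = 64                                # hypothetical flattened-image size
EMOTIONS = ["scary", "happy", "angry"]
W = rng.normal(size=(len(EMOTIONS), N_PIXELS))

def emotion_scores(image):
    return W @ image                         # one score per emotion

def dream(emotion, steps=100, lr=0.1):
    """Gradient-ascend a random image to maximize one emotion's score."""
    row = EMOTIONS.index(emotion)
    img = rng.normal(size=N_PIXELS)
    for _ in range(steps):
        img += lr * W[row]                   # d(score)/d(img) for a linear scorer
    return img

scary_img = dream("scary")
# The "scary" score now dominates the other emotions.
print(emotion_scores(scary_img))
```

With a real deep scorer you'd backpropagate through the network instead of using the closed-form linear gradient, DeepDream-style.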
Ideally you'd use several such images so the network doesn't overfit on the specific details of that transformation.
The other way is to get a huge dataset of cubist still-life paintings plus a dataset of still-life photographs, and learn the "average" transformation from those. Such a transformation may not generalize to other subjects, though, and might only work well for flowers or food on a table.
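Here's a toy least-squares version of that "average transformation" idea on synthetic paired feature vectors (real cubism and photo datasets would be unpaired, so in practice you'd reach for something like CycleGAN; all data below is random):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for paired (photo, painting) feature vectors.
n_pairs, dim = 200, 16
photos = rng.normal(size=(n_pairs, dim))
true_style = rng.normal(size=(dim, dim))       # hidden "style" transformation
paintings = photos @ true_style + 0.01 * rng.normal(size=(n_pairs, dim))

# Least-squares estimate of the average photo -> painting transformation.
style_hat, *_ = np.linalg.lstsq(photos, paintings, rcond=None)

# Apply the learned "style" to a new photo.
new_photo = rng.normal(size=dim)
stylized = new_photo @ style_hat
print(np.abs(stylized - new_photo @ true_style).max())   # tiny recovery error
```

The same caveat applies here: a transformation averaged over still lifes encodes still-life statistics and won't transfer to, say, portraits.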
Same thing with the other styles. For example (NSFW), photographs of naked women such as https://www.daniel-bauer.com/images/art_nudes/15_artistic_nu... transformed into classical paintings like https://www.google.com/culturalinstitute/beta/asset/the-birt... Here you would first segment out the people and then learn the transformation for both the people and the backgrounds.
Still, the current approach works fine for things like starry night because of the nature of the painting.
Yes, most people working on deep learning are making small incremental improvements, and yes, it's a little tiresome to see each one trumpeted as some big advance.
But it's really hard to make fundamental advances. Which shouldn't stop you from working on it.
That being said -- my goal is to inform of a better approach rather than criticize.
But I think that generating in high-dimensional spaces, as in translation, style transfer, gameplay, and robotics, is the most interesting part of AI. It is what makes AI appear more intelligent and creative to us. AlphaGo was impressive because it could select move sequences from a space of 10^120 possible combinations (compare that with an ImageNet classifier that outputs from a space of 10^3 labels).
So, in conclusion, it is essential to learn to generate images, text, sounds, and behavior or movement that are just as complex and coherent as those created by humans. Being able to do so would get us halfway to AGI; we could have talking, moving robots that are not lame. Remember the latest text-to-speech engine from DeepMind (WaveNet): that's speech generation in a high-dimensional space, and you can hear the difference compared to regular TTS.
- take photos and apply the styles of famous photographers
- take your writing and apply the styles of famous writers
- take your code and apply the style of famous coders
etc.
There's a big difference between style transfer in art vs. literature or code. In art, it's ok to get close enough, laymen will forgive a lot of noise. A lossy painting is still a painting.
With great literature, every word is carefully chosen. You can't take something like Franz Kafka and randomly fuzz it; you'll destroy the hidden features that differentiate it from the mediocre.
With code it's even harder. There's almost zero room for noise, a stray period throws it completely off.
You're speaking like someone watching the Wright brothers testing some of their earliest models, and going "supersonic flight my ass, you guys can't even fly across this football field".
NNs have made measurable and enormous progress in many different AI domains in a very short space of time. There are awesome new applications and improvements coming out every day.
It's easy to say, from the vantage point of hindsight bias, that everything that's happened was predictable. So what exactly do you expect from NNs and AI in the near future? Make some testable predictions.
As a photographer who also programs full time, I've been wondering what it would take to synthesize skin texture to remove imperfections: small scars, wrinkles, etc. Currently I just use the healing brush in Photoshop, but I wonder if ML could do it automatically.
Does anyone have any recommendations on what sub-fields or papers I could read to get a better idea of what would be involved to create a solution like that?
Here's a paper on synthesizing human faces; it includes inpainting: http://www.faculty.idc.ac.il/arik/seminar2009/papers/VisioFa...
https://arxiv.org/pdf/1604.07379.pdf uses a GAN to inpaint with arbitrary data. This is probably a couple of iterations away from being easy to implement, as training GANs efficiently and accurately is still a technical challenge.
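As a crude non-learned baseline for intuition, diffusion inpainting fits in a few lines of numpy: it fills the masked "blemish" pixels by repeatedly averaging their neighbours. The GAN approach replaces that averaging with a learned prior that can synthesize plausible skin texture rather than just smoothness. All data below is synthetic:

```python
import numpy as np

def inpaint(image, mask, iters=200):
    """Fill masked pixels by repeatedly averaging their 4-neighbours
    (Jacobi iteration on Laplace's equation); unmasked pixels stay fixed."""
    img = image.astype(float).copy()
    img[mask] = img[~mask].mean()                # crude initial fill
    for _ in range(iters):
        neigh = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                 np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
        img[mask] = neigh[mask]                  # update masked pixels only
    return img

# Smooth synthetic "skin" gradient with a small bright "scar" in the middle.
y, x = np.mgrid[0:32, 0:32]
skin = (x + y) / 62.0
blemish = skin.copy()
blemish[14:18, 14:18] = 1.0
mask = np.zeros_like(skin, dtype=bool)
mask[14:18, 14:18] = True

restored = inpaint(blemish, mask)
print(np.abs(restored - skin)[mask].max())       # near zero: scar smoothed away
```

Diffusion only recovers smooth regions; for actual skin texture you'd want the learned inpainting from the papers above.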
Now if only that existed in Lightroom so I don't need to have a massive PSD and can keep my nice and tiny .dng files.