If you look at cubism, the whole idea is to capture multiple sides of a three-dimensional object at once. A lot of art is not a "style" but rather a projection from 3D (or 4D) space to 2D space.
If you wanted to paint a "dog" in the style of Picasso your network would need to understand the geometry of a dog.
Training on a bunch of 2D before-and-after examples is underspecified.
It's important to understand that it is a mapping from 3D -> 2D ... NOT 2D -> 2D.
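A toy numpy sketch of that point (all numbers made up for illustration): perspective projection discards depth, so distinct 3D points collapse onto the same 2D pixel, which is why purely 2D before/after pairs underspecify the problem.

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole perspective projection of Nx3 points onto a 2D image plane."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([focal * x / z, focal * y / z], axis=1)

# Two different 3D points on the same viewing ray...
near = np.array([[1.0, 2.0, 4.0]])
far = np.array([[2.0, 4.0, 8.0]])   # same direction, twice the distance

# ...land on exactly the same 2D pixel: (0.25, 0.5)
print(project(near), project(far))
```

Undoing that collapse is what a cubism-aware model would have to learn.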
Another example is "Nude Descending a Staircase" by Duchamp: https://en.wikipedia.org/wiki/Nude_Descending_a_Staircase,_N...
It is a painting describing motion. To apply style transfer would be completely stupid because the point of the image is to project 4D->2D ... not to have wavy black and brown lines.
To the point of your concern: for various (and likely numerous) reasons, this does not always seem to occur.
I'm not as optimistic as you that the current statistics-driven approaches could ever reach the kind of deep analytic modeling required for a style transfer system to look at a Picasso and infer that there's a 3D -> 2D mapping at play. And it's a very interesting thought, because (to me) it seems to demonstrate how far we are from actual AI that could make that kind of inventive conceptual leap.
The mapping between feelings and images is built up through experience. Certain images are fundamental to human experience, wired into the brain by evolution (a mother smiling, scary monsters). Others are learned (ever been hit by a car? I bet that every time you see that exact model and color of car you'll feel an emotion).
Here's a thought experiment:
What if we fed the deep-learning "painter" tons of 3D animation? Each point in time would be a full 3D scene, and each would be labelled with emotions: "scary", "happy", "angry".
I bet the algorithm could generate original art and learn new artistic styles by maximizing response to certain permutations of feelings.
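A minimal numpy sketch of that "maximize response to a feeling" idea, with a random linear model standing in for the trained emotion scorer (every name and size below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "emotion scorer": in the thought experiment this would be a deep
# net trained on labelled 3D scenes; here it's just a fixed random linear map.
N_PIXELS = 64                                # hypothetical flattened-image size
EMOTIONS = ["scary", "happy", "angry"]
W = rng.normal(size=(len(EMOTIONS), N_PIXELS))

def emotion_scores(image):
    return W @ image                         # one score per emotion

def dream(emotion, steps=100, lr=0.1):
    """Gradient-ascend a random image to maximize one emotion's score."""
    row = EMOTIONS.index(emotion)
    img = rng.normal(size=N_PIXELS)
    for _ in range(steps):
        img += lr * W[row]                   # d(score)/d(img) for a linear scorer
    return img

scary_img = dream("scary")
# The "scary" score now dominates the other emotions.
print(emotion_scores(scary_img))
```

With a real deep scorer you'd backpropagate through the network instead of using the closed-form linear gradient, DeepDream-style.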
Ideally you'd use several such images so the network doesn't overfit on the specific details of that transformation.
The other way is to get a huge dataset of cubist still-life paintings plus a dataset of still-life photographs, and learn the "average" transformation from those. Such a transformation may not generalize to other subjects, though, and might only work well for flowers or food on a table.
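Here's a toy least-squares version of that "average transformation" idea on synthetic paired feature vectors (real cubism and photo datasets would be unpaired, so in practice you'd reach for something like CycleGAN; all data below is random):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for paired (photo, painting) feature vectors.
n_pairs, dim = 200, 16
photos = rng.normal(size=(n_pairs, dim))
true_style = rng.normal(size=(dim, dim))       # hidden "style" transformation
paintings = photos @ true_style + 0.01 * rng.normal(size=(n_pairs, dim))

# Least-squares estimate of the average photo -> painting transformation.
style_hat, *_ = np.linalg.lstsq(photos, paintings, rcond=None)

# Apply the learned "style" to a new photo.
new_photo = rng.normal(size=dim)
stylized = new_photo @ style_hat
print(np.abs(stylized - new_photo @ true_style).max())   # tiny recovery error
```

The same caveat applies here: a transformation averaged over still lifes encodes still-life statistics and won't transfer to, say, portraits.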
Same thing with the other styles. For example (NSFW), photographs of naked women such as https://www.daniel-bauer.com/images/art_nudes/15_artistic_nu... transformed into classical paintings like https://www.google.com/culturalinstitute/beta/asset/the-birt... Here you would first segment out the people and then learn the transformation for both the people and the backgrounds.
Still, the current approach works fine for things like starry night because of the nature of the painting.
Yes, most people working on deep learning are making small incremental improvements, and yes, it's a little tiresome to see each one trumpeted as some big advance.
But it's really hard to make fundamental advances. Which shouldn't stop you from working on it.
That being said -- my goal is to inform of a better approach rather than criticize.
But I think that generating in high-dimensional spaces, as in translation, style transfer, gameplay, and robotics, is the most interesting part of AI. It is what makes AI appear more intelligent and creative to us. AlphaGo was impressive because it could select move sequences from a space of 10^120 possible combinations (compare that with an ImageNet classifier that outputs from a space of 10^3 labels).
So, in conclusion, it is essential to learn to generate images, text, sounds, and behavior or movement that are just as complex and coherent as those created by humans. Being able to do so would get us halfway to AGI; we could have talking, moving robots that are not lame. Remember the latest text-to-speech engine from DeepMind (WaveNet): that's speech generation in a high-dimensional space, and you can hear the difference compared to regular TTS.
- take photos and apply the styles of famous photographers
- take your writing and apply the styles of famous writers
- take your code and apply the style of famous coders
etc.
There's a big difference between style transfer in art vs. literature or code. In art, it's ok to get close enough, laymen will forgive a lot of noise. A lossy painting is still a painting.
With great literature, every word is carefully chosen. You can't take something like Franz Kafka and randomly fuzz it; you'll destroy the hidden features that differentiate it from the mediocre.
With code it's even harder. There's almost zero room for noise, a stray period throws it completely off.
You're speaking like someone watching the Wright brothers testing some of their earliest models, and going "supersonic flight my ass, you guys can't even fly across this football field".
NNs have made measurable and enormous progress in many different AI domains in a very short space of time. There are awesome new applications and improvements coming out every day.
It's easy to say, from the vantage point of hindsight bias, that everything that's happened was predictable. So what exactly do you expect from NNs and AI in the near future? Make some testable predictions.
As a photographer who also programs full time, I've been wondering what it would take to synthesize skin texture to remove imperfections: small scars, wrinkles, etc. Currently I just use the healing brush in Photoshop, but I wonder if ML could do it automatically.
Does anyone have any recommendations on what sub-fields or papers I could read to get a better idea of what would be involved to create a solution like that?
Here's a paper on synthesizing human faces; it includes inpainting: http://www.faculty.idc.ac.il/arik/seminar2009/papers/VisioFa...
https://arxiv.org/pdf/1604.07379.pdf uses a GAN to inpaint with arbitrary data. This is probably a couple of iterations away from being easy to implement, as training GANs efficiently and accurately is still a technical challenge.
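As a crude non-learned baseline for intuition, diffusion inpainting fits in a few lines of numpy: it fills the masked "blemish" pixels by repeatedly averaging their neighbours. The GAN approach replaces that averaging with a learned prior that can synthesize plausible skin texture rather than just smoothness. All data below is synthetic:

```python
import numpy as np

def inpaint(image, mask, iters=200):
    """Fill masked pixels by repeatedly averaging their 4-neighbours
    (Jacobi iteration on Laplace's equation); unmasked pixels stay fixed."""
    img = image.astype(float).copy()
    img[mask] = img[~mask].mean()                # crude initial fill
    for _ in range(iters):
        neigh = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                 np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
        img[mask] = neigh[mask]                  # update masked pixels only
    return img

# Smooth synthetic "skin" gradient with a small bright "scar" in the middle.
y, x = np.mgrid[0:32, 0:32]
skin = (x + y) / 62.0
blemish = skin.copy()
blemish[14:18, 14:18] = 1.0
mask = np.zeros_like(skin, dtype=bool)
mask[14:18, 14:18] = True

restored = inpaint(blemish, mask)
print(np.abs(restored - skin)[mask].max())       # near zero: scar smoothed away
```

Diffusion only recovers smooth regions; for actual skin texture you'd want the learned inpainting from the papers above.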
Now if only that existed in Lightroom so I don't need to have a massive PSD and can keep my nice and tiny .dng files.