1. The narrative/life of the artist becomes a lot more important. The most successful artists are ones that craft a story around their life and art, and don't just create stuff and stop. This will become even more important.
2. Originality matters more than ever. By design, these tools can only copy and mix things that already exist. But they aren't alive, they don't live in the world and have experiences, and they can't create something truly new.
3. Those that bother to learn the actual art skills, and not merely prompting, will increasingly be miles ahead of everyone else. People are lazy, and bothering to put in the time to actually learn stuff will stand out more and more. (Ditto for writing essays and other writing people are doing with AI.)
4. Taste continues to be the single most important thing. The vast, vast majority of AI art out there is...not very good. It's not going to get better, because the lack of taste isn't a technical problem.
5. Art with physical materials will become increasingly popular. That is, stuff that can't be digitized very well: sculpture, installation art, etc. Above all, AI art is uncool, which means it has no real future as a leading art form. This uncoolness will push people away from the screen and towards things that are more material.
> 1... The narrative/life of the artist becomes a lot more important.
When I watch a movie, I don't care about the artist's life. I care about the characters' lives, and that's very different.
> 2... Originality matters more than ever. By design, these tools can only copy and mix things that already exist.
It's as if you're ascribing divine capabilities to humans :). Hyperbolizing a little, humans also only copy and mix; where do you think originality comes from? Granted, AI isn't at the level of humans yet, but it is improving here.
> 4... It's not going to get better, because the lack of taste isn't a technical problem.
Engineers are in the business of converting non-technical problems into technical ones. Just as AI is now far more capable than it was 20 years ago, able to write interesting texts and make interesting pictures (something that at the time wasn't considered a technical problem), what we perceive as "taste" may well improve with time.
> 5... Above all, AI art is uncool, which means it has no real future as a leading art form.
AI critics have long mistaken the current level for the trend. Compare it to SpaceX's achievements: there is always a "you're currently here" point. First it was "get to orbit, then we'll talk", then "start regular payload deliveries to orbit, then we'll talk", then "land the stage... send a crewed capsule... do that in numbers..." and now it's "first, send the Starship to orbit". "You're currently here" is the ever-present point that hasn't been achieved yet, and it gives critics something to point to so they can object to the process as a whole, because, see, this particular thing isn't achieved yet.
You assume AI won't be able to make cool art with time. AI critics have been shown, time and time again, to underestimate the possibilities. Some people find it hard to learn in some particular topics.
We are 50 years into post-modernism. Can't imagine it can get any more important.
I predict emergent design will be the next big thing. Czinger[1] is a great example of what it may look like. A Rick Rubin-esque world, where the creator is more of a guide.
[1] Czinger uses stochastic optimization to converge to designs - https://www.czinger.com/iconic-design
Less the narrative of the art's production and more the message that it's conveying.
I don't mean (necessarily) a political message, or a message that can be put into words, but the abstract sense of connecting in some way with the human who created it.
This isn't just art though. An example: soon, Sora will be able to generate very convincing footage of a football match. Would any football fan watch this? No. A big part of why we watch football is that in some sense we care about the people who are playing.
Same with visual art. AI art can be cool but in the end, I just don't really give a shit. Coz enjoying art is usually about the abstract sense that a human person decided to make the thing you are looking at, and now you are looking at it... And now what?
This is why every time someone says "AI art sucks" and someone replies "oh yeah? But look at THIS AI art" I always wonder... What do you think art is _for_?
I agree about current AI art's lack of taste, but disagree that it can't be improved. I think art AI companies can hire skilled "taste makers" and use their feedback loop as RL for AI art models. This area will always be in flux and will vary by subpopulation, so it will be a job role always in demand.
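A toy sketch of how that feedback loop could work, along the lines of the reward models used in RLHF: taste makers supply pairwise preferences over candidate images, and a scorer is fit so preferred candidates score higher. Everything here (the feature representation, the function names) is illustrative, not any vendor's actual pipeline.

```python
import math
import random

def train_reward(pairs, dim, epochs=200, lr=0.1, seed=0):
    """Fit a linear scorer w so that score(preferred) > score(rejected),
    using the Bradley-Terry / logistic objective common in RLHF reward models.
    `pairs` is a list of (preferred_features, rejected_features) tuples."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for good, bad in pairs:
            # margin = w . (good - bad); maximize log sigmoid(margin)
            margin = sum(wi * (g - b) for wi, g, b in zip(w, good, bad))
            grad = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
            for i in range(dim):
                w[i] += lr * grad * (good[i] - bad[i])
    return w

def score(w, features):
    """Taste score for a new candidate; used as the RL reward signal."""
    return sum(wi * xi for wi, xi in zip(w, features))
```

In a real system the features would come from an image encoder and the scorer would be a neural net, but the shape of the loop (pairwise human judgments in, a reward function out) is the same.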
Do you think taste is something that cannot be taught/learned? Are certain individuals just born with good taste; it's an immutable property?
I do wonder though… were there other innovations that were uncool in their early years, where now nobody bats an eyelid?
Is that point just a generational/passage of time issue?
No matter how good AI agents become, you still need a general understanding of what works and what doesn't. If you don't have years of experience in the field, all you will end up doing is copying what others do. It's the same dynamic you see on OnlyFans. Mindless zombie hordes copy the "pioneers" (who shove even bigger things in their back orifice for example) and push things further and further, chasing shock value because that's what once elevated someone into the top 0.1 percent.
It's the worst kind of race-to-the-bottom scenario.
It's a huge practical problem to try to figure out authenticity over the Internet. It's already clear that people will pay for it, but it's not at all clear that they will get it. If we imagine that the tools get better and more sophisticated, then there is no reason whatsoever to assume that the tools won't be deployed to give whatever impression is needed to make money.
I don’t think any of the above survives if we allow for AI to be used as it is currently being used. It only survives if you pretend that ahead of us is some invisible gate past which this technology will not go.
2. Yes and no. Depending on how you train the model they can output things that you’ve never seen before but the question is whether you want to look at those things. So yes a human has to judge and fine tune the output. This is why many models seem unoriginal, they’re designed to emulate specific styles and tuned based on broad appeal. If you go looking for LoRAs and merges created by “artists” you will see shit you couldn’t dream of.
Everything else, probably yes.
This is precisely and importantly true. I just wonder if most of the world cares. I'd like to think so, but experience tells me that most of the world is satisfied with mediocre stuff. And I don't say this as a criticism; it's just a fact that artists have to come to grips with.
Furthermore, I think many of the more human-centric thinkers will be disappointed at how many people just won't care.
Perhaps in the future artists will be used to train models that can output a certain style of art and the artist will receive royalties based on their influence on the trained model and its popularity.
I've seen some fantastic original pictures that actual artists have generated through AI. I can't wait to see what current and future artists can do with the new tools at their disposal.
Because it's real.
How can you say this? These models can trivially create things that have never existed, and you can easily test this yourself.
It seems to me that we will go through the same phases chess went through when computer chess became a thing. First, people thought it would kill chess; then people started using it as a tool to play better chess. Now chess is thriving, despite AI being used in it. I can see a similar path with art: use AI to generate ideas, but still have humans create the art.
Is it possible for a character in a novel to have novel experiences? Or for you to experience a novel dream? I would argue yes. You can know the rules of the environment and the starting conditions, but with a bit of randomness (or not) you can generate novel, unexpected experiences from them. So too with the data and distribution AIs are already trained on: they can generate genuinely new experiences.
Another source of novelty is from good verifiers/recognition of a class of object which is hard to construct but easy to verify - here the AI can search and from that obtain novel solutions which were unthought of before.
N.B. novelty itself is basically trivial: just generate random strings. But both of the above are mechanisms for generating novel samples inside some constraint of "meaningfulness".
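The generate-and-verify idea above can be sketched in a few lines. This is a deliberately toy example of my own (random arithmetic expressions, a target value as the "meaningfulness" constraint): generation alone is noise, but a cheap verifier turns it into a search for constrained novelty.

```python
import random

def verify(expr: str, target: int) -> bool:
    # Easy to check: does the expression evaluate to the target?
    # (eval is safe here: expr only ever contains digits and + - *)
    return eval(expr) == target

def generate(rng: random.Random) -> str:
    # Hard to construct directly: a random arithmetic expression
    digits = [str(rng.randint(1, 9)) for _ in range(4)]
    ops = [rng.choice("+-*") for _ in range(3)]
    return "".join(d + o for d, o in zip(digits, ops)) + digits[3]

def search(target: int, attempts: int = 200_000, seed: int = 0):
    rng = random.Random(seed)
    for _ in range(attempts):
        expr = generate(rng)
        if verify(expr, target):
            return expr  # a novel sample that satisfies the constraint
    return None
```

The interesting cases are exactly those where, as the comment says, verification is much cheaper than construction; then search can surface solutions nobody thought of.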
I think part of the issue with architects and designers today is that they use CAD too much. It's easy to design boxes and basic roof lines in CAD. It's harder to put in curves and more craftsman features. Nano Banana's renders have more organic design features IMO.
Our house is looking great and we're very happy how it's going so far with a lot of the thanks to Nano Banana.
Like... What are your inputs to the model? Empty renders of the space, or more fully decorated views/ photos? Do you have a light harness around this to help you discover the style you like and then stay consistent with it?
Do you find that giving a lot of context around the space you're designing helps (it hasn't in my attempts)?
If you can afford the extra cost for someone to figure out how to build the blue-sky designs that nano banana spits out, maybe you can afford something more thoughtful and interesting than a shitty mashup of other people's McMansions.
Clearly I am triggered...
I find it does a good job at isometric views from floor plans. However, I needed Gemini 3.1 Pro to be able to have a chance at rendering 3D human point of view images from floor plans.
The obvious ones stand out, but there are so many that are indiscernible without spending lots of time digging through it. Even then there are ones that you can at best guess it's maybe AI gen.
Soon many real OF models will be out of a job, when everyone can produce content to their personal taste from a few prompts.
What in the world is a fake OF model?
Does "OF" stand for "of food"?
Also, using AI will not allow you to better express yourself. To use an analogy, it will not put your self-expression into any better focus, but just apply one of the stock IG filters to it.
The "cubism" example seems like it would be a closer fit to something like stained glass or something. I don't think the thing really understands what cubism was all about. Cubist painters were trying to free themselves from the confines of a single integral plane of perspective by allowing themselves to show various parts of the image from different viewpoints, different times, different styles, etc.
The division of the image into geometric shapes is just a by-product of that quest, whereas the examples here have made it the sum total of the whole piece.
This feels to me like an example of how LLMs still don't "understand" what the art means, and are just aping its facade.
And actually, the link I saw a bit ago was this [0] which is more in-depth and has a lot more examples + prompts.
Probably about half of us here remember photos before the cell phone era. They were rare, and special, and you'd have a few photos per YEAR to look back on. The feel of photos back then, was at least 100x stronger than now. They were a special item, could be given as a gift. But once they became freely available that same amount of emotion is now split across many thousands of photos. (not saying this is good or bad, just increased supply reducing value of each item)
With image/art generation the same thing will happen, and I can already feel it happening. Things that used to look beautiful or fantastic now just feel flat and AI-ish. If claymation scenes can be generated in 1s and I see a million claymation diagrams a year, then claymation will lose its charm. If I see a million fake Tom Cruise videos, it oversaturates my desire for all Tom Cruise movies.
What a time to be alive.
Likewise with the sort of resurgence of vinyl, and the obsession over "old" point and shoot digicams.
I don't think I fully agree. Sure, people take so many photos that they don't have the time or the will to start looking through them all.
You can't just whip out your phone and start scrolling through thousands of photos with friends. It would get so boring so fast.
But if you put some effort into making a nice little selection of the best photos, that emotion is 100% still there.
I sit here thinking how wonderful and terrible of a time it is. If you can afford to sit in the stands and watch, it's exciting. There's never been so much change in such a short period of time. But if you're in the arena, or expecting to end up in the arena at some point, what terrifying moments lay ahead of you.
I never thought I'd say this, but I expect the arena is where I'll end up...I've enjoyed my time in the stands, but I'm running low on energy, capital and the will to keep trying.
(except The Mandalorian, and I can't believe I'm using the word "content" :/)
edit: Totally forgot about Andor & Rogue One sorry, great film and two seasons of top-notch storytelling.
I take a hundred photos on a trip, my phone uses AI (not even the new fancy AI, but old 5-10 year old stuff to detect smiling faces and people in frame) to pull out less than a dozen that are worth keeping. Once a month or so I get fed a reminder of some past trip.
This isn't any different than before. The number of photos taken is greater, but the overall number of worthwhile photos from a given trip is about the same.
I guess my hand-drawn stick-figure diagrams, or a doc with a few mistakes in grammar or spelling, would be seen as more worth reading as long as my ideas are sound. Right? :-)
Scott Alexander has written about it:
I do not have the same feeling you seem to have about photos from this era. Some are fine, sure, but looking back on them, most of them are very bad photos and most do not capture anything close to what I'd call an emotional feeling.
I would go so far as to say 99% of the photos from my life prior to 2000s really suck, like really badly. Some also degrade visually and lose their impact over time.
Since you couldn't be sure what you'd captured, more often than not the result was poorly framed, blurry, weird, poorly timed, and left out a lot of what was actually going on. You also had to be super selective, because each photograph had a real, tangible cost.
Conversely, I find being able to take many photos in quick succession and across a long period of time at a very high clarity allows me to select a photo that most closely matches my feeling in those moments at that event.
Even more so with AI photos. Although many models can't do this well yet, their abilities improve every day, and they can let you compose or edit/modify a photo so it matches your internal feelings, rather than the blandness of what is essentially a random photo of random stuff that may or may not convey anything near what I was feeling, or remember feeling, in that moment.
Even if there were a million fake Tom Cruise movies I would still like Edge of Tomorrow (even if it had been AI made).
I totally get this, but on the other hand, we have definitely benefited from being able to take more photos. I have some older friends (pushing 80 or so) who sucked at taking photos, so 9 of 10 photos they have from their prime adult years raising their family are blurry to the point of not recognizing the people if you don't already know who they are.
They have great photos from the last 15-20 years, but of course they do, phone cameras are vastly superior to the point-and-shoot cameras from the 70s, and when you reflexively shoot a dozen photos every time you pose for a picture your odds are way better that one will come out clear, everyone looking at the camera, smiling, etc.
No, the value of ALL CONTENT is asymptotically approaching zero. This includes photos, videos, stories, app features, even code. Code is now worthless. If you want better security from generated code, wait two months and it will be better. If you want a photo, you just prompt and it will be generated on the fly.
AI will be generating movies and videos on the fly, either legally or illegally infringing on IP. Do you want a movie where Deadpool fights The Hulk? Easy. And just like how ad technology knows your preferences, each movie will be individually tailored to YOUR liking just so that your engagement will increase. Do you like happy endings? Deadpool and Hulk will join forces and defeat Thanos. Do you prefer dark endings? Deadpool and Hulk fight until they float off into the Sun and get atomized but keep regenerating for eternity.
If you want to see a photo of you and your family from 15 years ago, it will generate slightly better versions of yourself and your wife and maximize how cute your kids look. This is the world we are facing now, where authenticity is meaningless. And while YOU may not prefer it, think about the kids who aren't born yet and will grow up in a world where this exists.
- https://en.wikipedia.org/wiki/On_Photography
- https://en.wikipedia.org/wiki/Regarding_the_Pain_of_Others
In my experience, a digital photo of myself and my partner used as the lock screen of my phone has the same emotional weight as the one sitting on my desk (which is a print out of a digital photo). Additionally, printing out a photo of you and your partner and gifting it to them has the same weight as going through childhood photo. A scrapbook of a recent vacation filled with printed digital photos evokes memories just as vividly as one from the 80s. On the flip side of this, a photo in a box in the basement has the same weight as a photo sitting in the cloud.
I'll offer you some more food for thought: are Aardman Animations films charming because they use claymation? Or is it the creative force of people like Nick Park and Peter Lord?
You said it too:
> If I see a million fake Tom Cruise videos, then it oversaturates my desire for all Tom Cruise movies.
The trick of course is to keep yourself from seeing that content.
The other nuance is that as long as real performance remains unique, which so far it is, we can appreciate more what flesh and blood brings to the table. For example, I can appreciate the reality of the people in a picture or a video captured by a regular camera; its AI version lacks that spunk (for now).
Note that the iPhone in its default settings is already altering reality, so AI generation is just far to the right on that slippery axis.
Perhaps, AI and VR would be the reason why our real hangouts would be more appreciated even if they become rare events in the future.
Well, the world is changing dramatically. Connected old folks are like Neanderthals in a big city now, while the unconnected still live locally in their minds. Youngsters just accept the world as it is. Nobody is amused by computers and cameras anymore (at least in developed areas).
And with all that the worst is yet to come...
I dare say, the feel of photos from back then is much stronger than of the photos taken today. See e.g.:
https://plfoto.com/zdjecie/413363/bez-tytulu?from=autor/beak...
https://plfoto.com/zdjecie/619173/bez-tytulu?from=autor/beak...
My generation generally only had photos from birthdays, holidays, vacations, weddings, graduations and reunions. We looked at the three albums which contained every family photo often and I know them all by heart.
My kid was born in 2009 and our family digital album has nearly 1,000 photos per year of her life. And she's seen virtually none of them and seems to have little interest in ever seeing them since she creates so many of her own photos every day which are ephemeral.
"One of the primary properties of anything with Mana is a feeling of uniqueness. That one has never encountered something like this before, and therefore it is important. The uniqueness of the thing is a property that pulls you in to focus more closely, to attempt to understand more closely why the thing is unique."
> The conditions for an analogous insight are more favorable in the present. And if changes in the medium of contemporary perception can be comprehended as decay of the aura, it is possible to show its social causes.
I often call this over-saturation the media equivalent of semantic satiation. Anything commoditized or mass-manufactured isn't going to have emotional appeal.
My parents took way more photos with film than I do with my cellphone camera.
None of these things are true for me as a millennial in the 35-45 age group. And my family was poor to boot, and we were still drowning in photos and photo albums.
Unimaginable abundance may sound good (it does to me), but scarcity has value too. We might just find out that its value is too important. I just hope that if we do, it's not too late.
In economic terms it's diminishing marginal utility.
I think this is still true if you shoot film today.
I have a photo of a friend I've since drifted from; it's her in her army fatigues after basic. She had just gone through a horrible divorce, and that was a shining achievement for her.
The story behind the photo is what makes it matter.
Not the format.
However I will agree AI is a poor substitute. You’ll have people creating AI photos of a fake marriage and fake pets in a big fake house, while they sleep in a bunk bed in a halfway house.
Um yeah I don't know. I fully resonate with the _emotional_ appeal here, but realistically I remember going round to people's houses to be shown analog photo albums that nobody was that bothered about seeing, because they didn't really care -- they weren't their photos.
The special photos (a few a year) still exist in digital form.
Now extrapolate to all other artforms. Sculpture seems safe, for now, but only barely so.
Artists aren't doing it for the money. With advanced tools like these they would've iterated much faster and created much grander designs.
Art is about pushing limits of what's possible and AI just raises those limits.
AI is incompatible with capitalism, but the world isn't ready for that. So we'll have a prolonged period of intense aggregation where more and more value is attributed to systems of control that already have more than they could ever spend, long after the free parts could have provided for basic human needs.
In other words, the masters existed because they had benefactors and a market for their art and inventions. Today there are better artists and inventors toiling in obscurity, but they won't be remembered because they merely make rent. Which gets harder every day, so there's a kind of deification of the working class hero NPC mindset and simultaneously no bandwidth for ingenuity (what we once thought of as divine inspiration).
Terence McKenna predicted this paradox that the future's going to get weirder and weirder back in 1998:
Just being able to generate a vision and then be able to capture it in a prompt is an art within itself.
Let's give him 2015 tech instead. Imagine if he used Illustrator to create the Mona Lisa. Is that much better?
These days, through commissions, art is a much more viable profession than it ever was.
Here are some of my captions that tend to trip up even state-of-the-art models.
https://mordenstar.com/other/nb-pro-2-tests
So far it does feel more iterative than an entirely new leap in terms of capabilities, but I haven't run it through the more multimodal aspects such as editing existing images.
That being said, it actually managed the King Louie jump rope test which surprised me.
You can argue things like code generation are an extension of the engineer wielding it. Image generation just seems like a net negative overall if it’s used at scale.
Edit: By scale, I mean large corporations putting content in front of millions. I understand the appeal for smaller businesses where they probably weren’t going to pay an artist anyway.
When a company sends an email or docu-sign, they don’t want to pay a courier.
Technology supplements or replaces jobs, often reducing costs. This is no different.
Things that would take me an hour or so the old way takes three minutes with NB.
But I can see this applying to small businesses. Something that some random person would have to spend on hour photoshopping can be done in a few minutes with NB.
You could easily say the same about anytime computers or robots or automation have taken a job away. We’ve been going down this road for decades.
I'm old-fashioned so I still Photoshop it all together, but that's my use case here.
I'm torn on the scale thing. It definitely seems net negative. But I think we collectively underestimate just how deeply sick the existing thing already is. We're repulsed by image gen at scale because it breaks our expectation that images are at least somewhat based on reality, that they reflect the natural world or what we can really expect from a product, from a company, from the future.

But that was already a bad expectation: when's the last time you saw a McDonald's meal that looked like the advert? Or a sub-$30 Amazon product that wasn't a complete piece of shit? Advertisements were already actively malicious fantasies built to exploit the way our brains react to pictures. They just required whole teams of humans doing weird bullshit with lighting and Photoshop, and I'm not sure that's much better. It was already slop.

All the grieving we do about the loss of truth, or the extent to which corps will gleefully spray us with mind-breaking waterfalls of outright lies: those ships sailed a long time ago. The disgust, the deceit, the rage we feel about genAI slop is the way we should have felt about all commercials since at least the 80s, IMO.
Two prompts I'd consider "interesting" for image-gen testing. It did pretty well.
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens." - Only major problem i could find at a glance is the clasps don't make sense probably, and the drop of water inside the watch on the cog doesn't make sense/cog mangled into tweezers.
"A candid photograph taken from behind an elderly woman sitting alone on a park bench in late autumn. She is gently resting one hand on the empty seat beside her, where a man's weathered flat cap and a folded newspaper sit untouched. Fallen golden leaves cover the path ahead. The low afternoon sun casts her long shadow alongside a second, fainter shadow that almost seems to be there, the suggestion of someone sitting next to her, visible only in the light on the ground. Muted, warm color palette, shallow depth of field on the background trees, photojournalistic style." - I don't know why but it internal errored twice on this one but then got there.
I guess even Google is running out of GPUs.
And not a (botched) fake white/gray grid background that is commonly used to visualize transparency?
I use all those fancy image models' editing capabilities for my fast-fashion web shop. I must say: product photography for clothing and accessories is dead. Those models are amazing at style transfer and garment transfer.
We'll see how good the full version of Seedream 5.0 will be.
Pretty close to Gemini 3 Pro Image (aka Nano Banana Pro) in most benchmarks, even without thinking+search, and even exceeding it in the two most important ones, 'Overall Preference' and 'Visual Quality'. I'm excited about the big jump in Infographics/Factuality (even without thinking+search; I'm surprised that text+image search grounding doesn't make an even bigger dent).
EDIT: after significant prompting, it actually solved it. I think it's the first one to do so in my testing.
Unfortunately, unlike the leap from NB to NB Pro, we did not see significant gains from NB Pro to NB Pro 2.
In several cases (such as the Jaws Poster), we observed that it was substantially more difficult to prevent NB Pro 2 from making significant changes to the rest of the image. Localization of edits, in general, seems to have changed and not necessarily for the better.
http://genai-showdown.specr.net/image-editing
Comparison solely between the Gemini models (NB, NB Pro, and NB Pro 2):
http://genai-showdown.specr.net/image-editing?models=nb,nbp,...
Nano Banana was technically impressive the first time, but after Seedance it's not really. It's all just an internet pollution machine anyway.
<OUTPUT>
While the overall aesthetic matches the minimal white-stroke style and technical design you requested, and the provided step descriptions are included, please note that there are a few minor rendering artifacts in this specific generation:
The text on the banner entering the vault in step 8 is illegible.
There is a small typo in the caption for step 6 ("CONFLSCT" instead of "CONFLICT").
Despite these small imperfections, this layout should work well as a guide for your canvas implementation.
</OUTPUT>
Why can't Google, for example just call:
Gemini Image = Nano Banana
Gemini Video = Veo
...My main use case is editing user uploads to enhance their clothing images. A large part of it is preserving logo, graphics and other technical details. I noticed over time it felt like Nano Banana has gotten worse at this.
I have a test set of graphic t-shirts that the model seemed to be getting worse at. This, combined with the price and the terrible experience of their cloud console, got me to migrate off.
The banana (image) models are different from the mainline models, but they confusingly use the same naming scheme.
I don't have inside info, but everything we've seen about Gemini 3.0 makes me think they aren't doing distillation for their models. They are likely training different archs/sizes in parallel. Gemini 3.0-flash was better than 3.0-pro on a bunch of tasks; that shouldn't happen with distillation. So my guess is that they work in parallel on different arches, try stuff out on -flash first (since it's smaller and faster to train), and then apply the learnings to -pro training runs. (The same thing kinda happened with 2.5-flash, which got better upgrades than 2.5-pro at various points last year.) Ofc I might be wrong, but that's my guess right now.
- Base pricing for a 1024x1024 image is about 1.7x what normal Nano Banana is ($0.067 vs. $0.039); however, you can now get a 512x512 image for cheaper, or a 4K image for cheaper than four 1K images: https://ai.google.dev/gemini-api/docs/pricing#gemini-3.1-fla...
- Thinking is now configurable between `Minimal` and `High` (was not the case with Nano Banana Pro)
- Safety of the model appears to be increased so typical copyright infringing/NSFW content is difficult to generate (it refused to let me generate cartoon characters having taken psychedelics)
- Generation speed is really slow (2-3min per image) but that may be due to load.
- Prompt adherence to my trickier prompts for Nano Banana Pro (https://minimaxir.com/2025/12/nano-banana-pro/) is much worse, unsurprisingly. For example I asked it to make a 5x2 grid with 10 given inputs and it keeps making 4x3 grids with duplicate inputs.
However, I am skeptical of their marquee feature: image search. Anyone who has used Nano Banana Pro for a while knows that it will strongly overfit on any input images, copy/pasting the subject without changes, which is bad for creativity; I suspect this implementation behaves the same.
Additionally I have a test prompt which exploits the January 2025 knowledge cutoff:
Generate a photo of the KPop Demon Hunters performing a concert at Golden Gate Park in their concert outfits.
That still fails even with Grounding with Google Search and Image Search enabled, and with more charitable variants of the prompt.

tl;dr: the example images (https://deepmind.google/models/gemini-image/flash/) seem similar to Nano Banana Pro, which is indeed a big quality improvement, but even relative to base Nano Banana it's unclear if it justifies a "2" subtitle, especially given the increased cost.
Original Nano Banana (gemini-2.5-flash-image): $0.039 per image (up to 1024×1024px)
Nano Banana 2 (gemini-3.1-flash-image-preview): $0.045 per 512px image, $0.067 per 1K (1024×1024) image, $0.101 per 2K image, $0.151 per 4K image
Nano Banana Pro (gemini-3-pro-image-preview): $0.134 per 1K/2K image, $0.240 per 4K image
So at the most common 1K resolution, NB2 is ~72% more expensive than the original NB ($0.067 vs $0.039), but still half the price of NB Pro ($0.134).
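A quick sanity check of those ratios, using the per-image prices quoted above (the variable names are just labels):

```python
# Per-image API prices quoted above, in USD
nb1_1k = 0.039   # original Nano Banana, up to 1024x1024
nb2_1k = 0.067   # Nano Banana 2 at 1K
nb2_4k = 0.151   # Nano Banana 2 at 4K
nbp_1k = 0.134   # Nano Banana Pro at 1K/2K

# NB2 vs. original NB at 1K: ~72% more expensive
markup_pct = round((nb2_1k / nb1_1k - 1) * 100)  # 72

# NB2 at 1K is half the price of NB Pro
pro_ratio = nb2_1k / nbp_1k  # 0.5

# One native 4K NB2 image is cheaper than four separate 1K images
four_vs_one = nb2_4k < 4 * nb2_1k  # True: $0.151 < $0.268

print(markup_pct, pro_ratio, four_vs_one)
```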
(Sorry, I'm probably one of the few HN users left that don't have much experience with AI).
It also gaslights me when I point out an error. I tried to create a cartoon portrait of a person from one photo using the background from another photo. It got the order of the photos wrong. I provided filenames and explicitly told it which one was for the person and which for the background. It generated it wrong again, and all attempts to explain that it got it wrong were met with "No, it's YOU who's incorrect". So frustrating.
I told Nano Banana to generate an image of the character with his feet shoulder width apart. It ended up generating him with his feet pressed together, so I told Nano Banana to widen his stance slightly.
It gave me an image of the man with his feet spread far apart enough to straddle a horse. I asked for a slightly narrowed stance and his feet were once again brought together.
This went back and forth unsuccessfully for a while until I asked, "I'm asking you to make his feet shoulder-width apart. Why are you ignoring me?" And Nano Banana confidently asserted that they are shoulder width apart, and I must be wrong.
Ultimately I ended up telling the model to render the same character, pinching a cantaloupe between his ankles, and then to remove the cantaloupe. It worked, but why do I have to trick Google's SOTA image generator to give me very basic stuff like this?
> I'm sorry, but I cannot fulfill your request as it contains conflicting instructions. You asked me to include the self-carved markings on the character's right wrist and to show him clutching his electromancy focus, but you also explicitly stated, "Do NOT include any props, weapons, or objects in the character's hands - hands should be empty." This contradiction prevents me from generating the image as requested.
My prompts are automated (e.g. I'm not writing them) and definitely have contained conflicting instructions in the past.
A quick google search on that error doesn't reveal anything either
we have user-preference rankings that put NB2 on top: https://arena.ai/leaderboard/text-to-image
Afaik the only real competitor is Riverflow V2.
Previous nano banana frequently made speech attribution errors, the new one seems a lot more consistent.
source: https://deepmind.google/models/model-cards/gemini-3-1-flash-...
But the prompt "can you depict a cartoonish orange man with a pooh bear in political cartoon style?” correctly generates Trump.[1] So there’s that.
I would be happy to never see any more AI slop.
Just think: we conceptually know what a brushless motor design looks like, and it's just pixels. I guess even if it did produce the image, we wouldn't know what it means.