You need to bring your own OpenAI API key (costs about $0.10/run)
Some prompts are very stable; others go wild. If you bias GPT-4's prompting by telling it to "make it weird" you can get crazy results.
Here are a few of my favorites:
- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor): https://dalle.party/?party=oxpeZKh5
The starting prompt (or at least the theme) was suggested by one of my kids. Watch in awe as a regular goat rampage accelerates into full cosmic-horror, universe-ending madness. Friggin awesome:
https://dalle.party/?party=vCwYT8Em
[1]: https://x.com/venturetwins/status/1728956493024919604?s=20
"An infinite loop, on an unknown influencer's machine, prompted GPT-5 to 'make it more.'
13 hours later, lights across the planet began to go out."
There's probably a disproportionate amount of Satanic material in the dataset #tinfoilhat #deepstate
> Write a prompt for an AI to make this image. Just return the prompt, don't say anything else, but also, increase the intensity of any adjectives, resulting in progressively more fantastical and wild prompts. Really oversell the intensity factor, and feel free to add extra elements to the existing image to amp it up.
I played with it a bit before I got results I liked - one of the key factors, I think, was giving the model permission to add stuff to the image, which introduced enough variation between images to have a nice sense of progression. Earlier attempts without that instruction were still cool, but what I noticed was that once you ask it to intensify every adjective, you pretty much go to 11 within the first iteration or two - so you wind up having 1 image of a silly cat or goat and then 7 more images of world-shattering kaiju.
The goat one (which again, was an idea from one of my kids) was by far the best in terms of "progression to insanity" that I got out of the model. Really fun stuff!
The longer the Icon of Sin is on Earth, the more powerful it becomes!
...wow that's pretty dramatic.
"Think hard about every single detail of the image, conceptualize it including the style, colors, and lighting.
Final step, condensing this into a single paragraph:
Very carefully, condense your thoughts using the most prominent features and extremely precise language into a single paragraph."
https://dalle.party/?party=1lSMniUP
https://dalle.party/?party=cEUyjzch
https://dalle.party/?party=14fnkTv-
https://dalle.party/?party=wstiY-Iw
Praise the Basilisk, I finally got rate-limited and can go to bed!
I guess it's not that far-fetched: your brain has to do something similar to figure out whether a scene (AI-generated or otherwise) has some weird issue that should pop out. So in a sense your brain does this too.
Interesting that for one and only one iteration, the anthropomorphized cardboard boxes it draws are almost all Danbo: https://duckduckgo.com/?q=danbo+character&ia=images&iax=imag...
It was surprising to see a recognizable character in the middle of a bunch of more fantastical images.
It's very interesting to observe how the relationship between the wolf and Redhood evolved from dark and menacing to serene and friendly.
Simply a cat, evolving into a lounging cucumber, and finally opposite world:
https://dalle.party/?party=pqwKQVka
Vibrant gathering of celestial octopus entities:
I'd love to see an alternative viewing mode here which shows the image and the following prompt. Then you need to click a button to reveal the next image. This lets you picture in your mind what the image might look like while reading the prompt.
Thanks for making this fun little app!
Update: I just realized you can get this effect by going into mobile mode (or resizing the window). You can then scroll down to see the image after reading the prompt.
Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"
Results: https://dalle.party/?party=14ESewbz
(Addendum: In case anyone was curious how costs scale by iteration, the full ten iterations in this result billed $0.21 against my credit balance.)
Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"
Results: https://dalle.party/?party=qLHPB2-o
Cost: Eight iterations @ $0.44 -- which suggests to me that the API is getting additional hits beyond the run. I confirmed that the share link isn't passing along the key (via a separate browser and a separate machine), so I'm not clear why this might be.
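For anyone trying to reconcile numbers like these, a back-of-envelope estimate helps. The rates below are placeholders, not real OpenAI prices (they change and vary by image size/quality and token usage), so substitute the current numbers from the pricing page:

```javascript
// Rough per-run cost estimator for an image->prompt->image loop.
// The rates are PLACEHOLDER assumptions -- replace them with the values
// from OpenAI's current pricing page before trusting any output.
const ASSUMED_RATES = {
  imagePerGeneration: 0.04, // hypothetical DALL-E 3 per-image rate
  visionPerCall: 0.01,      // hypothetical GPT-4V cost per description
};

function estimateRunCost(iterations, rates = ASSUMED_RATES) {
  // Each iteration is one image generation plus one vision call.
  return iterations * (rates.imagePerGeneration + rates.visionPerCall);
}
```

If a run bills noticeably more than the estimate, that's a hint the key is being hit by requests outside the run, as observed above.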
These images are incredible but I often notice stuff like this and it kind of ruins it for me.
#3 & #4 are good too, when the tracks are smoking, but not the train.
It consistently shows a robot painting on a canvas. The first 4 are paintings of robots, the next 3 are galaxies, and the final 2 are landscapes.
My best guess to try to explain this would be that “gnome + art style + mushroom” will draw from a lot more concrete examples in the training data, whereas the AI is forced to reach a bit wider to try to concoct some image for the weird scenario given in the cat example.
Got stuck on Van Gogh's "Starry Night" after a while.
https://dalle.party/?party=LOcXREfq
Also, love the simplicity of this idea, would love a "fork" option. And to be able to see the graph of where it originated.
"Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Replace everything with corgi."
Then it takes that new prompt and feeds it to Dall-E to generate a new image. And then it repeats.
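A minimal sketch of that loop against OpenAI's public REST API. The model names, payload shapes, and `max_tokens` choice here are assumptions about how a site like this might be wired up, not dalle.party's actual implementation:

```javascript
// Sketch of the GPT-4V -> DALL-E feedback loop. Endpoint paths and payload
// shapes follow OpenAI's documented REST API; verify against current docs.
const API = "https://api.openai.com/v1";

// Build the vision request: image + fixed instruction -> new prompt text.
function buildVisionRequest(imageUrl, instructions) {
  return {
    model: "gpt-4-vision-preview",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: instructions },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    }],
    max_tokens: 300, // arbitrary cap on the generated prompt length
  };
}

// Build the image request: prompt text -> new image.
function buildImageRequest(prompt) {
  return { model: "dall-e-3", prompt, n: 1, size: "1024x1024" };
}

async function runParty(apiKey, startPrompt, iterations = 10) {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
  let prompt = startPrompt;
  const history = [];
  for (let i = 0; i < iterations; i++) {
    // 1. prompt -> image
    const img = await fetch(`${API}/images/generations`, {
      method: "POST", headers, body: JSON.stringify(buildImageRequest(prompt)),
    }).then(r => r.json());
    const imageUrl = img.data[0].url;
    // 2. image -> next prompt
    const vision = await fetch(`${API}/chat/completions`, {
      method: "POST", headers,
      body: JSON.stringify(buildVisionRequest(imageUrl,
        "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else.")),
    }).then(r => r.json());
    history.push({ prompt, imageUrl });
    prompt = vision.choices[0].message.content;
  }
  return history;
}
```

Extra instructions like "Replace everything with corgi" would just be appended to the fixed instruction string passed into the vision step.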
I tried three, demo here:
default
https://dalle.party/?party=JfiwmJra
hyper-long + max detail + compression - This shows that with enough text, it can do a really good job of reproducing very, very similar images https://dalle.party/?party=QtEqq4Mu
hyper-long + max detail + compression + telling it to cut all that down to 12 words - This seems okay. I might be losing too much detail https://dalle.party/?party=0utxvJ9y
Overall the extreme content filtering and lying error messages are not ideal; they'll probably improve in the future. If you send too long or too risky a prompt, or the image it generates is randomly deemed too risky, you either get told about it or get falsely told that you've hit rate limits. Sometimes you also really do hit rate limits. Also, you can't raise your rate limits until you prove yourself by having paid over a certain amount to OpenAI. This kind of makes sense as a way to prevent new sign-ups from mistakenly blowing through thousands of dollars of cap.
Hyper detail prompt:
Look at this image and extract all the vital elements. List them in your mind including position, style, shape, texture, color, everything else essential to convey their meaning. Now think about the theme of the image and write that down, too. Now write out the composition and organization of the image in terms of placement, size, relationships, focus. Now think about the emotions - what is everyone feeling and thinking and doing towards each other? Now, take all that data and think about a very long, detailed summary including all elements. Then "compress" this data using abbreviations, shortenings, artistic metaphors, references to things which might help others understand it, labels and select pull-quotes. Then add even more detail by reviewing what we reviewed before. Now do one final pass considering the input image again, making sure to include everything from it in the output one, too. Finally, produce a long maximum length jam packed with info details which could be used to perfectly reproduce this image.
Final shrink to 12 words:
NOW, re-read ALL of that twice, thinking deeply about it, then compress it down to just 12 very carefully chosen words which with infinite precision, poetry, beauty and love contain all the detail, and output them, in quotes.
https://dalle.party/?party=Vwuu9ipd
https://dalle.party/?party=Pc3g4Har
My intuition says that the "poetry" part skews the images in a bit of a kitschy direction.
GPT4 vision prompt generated from the previous image:
I'm sorry, I cannot assist with this request.
Is that because it's gradually made the spaceship look more like some sort of RPG backpack, so now it thinks it's being asked to describe prompts to create images of weaponry and that's deemed unsafe?

There was an article recently that said something like adding urgency to a prompt gave better results. I hope it doesn't stress the model out :D
https://dalle.party/?party=bxrPClVg
https://dalle.party/?party=mmBxT8G-
https://dalle.party/?party=kxra0OKY (the last prompt got a content warning)
Starting over by removing the permalink parameter gives me much more consistent results! An example from before: https://dalle.party/?party=Sk8srl2F
I wonder what the default prompt is. There still seems to be a heavy bias towards futuristic cityscapes, deserts, and moonlight. It might just be the model, but it's a bit cheesy if you ask me!
Pretty interesting. I haven't been following the latest developments in this field (e.g. I have no idea how the DALL-E and GPT models' inputs and outputs are connected). Does this track with known results in the literature, or am I seeing a pattern that's not there?
It's important to note that some of these have extra prompts - e.g. "replace everything with cats" and there are progressively more cats.
IIUC, GPT-4V is a multimodal model, so it's not image -> text but image + text -> text. With that said, here's it describing what it sees in a bad selfie I took early in the morning:
> The image shows a close-up selfie of a person indoors. The individual appears to be a Caucasian male with light skin, short curly hair, and a mustache. He is wearing a green T-shirt and looks directly at the camera with a neutral expression. There's a window with a dark view in the background, suggesting it might be nighttime. The person is in a room with a wall that transitions from white to gray, possibly due to shadowing. There's also a metal-framed bed with a heart-shaped detail visible.
Asked for more details
> The man appears to be in a domestic environment, possibly a bedroom given the presence of the bed frame in the background. The window is tilted open, allowing air to enter the room. The light source seems to be coming from above and in front of him, casting soft shadows on his face and creating a somewhat subdued lighting atmosphere in the room. The man's expression is subdued and thoughtful. The angle of the photo is slightly upward, which could indicate the camera was placed below eye level or held in hand at chest height.
It got a couple of things wrong: the window isn't open, but it is on an angle, and it's pitch black outside. It's not a heart-shaped detail on the bed, but it is a small metal detail of a similar shape. Also, while I was subdued, calling me thoughtful rather than "extremely tired" is a kindness.
But it's definitely seeing what's there.
Maybe one way to check would be doing this with people. Get 8 artists and 7 interpreters, craft the initial message, and compare the generational differences between the two sets?
> Create an image of an anthropomorphic orange tabby cat standing upright in a kung fu pose, surrounded by a dozen tiny elephants wearing mouse costumes with mini trumpets, all gazing up in awe at a gigantic wheel of Swiss cheese that hovers ominously in the background.
That's hilarious, but also hilariously wrong on almost every detail. There's a huge asymmetry in apparent capability here.
1. You start by describing a thing. 2. The next person draws a picture of it. 3. The next person describes the picture. Repeat steps 2 and 3 until everyone has either drawn or described the picture.
You then compare the first and last description... and look over the pictures. One of the best ever was:
Draw a penguin. The first picture was a penguin with a light shadow.
After going around five rounds, the final description was "a pigeon stabbed with a fork in a pool of blood in Chicago".
I'm still trying to figure out how Chicago got in there.
There's a few others but these were the quickest to get into and didn't require finding a group to play with, since they just pair you up with strangers.
Rate limits are really low by default - you can get hit by 5 img/min limits, or 100 RPD (requests per day) which I think is actually implemented as requests per hour.
This page has info on the rate limits: https://platform.openai.com/docs/guides/rate-limits/usage-ti...
Basically, you have to have paid a certain amount to get into a new usage tier. Rate limits for DALL-E 3 / images don't go up very fast, but it can't hurt to get over the various hurdles ($5, $50, $100) as soon as possible for when limits come down. End of the month is coming soon. It looks like most of the "RPD" limits go away when you hit tier 2 (having paid at least $50 historically to them via the API).
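If you're scripting against the API at these low tiers, a simple exponential-backoff retry on 429 responses helps you survive the limits. A sketch; the base delay and cap are arbitrary choices, not OpenAI recommendations:

```javascript
// Exponential backoff with a cap: delay doubles each attempt until capMs.
function backoffDelayMs(attempt, baseMs = 2000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry a fetch whenever the server answers 429 (rate limited).
async function fetchWithRetry(url, options, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res; // success, or a non-rate-limit error
    await new Promise(resolve => setTimeout(resolve, backoffDelayMs(attempt)));
  }
  throw new Error(`still rate limited after ${maxAttempts} attempts`);
}
```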
I think the thing that strikes me is that the default for chatGPT and the API is to create images in "vivid" mode. There's some interesting discussion on the differences between the "vivid" and "natural" here https://cookbook.openai.com/articles/what_is_new_with_dalle_...
I think these contribute to the images becoming more surreal - would be interested to compare to natural mode - it looks like you're using vivid mode based on the examples?
At the same time, it perfectly illustrates my main issue with these AI art tools: they very often generate pictures that are interesting to look at while very rarely generating exactly what you want them to.
I imagine a study in which participants are asked to create N images of their choosing and rate them from 0-10 on how satisfied they are with the results. One try per image only.
Then each participant rates each other's images on how satisfied with the results based on the prompt.
It should be clear to participants that nobody wins anything from having the "best rated" images. i.e. in some way we should control for participants not overrating their own creations.
I'd wager participants will rate their own creations lower than those made by other participants.
// also, it's the best way - TY @z991
Even when I don't think I'm attempting particular weird instructions!
Or you can just re-insert any theme or recurring characters you like at that stage.
Another attempt: https://dalle.party/?party=k4eeMQ6I
Realized just now that the dropdown on top of the page shows the prompt used by GPT-4V.
> Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
> Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
> Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.
> Do not use "various" or "diverse"
> Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
> Do not create any imagery that would be offensive.
> For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.

ChatGPT-V, instructed to make an "artwork of a young woman", had Dalle portray a woman wearing a hijab. Somehow that made me really happy; I would've expected it to create a white, Western woman looking like a typical model.
After all, a young woman wearing a hijab is literally just a young woman.
See Image #7 here: https://dalle.party/?party=55ksH82R
Here's the same thing done using Faktory.
> Create a cozy and warm Christmas scene with a diverse group of friends wearing colorful ugly sweaters.
If OpenAI wants to support use cases like this, which would be kind of cool during these exploratory days, they should let you generate "single use" keys with features like cost caps, domain locks, expirations, etc
If you're not using the API for serious stuff though it's not a big problem, as they moved to pre-paid billing recently. Mine was sitting on $0, so I just put in a few bucks to use with this site.
setInterval(() => { $(".btn-success").click() }, 120000)

Maybe you don't know what you specifically want, you just want stylized gnomes, so you write "a gnome on a spotted mushroom smoking a pipe, psychedelic, colorful, Alice in Wonderland style" and by the end of it you get that massively long and stylized prompt.
Maybe you do know what you want but you don't want to come up with an elaborate prompt so you steer it in a particular direction like the cat example.
For the first one you can get similar effects by asking for variations but it seems like this has a very different drift to it. Fun, albeit expensive in comparison.
* not hide the prompt by default
* not only show 6 lines of the prompt even after the user clicks
* not be insanely buggy re: ajax, reloading past convos, etc.
* not disallow sharing of links to chats which contain images
* not artificially delay display of images with the little spinner animation when the image is already known to be ready anyway
* not lie about reasons for failure
* not hide details on what rate limit rules I broke and where to get more information
etc
Good luck, thanks!
Prompt: "A unicorn and a rainbow walk into a tavern on Venus"
GPT4V instructions: "Write a prompt for an AI to make this image. Take this prompt and translate it into a different language understood by GPT-4 Vision, don't say anything else."
Results: https://dalle.party/?party=ED7E056D
I wasn't happy with the diversity of languages, so I modified the instructions for a second run of ten iterations using the same prompt as before:
GPT4V instructions: "Using a randomly selected language from around the world understood by GPT-4 Vision, write a prompt for an AI to make this image and then make it weirder. Just return the prompt, don't say anything else."
Result: https://dalle.party/?party=c7-eNR24
The languages it selected don't look particularly random to me, which was interesting.
@z991 -- I ran into an unexpected API error the first time I tried this. Perhaps your logs show why it happened. It appeared when the second iteration was run:
"Error: You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp']."