This was spearheaded by Boris Dayma, now at Weights & Biases.
This is an open-source project with all code and methods public.
See GitHub (https://github.com/borisdayma/dalle-mini), the hosted Space on the Hugging Face Hub (https://huggingface.co/spaces/dalle-mini/dalle-mini), or the project report (https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-G...).
This project was also covered in Cade Metz's NYT article on DALL-E 2.
The author gives no credit at all. That is appalling.
(Also, the one hosted on the HF Hub gives you better results.)
I just realized that this person either used our model at some point in the past without giving us due credit, or trained a new model whose name just happens to match.
In the latter case, please ignore my rant and treat my links as references to a different project, rather than as a claim that this project is ours.
Unfortunately, despite the model authors adding a significant number of GPUs to the official demo, it has been hugged to death following recent Guardian, NYT, and other coverage.
So, a touch of Rain, Steam and Speed?
So I tried "a train speeding in rain" and got a somewhat car-like out-of-the-window view of a rainy landscape, with a hint of rails mangled into what looked more like a road for automobiles. However, no Turner… ;-)
I tried "a green bowl", "a green bowl with an apple", "a green bowl with an apple inside", and "a banana in a bowl".
The only one that seemed correct was "a green bowl"; all of the others were very different.
This one tends to come back with blobby images that don't seem to take in at least half the words in the query (and yes, I'm only using 3-4 words, just like the example).
Out of the handful of DALL-E clones I've seen so far, this is by far the worst performing with respect to the results returned.
Four out of four queries resulted in synthetic portraits that are terrifically scary.
Spent all evening yesterday having fun as my friends and I tried all sorts of inputs, including pretty specific/obscure ones (but we also did plenty of rather vague and generic inputs as well). Not even once did we get unrecognizable blobs. Sometimes we got results that were more on the van Gogh side than the realism side, but never anything unrecognizable.
Even without knowing the prompt, you could tell "oh, this looks like a human dressed in a suit with a weird body shape, standing bent over, with a giant person in a dark dress behind a desk across from him, everyone shown from the waist up, all looking like a very stylized drawing", despite the prompt itself being something like "courtroom sketch of a man getting sued".
To anyone who wants to see something I personally found extremely interesting: try a prompt like "Google streetview of XYZ", with XYZ being something vague or absolutely difficult to even imagine visually, like "Godzilla" or "statue".
Then I tried "samuel pepys on a horse", which gave me a disturbing ghost on a deformed horse.