This was spearheaded by Boris Dayma, now at Weights & Biases.
This is an open-source project with all code and methods public.
See GitHub (https://github.com/borisdayma/dalle-mini), the hosted Space on the Hugging Face Hub (https://huggingface.co/spaces/dalle-mini/dalle-mini), or the project report (https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-G...).
This project was also covered in Cade Metz's NYT article on DALL-E 2.
The author gives no credit at all. That is appalling.
(Also, the one hosted on the HF Hub gives you better results.)
I just realized that this person either used our model at some point in the past without giving us due credit, or trained a new model whose name just happens to match.
In the latter case, please ignore my rant and treat my links as references to a different project, rather than as a claim that this project is ours.
Unfortunately, despite the model authors adding a significant number of GPUs to the official demo, it has been hugged to death following recent Guardian, NYT, and other coverage.
So, a touch of Rain, Steam and Speed?
So I tried "a train speeding in rain" and got a somewhat car-like out-of-the-window view of a rainy landscape, with a hint of rails mangled into what looked more like a road for automobiles. However, no Turner… ;-)
I tried "a green bowl", "a green bowl with an apple", "a green bowl with an apple inside", and "a banana in a bowl".
The only one that seemed correct was "a green bowl"; all of the others were very different.
This one tends to come back with blobby images that don't seem to take in at least half the words in the query (and yes, I'm only using 3-4 words, just like the example).
Out of the handful of DALL-E clones I've seen so far, this is by far the worst performing with respect to the results returned.
Four out of four queries resulted in synthetic portraits that are terrifically scary.
Spent all evening yesterday having fun as my friends and I tried all sorts of inputs, including pretty specific/obscure ones (but we also did plenty of rather vague and generic inputs as well). Not even once did we get unrecognizable blobs. Sometimes we got results that were more on the van Gogh side than the realism side, but never anything unrecognizable.
Even without knowing the prompt, you could tell "oh, this looks like a human dressed in a suit with a weird body shape, standing bent over, with a giant person in a dark dress behind a desk across from him, everyone shown from the waist up, all looking like a very stylized drawing", despite the prompt itself being something like "courtroom sketch of a man getting sued".
To anyone who wants to see something I personally found extremely interesting: try a prompt like "Google streetview of XYZ", with XYZ being something vague or absolutely difficult to even imagine visually, like "Godzilla" or "statue".
Then I tried "samuel pepys on a horse", which gave me a disturbing ghost on a deformed horse.