* "Make a list of 20 items" results in a list. The number of items is as accurate as if you asked a toddler the same question.
* If you ask GPT-3 a simple combinatorics question, it will be 100% confident in the wrong answer.
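For contrast, a question of that kind has a short, exact answer in code. The specific question here is my own hypothetical example, not one from the comment:

```python
# A "simple combinatorics question" of the kind meant above
# (hypothetical example): how many ways are there to pick 3 of 10 items?
import math

answer = math.comb(10, 3)  # exact binomial coefficient C(10, 3)
print(answer)  # -> 120
```

The point being that this answer is deterministic and checkable, whereas a language model produces it (or something else) with uniform confidence.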
Origami is sort of the same. It takes a conceptual understanding of how paper folds, which DALL-E Mini doesn't have. It has a feel for the general origaminess of a picture.
If I showed a human being a few pieces of origami, including a paper crane, and they had never seen origami before, they'd likely produce similar pictures.
https://www.fastcompany.com/3059089/it-turns-out-its-almost-...
The largest ML systems of today have roughly the same complexity as human brains, and evolve in much the same way. The brain has around 100 billion neurons, and GPT-3 has 175 billion parameters. Neurons and parameters aren't directly comparable, but there isn't an obvious advantage in either direction: a single neuron carries far more state than a single ML parameter, but neurons also fire at around 10 Hz, versus the GHz clock rates of the hardware running the model.
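A back-of-the-envelope version of that comparison; every figure below is a rough assumption (synapse count per neuron, generation speed, and the 2-FLOPs-per-parameter-per-token rule of thumb are mine, not from the comment):

```python
# Rough "update events per second" comparison; all figures are
# order-of-magnitude assumptions, not measurements.
NEURONS = 100e9             # human brain, per the comment above
SYNAPSES_PER_NEURON = 1e3   # assumed; common estimates range ~1e3-1e4
FIRING_HZ = 10              # per the comment above

GPT3_PARAMS = 175e9         # from the GPT-3 paper
TOKENS_PER_SEC = 10         # assumed generation speed
FLOPS_PER_PARAM = 2         # rule of thumb for one forward pass per token

brain_events = NEURONS * SYNAPSES_PER_NEURON * FIRING_HZ
gpt3_flops = GPT3_PARAMS * FLOPS_PER_PARAM * TOKENS_PER_SEC

print(f"brain synaptic events/s ~ {brain_events:.1e}")  # ~1e15
print(f"GPT-3 inference FLOPs/s ~ {gpt3_flops:.1e}")    # ~3.5e12
```

Under these (debatable) assumptions the two land within a few orders of magnitude of each other, which is the comment's point: no obvious advantage either way.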
That doesn't mean machine sentience will be anything like human sentience. Brain disorders are helpful to look at here -- there are people who don't experience specific sensations or emotions (e.g. pain, fear). Even a minor tweak like that has a major impact, and the differences here are far bigger than a minor tweak: ML systems evolve without evolutionary pressure for self-preservation or pro-social behavior, and with an ephemeral nature that brains don't have.
The model can only ever be trained on pictures of origami. So it can generate images that come close to "pictures of origami", but (since pictures are necessarily abstracted 2D projections) those may still be way, way off from "origami" itself. Not knowing anything about actual origami, having only ever seen pictures myself, I thought most of the generated images were quite good. An experienced origami folder doesn't see it that way.
I hope my thought is phrased clearly enough, I am having trouble finding the right words here.
Obviously the hope is that somehow this and future advances can be democratised. It was funny that Asimov's The Last Question has been posted here a couple of times recently because it makes such a big thing about world-sized computers and how advanced minicomputers would be. It's easy to read and scoff at the naivety... before realising we could easily be heading back in that direction for many impactful future technologies.
I have lots of friends who aren't in the tech field having lots of fun playing with DALL-E Mini, even though the results look terrible -- if they sort of resemble the prompt (and many times they do), they are ecstatic that the machine made a weird doodle about something ridiculous.
It seems like the DALL-E creators are mostly worried about the (possibly justified!) fear that people will use it to make racist or other offensive imagery, and it would bring very bad PR to the team.
"Bias
CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes."
Spoiler alert, nothing contained in that section requires a warning. It's just abstract descriptions of "potential" negative stereotypes in images.
"initial testing demonstrates that they may generate images that contain negative stereotypes against minoritized groups"
Minoritized is a new word for me. As though minority status is something actively attached to someone. But no duh, I can ask DALL-E to generate "images of klan members at a lynching" or "inner city police brutality" and get negative images.
"When the model generates images with people in them, it tends to output people who we perceive to be white, while people of color are underrepresented."
I'd like to see real testing, because from what I can tell this is not true. Ask for "white people" and you get weird abstract models of white figures. Ask for "black people" and you get beautiful photos of smiling black faces.
Is this the kind of exercise AI researchers have to concern themselves with these days?
DALL-E 2 beats Mini in almost all of them.
This may be a naïve thought as my understanding of all models mentioned is superficial at best.
I gave it the prompt:

<?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg"
and it completed it to the following: <?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" id="cloud-sun_svg__-SimpleVector-0"><defs><linearGradient spreadMethod="pad" x2="1" y2="1"><stop offset="0" stop-color="#fff" stop-opacity=".2"/><stop offset="1" stop-opacity=".2"/></linearGradient></defs><path d="M5.5 13.5A2.5 2.5 0 008 16h8a2.5 2.5 0 002.5-2.5S19.5 10 17.5 10H9c-2.5 0-4.5 2-4.5 4.5zM10 11h.5a.5.5 0 100-1h-2a.5.5 0 100 1H10zm2 3.5h2a.5.5 0 100-1h-2a.5.5 0 100 1zm-4 2H8a.5.5 0 100 1h2a.5.5 0 100-1zm-2-2H8a.5.5 0 100 1h2a.5.5 0 100-1z" fill="#000" fill-opacity=".3"/><circle cx="12" cy="19" r="2" fill="#fff"/><path d="M11.5 19h-2c0-1.5 1-2.5 2.5-2.5S14 15.5 14 17h.5a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path d="M10 13.5h2a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path fill="url(#cloud-sun_svg__-SimpleVector-0)" d="M16 6l-5 5 5 5v-7c1.5 0 3 1 3 3v2z"/><path fill="#fff" d="M12.5 17h2v2h-2z"/></svg>
which looks like this: https://i.imgur.com/sHpv4Ii.png
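Out of curiosity, the completion is at least well-formed XML. A quick check with Python's standard library (the markup is pasted verbatim from the comment above; variable names are mine):

```python
# Verify the GPT-3-completed SVG parses as XML and count its drawing
# primitives. The string is the completion quoted verbatim above.
import xml.etree.ElementTree as ET

svg = '''<?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" id="cloud-sun_svg__-SimpleVector-0"><defs><linearGradient spreadMethod="pad" x2="1" y2="1"><stop offset="0" stop-color="#fff" stop-opacity=".2"/><stop offset="1" stop-opacity=".2"/></linearGradient></defs><path d="M5.5 13.5A2.5 2.5 0 008 16h8a2.5 2.5 0 002.5-2.5S19.5 10 17.5 10H9c-2.5 0-4.5 2-4.5 4.5zM10 11h.5a.5.5 0 100-1h-2a.5.5 0 100 1H10zm2 3.5h2a.5.5 0 100-1h-2a.5.5 0 100 1zm-4 2H8a.5.5 0 100 1h2a.5.5 0 100-1zm-2-2H8a.5.5 0 100 1h2a.5.5 0 100-1z" fill="#000" fill-opacity=".3"/><circle cx="12" cy="19" r="2" fill="#fff"/><path d="M11.5 19h-2c0-1.5 1-2.5 2.5-2.5S14 15.5 14 17h.5a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path d="M10 13.5h2a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path fill="url(#cloud-sun_svg__-SimpleVector-0)" d="M16 6l-5 5 5 5v-7c1.5 0 3 1 3 3v2z"/><path fill="#fff" d="M12.5 17h2v2h-2z"/></svg>'''

# fromstring needs bytes here: a str carrying an encoding declaration
# is rejected by the parser.
root = ET.fromstring(svg.encode("utf-8"))
NS = "{http://www.w3.org/2000/svg}"
print(root.tag == NS + "svg")          # True: well-formed, correct root
print(len(root.findall(NS + "path")))  # 5 direct <path> children
```

Amusingly, the gradient is referenced via `url(#cloud-sun_svg__-SimpleVector-0)` but that id sits on the root `<svg>` element, not on the `<linearGradient>` -- it parses fine as XML, yet the reference is semantically off, which fits the "feel for the general shape, no conceptual model" theme above.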