* "Make a list of 20 items" results in a list. The number of items is as accurate as if you asked a toddler the same question.
* If you ask GPT-3 a simple combinatorics question, it will be 100% confident in the wrong answer.
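For contrast, a question of that kind has a short, exact answer in code. The specific question here is my own hypothetical example, not one from the comment:

```python
# A "simple combinatorics question" of the kind meant above
# (hypothetical example): how many ways are there to pick 3 of 10 items?
import math

answer = math.comb(10, 3)  # exact binomial coefficient C(10, 3)
print(answer)  # -> 120
```

The point being that this answer is deterministic and checkable, whereas a language model produces it (or something else) with uniform confidence.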
Origami is sort of the same. It takes a conceptual understanding of how paper folds, which DALL-E Mini doesn't have. It has a feel for the general origaminess of a picture.
If I showed a human being a few pieces of origami, including a paper crane, and they had never seen origami before, they'd likely produce similar pictures.
https://www.fastcompany.com/3059089/it-turns-out-its-almost-...
The largest ML systems of today have roughly the same complexity as human brains, and evolve in much the same way. The brain has around 100 billion neurons, and GPT-3 has 175 billion parameters. Neurons and parameters aren't directly comparable, but there isn't an obvious advantage in either direction: a single neuron carries far more state than a single ML parameter, but neurons also fire at around 10 Hz, versus the GHz clock rates of the hardware running the model.
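A back-of-the-envelope version of that comparison; every figure below is a rough assumption (synapse count per neuron, generation speed, and the 2-FLOPs-per-parameter-per-token rule of thumb are mine, not from the comment):

```python
# Rough "update events per second" comparison; all figures are
# order-of-magnitude assumptions, not measurements.
NEURONS = 100e9             # human brain, per the comment above
SYNAPSES_PER_NEURON = 1e3   # assumed; common estimates range ~1e3-1e4
FIRING_HZ = 10              # per the comment above

GPT3_PARAMS = 175e9         # from the GPT-3 paper
TOKENS_PER_SEC = 10         # assumed generation speed
FLOPS_PER_PARAM = 2         # rule of thumb for one forward pass per token

brain_events = NEURONS * SYNAPSES_PER_NEURON * FIRING_HZ
gpt3_flops = GPT3_PARAMS * FLOPS_PER_PARAM * TOKENS_PER_SEC

print(f"brain synaptic events/s ~ {brain_events:.1e}")  # ~1e15
print(f"GPT-3 inference FLOPs/s ~ {gpt3_flops:.1e}")    # ~3.5e12
```

Under these (debatable) assumptions the two land within a few orders of magnitude of each other, which is the comment's point: no obvious advantage either way.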
That doesn't mean machine sentience will be anything like human sentience. Brain disorders are helpful to look at here -- there are people who don't experience specific sensations or emotions (e.g. pain, fear). Even a minor tweak like that has a major impact, and the differences here are far bigger than a minor tweak: ML systems evolve without evolutionary pressure for self-preservation or pro-social behavior, and with an ephemeral nature that brains don't have.
The model can only ever be trained on pictures of origami. So it can generate images that come close to "pictures of origami", but (since pictures are necessarily abstracted 2D projections) those may still be way, way off from "origami" itself. Not knowing anything about actual origami, having only ever seen pictures myself, I thought most of the generated images were quite good. An experienced origami folder doesn't see it that way.
I hope my thought is phrased clearly enough, I am having trouble finding the right words here.
Obviously the hope is that somehow this and future advances can be democratised. It was funny that Asimov's The Last Question has been posted here a couple of times recently because it makes such a big thing about world-sized computers and how advanced minicomputers would be. It's easy to read and scoff at the naivety... before realising we could easily be heading back in that direction for many impactful future technologies.
I have lots of friends who aren't in the tech field having lots of fun playing with DALL-E Mini, even though the results look terrible -- if they sort of resemble the prompt (and many times they do), they are ecstatic that the machine made a weird doodle about something ridiculous.
It seems like the DALL-E creators are mostly worried about the (possibly justified!) fear that people will use it to make racist or other offensive imagery, and it would bring very bad PR to the team.
"Bias
CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes."
Spoiler alert, nothing contained in that section requires a warning. It's just abstract descriptions of "potential" negative stereotypes in images.
"initial testing demonstrates that they may generate images that contain negative stereotypes against minoritized groups"
Minoritized is a new word for me. As though minority status is something actively attached to someone. But no duh, I can ask DALL-E to generate "images of klan members at a lynching" or "inner city police brutality" and get negative images.
"When the model generates images with people in them, it tends to output people who we perceive to be white, while people of color are underrepresented."
I'd like to see real testing, because from what I can tell this is not true. Ask for "white people" and you get weird abstract models of white figures. Ask for "black people" and you get beautiful photos of smiling black faces.
Is this the kind of exercise AI researchers have to concern themselves with these days?
DALL-E 2 beats Mini in almost all of them.
This may be a naïve thought as my understanding of all models mentioned is superficial at best.
I gave it the prompt:

<?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg"
and it completed it to the following: <?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" id="cloud-sun_svg__-SimpleVector-0"><defs><linearGradient spreadMethod="pad" x2="1" y2="1"><stop offset="0" stop-color="#fff" stop-opacity=".2"/><stop offset="1" stop-opacity=".2"/></linearGradient></defs><path d="M5.5 13.5A2.5 2.5 0 008 16h8a2.5 2.5 0 002.5-2.5S19.5 10 17.5 10H9c-2.5 0-4.5 2-4.5 4.5zM10 11h.5a.5.5 0 100-1h-2a.5.5 0 100 1H10zm2 3.5h2a.5.5 0 100-1h-2a.5.5 0 100 1zm-4 2H8a.5.5 0 100 1h2a.5.5 0 100-1zm-2-2H8a.5.5 0 100 1h2a.5.5 0 100-1z" fill="#000" fill-opacity=".3"/><circle cx="12" cy="19" r="2" fill="#fff"/><path d="M11.5 19h-2c0-1.5 1-2.5 2.5-2.5S14 15.5 14 17h.5a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path d="M10 13.5h2a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path fill="url(#cloud-sun_svg__-SimpleVector-0)" d="M16 6l-5 5 5 5v-7c1.5 0 3 1 3 3v2z"/><path fill="#fff" d="M12.5 17h2v2h-2z"/></svg>
which looks like this: https://i.imgur.com/sHpv4Ii.png
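Out of curiosity, the completion is at least well-formed XML. A quick check with Python's standard library (the markup is pasted verbatim from the comment above; variable names are mine):

```python
# Verify the GPT-3-completed SVG parses as XML and count its drawing
# primitives. The string is the completion quoted verbatim above.
import xml.etree.ElementTree as ET

svg = '''<?xml version="1.0" encoding="UTF-8"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 24 24" id="cloud-sun_svg__-SimpleVector-0"><defs><linearGradient spreadMethod="pad" x2="1" y2="1"><stop offset="0" stop-color="#fff" stop-opacity=".2"/><stop offset="1" stop-opacity=".2"/></linearGradient></defs><path d="M5.5 13.5A2.5 2.5 0 008 16h8a2.5 2.5 0 002.5-2.5S19.5 10 17.5 10H9c-2.5 0-4.5 2-4.5 4.5zM10 11h.5a.5.5 0 100-1h-2a.5.5 0 100 1H10zm2 3.5h2a.5.5 0 100-1h-2a.5.5 0 100 1zm-4 2H8a.5.5 0 100 1h2a.5.5 0 100-1zm-2-2H8a.5.5 0 100 1h2a.5.5 0 100-1z" fill="#000" fill-opacity=".3"/><circle cx="12" cy="19" r="2" fill="#fff"/><path d="M11.5 19h-2c0-1.5 1-2.5 2.5-2.5S14 15.5 14 17h.5a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path d="M10 13.5h2a.5.5 0 100-1h-2a.5.5 0 100 1z" fill="#fff"/><path fill="url(#cloud-sun_svg__-SimpleVector-0)" d="M16 6l-5 5 5 5v-7c1.5 0 3 1 3 3v2z"/><path fill="#fff" d="M12.5 17h2v2h-2z"/></svg>'''

# fromstring needs bytes here: a str carrying an encoding declaration
# is rejected by the parser.
root = ET.fromstring(svg.encode("utf-8"))
NS = "{http://www.w3.org/2000/svg}"
print(root.tag == NS + "svg")          # True: well-formed, correct root
print(len(root.findall(NS + "path")))  # 5 direct <path> children
```

Amusingly, the gradient is referenced via `url(#cloud-sun_svg__-SimpleVector-0)` but that id sits on the root `<svg>` element, not on the `<linearGradient>` -- it parses fine as XML, yet the reference is semantically off, which fits the "feel for the general shape, no conceptual model" theme above.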