Congratulations to liuliu on the launch!
I did have trouble closing the "adjustments" dialog (upper-right button) due to its close button being underneath the status bar, but found that I could just drag the dialog down to the bottom and it closed.
Do you plan to make a macOS version of your app also? Hope you will :)
All apps used to be like this, and now the ones that actually respect user privacy are a rare and glorious exception. Thank you!
I just used the prompt "A person looking at their phone in amazement" and got a good picture.
Beware that on startup the app downloads almost 2 gig of data.
iPhone battery health reports the battery at 100% health. This is an iPhone SE 3.
Amazing how huge the difference in energy consumption is for the system in standby vs going full throttle.
EDIT: I generated 3 more images; every subsequent generation reduced battery capacity by another 2%. My phone doesn't seem to heat up at all, interestingly.
Q: How long does a run typically take? 60 seconds?
> It took a minute to summon the picture on the latest and greatest iPhone 14 Pro, uses about 2GiB in-app memory, and requires you to download about 2GiB data to get started. Even though the app itself is rock solid, given these requirements, I would probably call it barely usable.
also
> Even if it took a minute to paint one image, now my Camera Roll is filled with drawings from this app. It is an addictive endeavor. More than that, I am getting better at it. If the face is cropped, now I know how to use the inpainting model to fill it in. If the inpainting model doesn’t do its job, you can always use a paint brush to paint it over and do an image-to-image generation again focused in that area.
Seems very worth a try. I'm downloading the model right now, it's going a bit slow, ~2MB/s.
Porting FlashAttention to Metal will be quite hard, because for performance reasons they do a lot of shenanigans to respect the memory hierarchy.
Thankfully, you can probably do something slower but more adapted to your memory constraints.
If you relax this need for performance and allow some re-computation, you can write a qkvatt function which takes q, k, v and a buffer to store the resulting attention, and computes it without needing any extra memory.
The algorithm is still quadratic in time with respect to the attention horizon (although with a bigger constant, 2x or 3x, due to the re-computation). But it doesn't need any extra memory allocation, which makes it easy to parallelize.
Alternatively, you can use an extra memory buffer of O(attention horizon * number of threads in parallel), like FlashAttention does, to avoid the re-computation.
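Not Metal, but here is a minimal NumPy sketch of that buffered approach — function name and block size are mine — where each block of query rows reuses one O(block * horizon) scratch buffer instead of materializing the full attention matrix:

```python
import numpy as np

def attention_lowmem(q, k, v, block=64):
    """softmax(q @ k.T / sqrt(d)) @ v, computed one block of query rows
    at a time so the full (n, n) attention matrix is never materialized,
    only a (block, n) scratch buffer."""
    n, d = q.shape
    out = np.empty_like(v)
    scale = 1.0 / np.sqrt(d)
    for i in range(0, n, block):
        s = (q[i:i + block] @ k.T) * scale    # (block, n) scratch
        s -= s.max(axis=-1, keepdims=True)    # numerically stable softmax
        p = np.exp(s)
        out[i:i + block] = (p @ v) / p.sum(axis=-1, keepdims=True)
    return out
```

Shrinking `block` trades speed for memory; with `block=1` the scratch buffer is just one row of the attention matrix.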
Concerning the backward pass, it's the same story: you don't need extra memory if you are willing to do some re-computation, or you can use memory linear in the attention horizon to avoid re-computing.
One interesting thing to notice about the backward pass is that it doesn't use the attention output of the forward pass, so that doesn't need to be preserved (only Q, K, V do).
One little caveat of the backward pass (which you only need for training) is that it needs atomic_add to be easy to parallelize. This means it will be hard on Metal (afaik they don't have atomics for floats, though they do have atomics for integers, so you can probably use fixed-point numbers).
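The fixed-point idea is easy to sketch (plain Python, and the names and scale factor are mine — this just shows the arithmetic, not the GPU atomics):

```python
# Floats are scaled to integers before accumulating, since (per the
# above) Metal has integer atomics but no float atomics. On the GPU,
# each `acc +=` below would be an integer atomic add into a buffer.
SCALE = 1 << 16  # trades range for ~4-5 decimal digits of precision

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def from_fixed(n: int) -> float:
    return n / SCALE

acc = 0
for grad in [0.5, 1.25, 2.75]:  # e.g. partial gradients from parallel threads
    acc += to_fixed(grad)
print(from_fixed(acc))  # 4.5
```

Because integer addition is exact, the accumulated result is deterministic regardless of thread ordering — the only loss is the quantization to multiples of 1/SCALE.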
Love the work, really great job!
You might like to look at the work HuggingFace has been doing (on non-iOS versions). They can run it in under 1 GB of RAM:
> It is also possible to chain it with attention slicing for minimal memory consumption, running it in as little as < 800mb of GPU vRAM
https://huggingface.co/docs/diffusers/optimization/fp16#offl...
There are some high-performance compressors like Blosc tuned for this:
https://www.blosc.org/pages/blosc-in-depth/
“Faster than memcpy” is the slogan.
The problem is that GPUs don't support virtual memory paging, so they can't read files, decompress, or swap anything unless you write it yourself, which is a lot slower.
Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
It is not as useful for this case (inference), because the long-lived activations (the UNet holds the downsampling passes' activations and reuses them for upsampling) aren't that much memory, in the range of a few megabytes. For training, it is probably more useful.
On Apple platforms if you mmap a read-only file into the process address space, then it is "clean" memory. It is clean because the kernel can drop it at any time because it already exists on disk. You essentially can offload the memory management to the kernel page cache.
The downside is that if you run up to the limit and the "working set" can't fit entirely in memory, then you run into page faults which incur an I/O cost.
The advantage is that the kernel will drop the page cache before it considers killing your process to reclaim memory.
That said, I don't know the typical access patterns for neural network inference, so I don't know how the page faults would affect performance.
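The behavior described above is straightforward to sketch. Python's `mmap` module wraps mmap(2), so this illustrates the same mechanism the comment describes (the file name and contents here are stand-ins):

```python
import mmap
import os
import tempfile

# Create a stand-in "model file" to map.
fd, path = tempfile.mkstemp()
os.write(fd, b"model weights" * 1000)
os.close(fd)

with open(path, "rb") as f:
    # Read-only, file-backed mapping: pages are faulted in lazily on
    # first access, and the kernel is free to drop them under memory
    # pressure, since they can always be re-read from disk. That is
    # what makes this "clean" memory on Apple platforms.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:5]  # touching the bytes triggers the page fault
    mm.close()

os.remove(path)
print(header)  # b'model'
```

The process never explicitly reads or frees anything; memory management is entirely delegated to the kernel's page cache, at the cost of unpredictable I/O stalls if the working set exceeds RAM.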
Haha, awesome app!
I gave this and other available applications a try and I don’t understand what people see in ai image generation.
A simple prompt generated a person with 3 nostrils, 7 squished fingers, deformities everywhere I look, it just mashes a bunch of photographs together and generates abominations.
Pay close attention to generated models and you will find details which are simply wrong.
What is the use case that I’m missing?
Were they? https://en.wikipedia.org/wiki/Benz_Patent-Motorwagen - as fast as a carriage, about the same stink. Carriages clearly had a use case.
Generated images now take enormous energy to generate, and the main current use case is to gobble up more energy (mass media/entertainment).
Extensions like DreamBooth, which let you fine-tune the system with your own submitted images, are also quite amazing. Being able to give it just a few photos and say things like "show me surfing in the ocean" and get a reasonable image back.
_Much_ more broadly, this space in AI/ML with GPT3/Dalle is exciting because it feels kind of like what the internet was made for. There's too much data on the internet for any one person to ever meaningfully process. But a machine can. And you can ask that machine questions. And instead of getting just a list of references back, you get an "answer". Image generation is the "image answer" part of this system. It's an exciting space because it feels like these systems will affect large chunks of how we use computers.
Here's a cool GPT3 "programming" example: https://twitter.com/goodside/status/1581805503897735168
And here are some of my DALL-E uses I've been impressed by, that I feel are publish-ready:
- https://labs.openai.com/s/nkOTLRWzjgQTe4QsgoWChP7n
- https://labs.openai.com/s/uSP55qRf1SqCbYTa2UDXXEfA
> What is the use case that I’m missing?
It's generating images from nothing more than a text description; a year ago that was something you'd only see on Star Trek. Now it's real, and we have barely scratched the surface of what is possible.
The images still need some manual work, but try to generate images of that quality and complexity by hand and you might have more appreciation for how mind-blowing it is that AI can not only do it, but do it in seconds.
This stuff was literally science fiction just a couple of years ago. Now you run it on your phone.
It is still extremely impressive and is improving every day.
Architecture is cool. It doesn't do people well, though.
Movie preparation, storytelling https://twitter.com/juliendorra/status/1590058518174134272 https://twitter.com/mrjonfinger/status/1590021753979670528
Fan art! https://twitter.com/rainisto/status/1581169461167816704 https://twitter.com/rainisto/status/1579474636202708993
Product shots and generative marketing https://twitter.com/dtcforeverr/status/1589916644939161600 https://twitter.com/kylebrussell/status/1590563734317338624
2D game assets, character design https://twitter.com/emmanuel_2m/status/1588249026272448512 https://twitter.com/elsableda/status/1562465392563351552
Imaginary selfies (self-portrait is a huge human use case!) https://twitter.com/stevenpargett/status/1590047241183821824 https://twitter.com/dh7net/status/1581298913637646336 https://twitter.com/fabianstelzer/status/1579818105672302592
Styling by example https://twitter.com/norod78/status/1590056501544386560
Raw sketch to final image https://twitter.com/nousr_/status/1564797121412210688
Editing in the most generic sense (replacing part of an image) https://twitter.com/bigblueboo/status/1585761916718383110
(Note that at that time, there was an implementation bug in the inpainting model that caused the weirdness, which I had to fix manually.)
Were you focused on just making it work on the iPhone, or do you think you will keep adding functionalities to the app? Do you think it will ever be possible to train one's own model on an iPhone?
Discovered they have Stable Diffusion 1.4, 1.5, Waifu Diffusion (anime), Redshift (3D models), and other models.
The iPhone becomes warm after a couple of runs and starts draining the battery, so do it while connected to a charger.
This is not a resolution setting but a crop setting.
And you can also choose a model, steps, scale, and sampler!
Thank you for your great work!
The download restarts from 0% if the app is sent to the background, as there does not seem to be a download manager. This is especially problematic for the large 1.5 GB file.
There are reports that iOS is not happy with how I compute the SHA256 of the downloaded model files by loading them entirely into memory, on the XR (3GiB RAM). If this is happening on other devices, I may need to do streaming hash computation and put up a bugfix.
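Streaming the hash keeps peak memory at one chunk instead of the whole multi-GiB file. A sketch in Python for illustration (the app itself is Swift, where CryptoKit's SHA256 supports the same incremental update pattern; the chunk size here is my choice):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so peak memory stays at one chunk,
    rather than loading a multi-GiB model file all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The digest is identical to hashing the whole file in one call, since SHA256 is defined over a byte stream regardless of how it is fed in.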
I wonder if it's the hardware or just the blockers that I use. Might be worth trying blockers to see if they make the general browsing experience better on Apple devices.
I don’t recall giving “Draw Things” permissions to access my photo library, yet the app is able to save to my photo library without prompting and able to read existing images.
I may have misunderstood what permissions apps should ask for when saving to the photo library.
When saving the photo, I only use UIImageWriteToSavedPhotosAlbum (https://developer.apple.com/documentation/uikit/1619125-uiim...), which asks for permission to write to the album, not read permission (they are separate). There are more things I could do with read permission (like create a "Draw Things" collection and save to that, rather than to the generic Camera Roll). Ultimately I decided not to do that, because I don't want more permissions than I minimally, absolutely need.
It transcribed as “image of unicorn poo ping” in tags :(