The repo is aimed at developers and has two parts. The first adapts the ML model to run on Apple Silicon (CPU, GPU, Neural Engine), and the second allows you to easily add Stable Diffusion functionality to your own app.
If you just want an end-user app, those already exist, but now it will be easier to build apps that take advantage of Apple's dedicated ML hardware as well as the CPU and GPU.
> This repository comprises:
>
> - python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python
>
> - StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion
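The Python side is driven from the command line. Here is a sketch of the two invocations the README describes, expressed as argument lists; the flag names are taken from the repo's documentation as I remember it and may have drifted, so treat them as illustrative rather than authoritative:

```python
# Hedged sketch: the two CLI steps (convert, then generate) from the README.
import shlex

def conversion_cmd(output_dir):
    """Command that converts the PyTorch checkpoints to Core ML packages."""
    return [
        "python", "-m", "python_coreml_stable_diffusion.torch2coreml",
        "--convert-unet",
        "--convert-text-encoder",
        "--convert-vae-decoder",
        "--convert-safety-checker",
        "-o", output_dir,
    ]

def generation_cmd(prompt, mlpackage_dir, out_dir, compute_unit="ALL", seed=93):
    """Command that runs image generation with the converted Core ML models."""
    return [
        "python", "-m", "python_coreml_stable_diffusion.pipeline",
        "--prompt", prompt,
        "-i", mlpackage_dir,
        "-o", out_dir,
        "--compute-unit", compute_unit,
        "--seed", str(seed),
    ]

print(shlex.join(conversion_cmd("models")))
print(shlex.join(generation_cmd("an astronaut riding a horse", "models", "out")))
```

`--compute-unit ALL` is what lets Core ML spread the work across CPU, GPU, and Neural Engine.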
https://github.com/apple/ml-stable-diffusion

I imagine that here Apple wants to highlight a more research/interactive use, for example allowing fine-tuning SD on a few samples from a particular domain (a popular customization).
[1] https://onnxruntime.ai/docs/execution-providers/CoreML-Execu...
People who can't get the models to work by themselves given the source code aren't the target audience. There are other projects, though, that do distribute quick and easy scripts and tools to run these models.
Apple stepping in to get Stable Diffusion working on their platform is probably an attempt to get people to take their ML hardware more seriously. I read this more like "look, ma, no CUDA!" than "Mac users can easily use SD now". The module seems to be designed so that upstream SD code can easily be ported to macOS without special tricks.
I used this in the past to make a transformer-based syntax annotator. Fully in Rust, no Python required:

https://github.com/LaurentMazare/tch-rs
> For distilled StableDiffusion 2 which requires 1 to 4 iterations instead of 50, the same M2 device should generate an image in <<1 second
They have some benchmarks on the github repo: https://github.com/apple/ml-stable-diffusion
For reference, I was previously getting just under 3 minutes for 50 iterations on my MacBook Air M1. I haven't tried Apple's implementation yet, but it looks like a huge improvement; it might take it from "possible" to "usable".
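The arithmetic behind these numbers is straightforward: wall-clock time per image is roughly step count divided by throughput, plus whatever fixed overhead (model load, text encoding) the pipeline adds. A small helper; the overhead term is my own assumption, not a measured value:

```python
def generation_time(steps, iters_per_sec, overhead_s=0.0):
    """Rough wall-clock estimate for one image: steps / throughput + fixed overhead."""
    return steps / iters_per_sec + overhead_s

# 50 steps at ~2.8 it/s (the M1 Max figure reported below) is ~18 s;
# a distilled model needing only 4 steps would take ~1.4 s at the same rate.
print(round(generation_time(50, 2.8), 1))  # 17.9
print(round(generation_time(4, 2.8), 1))   # 1.4
```

This is also why the distilled-model quote above is so dramatic: cutting 50 steps to 1-4 divides the dominant term by an order of magnitude.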
Mac Studio with M1 Ultra gets 3.3 iters/sec for me.
MacBook Pro M1 Max gets 2.8 iters/sec for me.
And the posted benchmarks for the M2 MacBook Air make me consider "upgrading" to an Air.
DALL-E et al. will still be able to bandwagon off all the free ecosystem being built around the $10M SD 1.4 model that is showing what is possible.

E.g. DALL-E could go straight to Hollywood if their model training works better than SD's. The toolsets will work either way.
Maybe a dumb question, but can the old model still be run?
https://mezha.media/en/2022/10/06/google-is-working-on-image...
Give it some time and SD will be able to do the same.
See deforum[1] and andreasjansson's stable-diffusion-animation[2]
[1]: https://deforum.github.io/
[2]: https://replicate.com/andreasjansson/stable-diffusion-animat...
What's cool about the era in which we live is if you look at high-performance graphics for games or simulations, for instance, it may in fact be faster to use a model to "enhance" a low-resolution frame than to render it fully on the machine.
ex. AMD's FSR vs NVIDIA DLSS
- AMD FSR (Fidelity FX Super Resolution): https://www.amd.com/en/technologies/fidelityfx-super-resolut...
- NVIDIA DLSS (Deep Learning Super Sampling): https://www.nvidia.com/en-us/geforce/technologies/dlss/
AMD's approach renders the game at a crummy, low-detail resolution, then "upscales" each frame.
Both FSR and DLSS aim to improve frames-per-second in games by rendering them below your monitor’s native resolution, then upscaling them to make up the difference in sharpness. Currently, FSR uses spatial upscaling, meaning it only applies its upscaling algorithm to one frame at a time. Temporal upscalers, like DLSS, can compare multiple frames at once, to reconstruct a more finely-detailed image that both more closely resembles native res and can better handle motion. DLSS specifically uses the machine learning capabilities of GeForce RTX graphics cards to process all that data in (more or less) real time.
Video is really a series of frames; film gets away with 24 frames/second, which works out to ~40 ms/image for real-time.
What's cool about the era in which we live is if you look at high-performance graphics for games or simulations, it may in fact be faster to run the model on each frame to "enhance" a low-resolution frame rather than trying to render it fully on the machine.
ex. AMD's FSR vs NVIDIA DLSS
- AMD FSR (Fidelity FX Super Resolution): https://www.amd.com/en/technologies/fidelityfx-super-resolut...
- NVIDIA DLSS (Deep Learning Super Sampling): https://www.nvidia.com/en-us/geforce/technologies/dlss/
AMD's approach renders the game at a crummy, low-detail resolution, then uses "spatial upscaling" to enhance the images one frame at a time.
NVIDIA DLSS uses "temporal upscaling", which looks across multiple frames and uses capabilities exclusive to Nvidia's cards to stitch the frames together.
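The spatial/temporal distinction can be shown with a toy sketch on grayscale frames stored as lists of rows. Real upscalers use motion vectors and, in DLSS's case, a neural network, so this only illustrates the concept:

```python
def spatial_upscale_2x(frame):
    """Nearest-neighbor 2x upscale of a single grayscale frame (list of rows).
    Spatial: each output frame depends only on its own input frame."""
    out = []
    for row in frame:
        doubled = [p for p in row for _ in (0, 1)]  # duplicate each pixel horizontally
        out.append(doubled)
        out.append(list(doubled))                   # duplicate each row vertically
    return out

def temporal_blend(prev_frame, cur_frame, alpha=0.5):
    """Temporal: the output draws on multiple frames. A real temporal upscaler
    aligns frames with motion vectors instead of a naive per-pixel average."""
    return [
        [alpha * c + (1 - alpha) * p for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, cur_frame)
    ]

lowres = [[0, 255], [255, 0]]
print(spatial_upscale_2x(lowres))              # 4x4 image from a 2x2 input
print(temporal_blend([[0, 0]], [[255, 255]]))  # [[127.5, 127.5]]
```

The extra information temporal methods can pull from neighboring frames is why they reconstruct detail that spatial methods simply cannot recover from one frame.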
This is a different challenge than generating the content from scratch
I don't think this is possible in real-time yet, but someone put a filter trained on the German countryside over Grand Theft Auto driving gameplay to make it photorealistic:
https://www.youtube.com/watch?v=P1IcaBn3ej0
Notice the mountains in the background go from Southern California brown to lush green
https://www.rockpapershotgun.com/amd-fsr-20-is-a-more-demand....
The author has a detailed blog post outlining how he modified the model to use Metal on iOS devices: https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-mo...
This site is purely a marketing effort.
Pretty much like Stable Diffusion and the grifters using it in general; they will never credit the artists and images that they stole to generate these images.
Of course you can see the original images (https://rom1504.github.io/clip-retrieval/); it was legal to collect them (they used robots.txt for consent, just like Google Image Search), and it was legal to train on them (under German legal principles, not US ones, since it was made in Germany).
"Crediting the artist" isn't a legal principle - it's more like some kind of social media standard which is enforced by random amateur artists yelling at you if you don't do it. It's both impossible (there are no original artists for a given output) and wouldn't do anything to help the main social issue (future artists having their jobs taken by AIs).
Two wrongs don't make a right.
I'm not seeing any installation instructions on either link - what am I missing?
Great support for M1, basically since the beginning. The install is painless.
Release video for InvokeAI 2.2: https://www.youtube.com/watch?v=hIYBfDtKaus
This gets you from text descriptions to images.
I have seen models that given a picture, then generate similar pictures. I want this because while I have many pictures of my grandmothers, I only have a couple of pictures of my grandfathers and it would be nice to generate a few more.
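That "picture in, similar pictures out" mode is image-to-image generation. Here is a hedged sketch using Hugging Face diffusers' StableDiffusionImg2ImgPipeline; the class, parameter, and model names reflect the diffusers API as I know it and may change, and the heavy imports are kept inside the function so the sketch stays importable without them installed:

```python
def make_variations(photo_path, prompt, n=4, strength=0.6):
    """Sketch: produce n images similar to the input photo.
    `strength` controls how far the model may drift from the original
    (near 0 = stay close to the input, near 1 = mostly ignore it)."""
    # Assumes `diffusers`, `torch`, and `Pillow` are installed.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    )
    init = Image.open(photo_path).convert("RGB").resize((512, 512))
    result = pipe(prompt=prompt, image=init, strength=strength,
                  num_images_per_prompt=n)
    return result.images
```

With only a couple of source photos you would likely get better results from fine-tuning approaches (e.g. Dreambooth-style personalization) than from img2img alone, but this is the zero-setup starting point.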
Core ML is so well done. A year ago I wrote a book on Swift AI and used Core ML in several examples.
Edit: still alive! https://grandperspectiv.sourceforge.net/
Found a >100GB accidental “livestream” recording on one computer. Would have taken forever to find what was taking up all the room otherwise.
GUI apps for this task like GP and the like are more visually complex than they need to be.
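For a one-off hunt like that, a few lines of stdlib Python (or plain `du -sh *`) get you the same answer as a GUI treemap: walk the tree, sum file sizes per directory, and print the biggest offenders. A minimal sketch:

```python
import os

def biggest_dirs(root, top=10):
    """Sum the sizes of files directly inside each directory under `root`
    and return the `top` largest as (path, bytes) pairs, biggest first."""
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        size = 0
        for name in filenames:
            try:
                size += os.path.getsize(os.path.join(dirpath, name))
            except OSError:  # permission errors, files deleted mid-walk
                pass
        totals[dirpath] = size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Usage: for path, size in biggest_dirs(os.path.expanduser("~")):
#            print(f"{size / 2**30:8.2f} GiB  {path}")
```

Something like `biggest_dirs(os.path.expanduser("~"))` would have surfaced that 100GB recording in seconds.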
/dev/disk3s5 926Gi 857Gi 52Gi 95% 8067489 540828800 1% /System/Volumes/Data
It normally hovers around 30-35Gi free.

One on the GPU and another on the ML core?
Centralized services small and large are guilty of this and I'm sick of it.
If you mean monetary usecases: Roughly something like Photoshop/Blender/UnrealEngine with ML plugins that are low latency, private, and $0 server hosting costs.
I'm not sure exactly what that costs me in terms of power, but it is assuredly less than any of these services charge for a single image generation.
But seriously, I wonder when you'll be able to paste in a script and get out a storyboard or a movie.
3.56 seconds?