All of this makes it very poorly suited to a collection of heterogeneous machines connected over the internet, which wants a large pool of mostly independent tasks with a high compute cost but relatively low bandwidth requirements.
You need enough memory to run the unquantized model for training, then you stream the training data through. That streaming is the part that's done in parallel: different shards of the training data get farmed out to each machine.
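For concreteness, here's a minimal sketch of that data-parallel pattern using PyTorch's DistributedDataParallel (the model, dataset, and hyperparameters are placeholders, not anything specific from this thread): every worker holds a full copy of the model, a DistributedSampler farms out disjoint shards of the data, and the gradient all-reduce after each backward pass is exactly the high-bandwidth synchronization step that makes this painful over the internet.

```python
# Minimal data-parallel training sketch (assumes launch via torchrun,
# which sets the env vars init_process_group reads; toy model/data).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")  # "gloo" for CPU-only nodes
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")

    # Every worker holds a full, unquantized copy of the model.
    model = torch.nn.Linear(512, 512).to(device)
    model = DDP(model, device_ids=[device.index])

    # The sampler farms out a disjoint shard of the data to each worker.
    data = TensorDataset(torch.randn(4096, 512), torch.randn(4096, 512))
    loader = DataLoader(data, batch_size=64, sampler=DistributedSampler(data))

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients here, every step:
                         # this is the bandwidth-heavy synchronization
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Note that the all-reduce moves a full gradient's worth of data between workers on every step, which is fine over NVLink or a datacenter fabric but a non-starter over home internet connections.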
That could be a factor that unites enough people to donate their compute time to build diffusion models, at least if it were easy enough to set up.
Or the many community efforts (not exactly crowdsourced, though) behind diffusion fine-tunes and tools! Pony XL and other uncensored models, for example. I haven't kept up with the rest, because there's just too much.