Can you expand on this a bit? The way I'm thinking, that is only the case if you need low latency. And in that case, it seems you just need to charge enough to cover compute.
We're running Stable Diffusion on an EKS cluster, which evens out the load across calls and prevents over-provisioning.
If latency isn't an issue, it can be run on non-GPU machines. If you're looking for something under $300 or $400/mo, then I agree it may be an issue.
On that note, I haven't checked whether there are Lambda/Fargate-style options that provide GPU power, which would give consumption-based pricing tied to usage, but that might be a route. Can anyone speak to this?
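
For what it's worth, here is a rough sketch of the "charge to cover compute" arithmetic in Python. The hourly rate and image volume are made-up placeholders, not real quotes from any provider, and the function names are hypothetical, just for illustration.

```python
# Rough break-even sketch: what an always-on machine costs per month,
# and what you'd need to charge per image to cover it.
# All numbers below are placeholder assumptions, not real pricing.

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(hourly_rate: float, instances: int = 1) -> float:
    """Cost of keeping `instances` machines running all month."""
    return hourly_rate * HOURS_PER_MONTH * instances


def break_even_price_per_image(
    hourly_rate: float, images_per_month: int, instances: int = 1
) -> float:
    """Minimum price per image that covers the monthly compute bill."""
    if images_per_month <= 0:
        raise ValueError("images_per_month must be positive")
    return monthly_cost(hourly_rate, instances) / images_per_month


if __name__ == "__main__":
    rate = 0.50      # assumed $/hour for one GPU node (placeholder)
    volume = 36_500  # assumed images generated per month (placeholder)
    print(f"monthly compute: ${monthly_cost(rate):.2f}")
    print(f"break-even per image: ${break_even_price_per_image(rate, volume):.4f}")
```

With usage-based pricing, the fixed monthly term would drop out and you'd pay roughly per second of GPU time instead, so the comparison mostly comes down to how much of the month the machine would otherwise sit idle.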