I know Civitai has fine-tunes specifically for anime style, realistic, etc., but I don't know which one is "state of the art". /r/stablediffusion usually gets overly hyped about new models and isn't really searchable for what is SOTA today. This doesn't even get into models that are only accessible via API like Flux Pro, or through an app (Midjourney).
LLMs have pretty much the same problem for locally runnable models and api based ones (llama vs qwen 2.5 vs sonnet 3.5 for coding vs other tasks).
Does anyone know of a github repo or an app that is keeping these things up to date? Or is that something that other people would also want to collaborate on?
Flux is the best open weights model.
Ideogram, Recraft, Midjourney, and Leonardo are all very capable hosted image generators. DALL-E 3 was way ahead of its time and is still very good.
RunwayML Gen3 Alpha, Lumina, Hailuo, Kling, Minimax and others do video well.
Sora is probably the best visual media generator but is not widely available to use. Only people at Meta have used Meta’s Chameleon, which is maybe the most capable visual media generator today.
None of them is especially better or worse at particular styles.
All the content on CivitAI is reflective of the quality of the foundational models. Flux and SD3 community fine tunes are very capable. CivitAI isn’t representative of the best in the community, the state of the art, or even what people are using this stuff for.
To directly answer your question, though, these are the most useful models right now:
- Stable Diffusion 1.5: Still not great, but better than it used to be; the ecosystem is mature, there are ControlNets and other add-ons for any possible use case, and it runs on less horsepower than any other model. Still, you won't get nearly the same quality. Definitely use a fine-tune from Civitai; the base model is terrible.
- Stable Diffusion 2: Mostly terrible. Avoid. SDXL is better in every regard.
- SDXL: Kinda the same deal as 1.5, but better in every regard except system requirements. Still, it runs on almost any modern GPU and makes megapixel-sized images natively. The base model still isn't great, but there are a lot of fine-tunes on Civitai -- pick one based on desired aesthetic.
- Stable Diffusion 3: Terrible. Avoid.
- Stable Diffusion 3.5: Actually quite good! The system requirements are high, but lower than Flux, and unlike Flux this model isn't distilled. There are two variants, medium and large; medium is tuned for 2-megapixel images, large for 1-megapixel, but large is slightly better in terms of prompt adherence and quality. A common workflow is to use medium for upscaling images that were first created by large. This is also the first model on this list where the base model is perfectly usable, and SD 3.5 understands a lot more styles than anything else you could point at, which means you always need to specify one.
- Flux 1.Dev: A distilled version of Flux 1.Pro; the latter isn't downloadable. Prompt adherence is better than anything else here, but it basically only understands 'Pixar', 'Photographic', and 'Anime'. If you want a very specific picture, Flux will do better than 3.5, assuming the picture falls into those categories. 3.5 generally makes _prettier_ pictures, though... or more interesting pictures, whichever.
- PonyXL: SDXL architecture, completely new training set. PonyXL is trained on Danbooru-tagged data, and its derivatives are usually the best option if you want anime, assuming that you can't run Flux or SD 3.5. Or if you want something NSFW; 3.5 and Flux are both safety-tuned, though in the case of 3.5 I suspect that will only last another month at most. Some PonyXL fine-tunes give you photorealistic outputs with anime-style tagging. It's the same architecture as SDXL, but you should treat this as a different base model.
Oh, and:
- Mochi (by Genmo): This is an open-weights video generation model, which you can run in ComfyUI on a 4090 or better. Mostly a novelty, but also quite good, actually!
I'd only note that when you say a model is "terrible": all of this image-generation technology is mind-blowing compared to what you could expect to do on consumer-grade hardware just a few years ago. We've come a very long way in a very short time.
There are two main tools (think ollama): Automatic1111 (Gradio-like UI), which only works with Stable Diffusion models, and ComfyUI (Node-RED-like UI), which supports all models but is harder to set up and learn.
For the best models in general, I think Ideogram 2 and Recraft are the ones to use; recraft.ai lets you create styles based on images you upload, which is very useful since the model is not open.
For anime, NovelAI V3 is still the best one after almost a year; Illustrious is the best among open models.
I'm not even talking about the high-res feature; the returned image was simply better looking, more on-prompt, and free of weird artefacts.
If you're generating images for personal use, there are lots of open weight models that are quite capable.
From the pictures I can kinda assume that this is some transformer/diffusion-based image generator, but that's because I work in tech. If anyone at Flux reads this: please add a simple explanation above the fold that tells visitors something like "FLUX is an AI-based image generator". Explain what you do before you dive into marketing terms and jargon.
When I see a site that is more hype than substance, I assume it's some kind of grift and close the tab.
> Ready to experience the next generation of image creation? Access FLUX1.1 [pro] through our API today.
* "ultra" means 4x resolution at 10s/sample without sacrificing prompt adherence.
* "raw" means the generated images look more like candid photos.
* $0.06 per API call to generate an image.
No fluff, only detailed and specific information. The whole thing is less than 150 words.
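At a flat per-call rate like that, budgeting a batch job is simple arithmetic. A minimal sketch, assuming the quoted $0.06/call price and the ~10s/sample figure from the "ultra" bullet (the function names here are my own, not part of any FLUX SDK):

```python
# Rough cost/time estimate for a batch of image-generation API calls,
# assuming a flat rate of $0.06 per call and ~10 seconds per sample.
PRICE_PER_CALL_USD = 0.06  # from the quoted pricing
SECONDS_PER_SAMPLE = 10    # from the quoted "ultra" latency

def batch_cost(num_images: int, price_per_call: float = PRICE_PER_CALL_USD) -> float:
    """Total cost in USD for generating `num_images` images, rounded to cents."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return round(num_images * price_per_call, 2)

def batch_time_minutes(num_images: int, secs_per_sample: int = SECONDS_PER_SAMPLE) -> float:
    """Wall-clock minutes if the calls run strictly sequentially."""
    return num_images * secs_per_sample / 60

print(batch_cost(100))          # 100 images -> 6.0 (USD)
print(batch_time_minutes(120))  # 120 sequential images -> 20.0 (minutes)
```

In practice you'd issue calls concurrently, so the sequential-time figure is an upper bound; the cost figure is exact as long as the rate really is flat per call.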