I know Civitai has fine-tunes specifically for anime style, realistic, etc., but I don't know which one is "state of the art". /r/stablediffusion usually gets overly hyped about new models and isn't really searchable for what is SOTA today. This doesn't even get into models that are only accessible via API like Flux Pro, or through an app (Midjourney).
LLMs have pretty much the same problem for locally runnable models and api based ones (llama vs qwen 2.5 vs sonnet 3.5 for coding vs other tasks).
Does anyone know of a github repo or an app that is keeping these things up to date? Or is that something that other people would also want to collaborate on?
Flux is the best open weights model.
Ideogram, Recraft, Midjourney, and Leonardo are all very capable hosted image generators. DALL-E 3 was way ahead of its time and is still very good.
RunwayML Gen3 Alpha, Lumina, Hailuo, Kling, Minimax and others do video well.
Sora is probably the best visual media generator but is not widely available to use. Only people at Meta have used Meta’s Chameleon, which is maybe the most capable visual media generator today.
None of them is especially better or worse at particular styles.
All the content on CivitAI is reflective of the quality of the foundational models. Flux and SD3 community fine tunes are very capable. CivitAI isn’t representative of the best in the community, the state of the art, or even what people are using this stuff for.
To directly answer your question, though, these are the most useful models right now:
- Stable Diffusion 1.5: Still not great, but better than it used to be; the ecosystem is mature, there are ControlNets and other add-ons for any possible use case, and it runs on less horsepower than any other model. Still, you won't get nearly the same quality. Definitely use a fine-tune from Civitai; the base model is terrible.
- Stable Diffusion 2: Mostly terrible. Avoid. SDXL is better in every regard.
- SDXL: Kinda the same deal as 1.5, but better in every regard except system requirements. Still, it runs on almost any modern GPU and makes megapixel-sized images natively. The base model still isn't great, but there are a lot of fine-tunes on Civitai -- pick one based on desired aesthetic.
- Stable Diffusion 3: Terrible. Avoid.
- Stable Diffusion 3.5: Actually quite good! The system requirements are high, but lower than Flux, and unlike Flux this model isn't distilled. There are two variants, medium and large; medium is tuned for 2-megapixel images, large for 1-megapixel, but large is slightly better in terms of prompt adherence and quality. A common workflow is to use medium for upscaling images that were first created by large. This is also the first model on this list where the base model is perfectly usable, and SD 3.5 understands a lot more styles than anything else you could point at, which means you always need to specify one.
- Flux 1.Dev: A distilled version of Flux 1.Pro; the latter isn't downloadable. Prompt adherence is better than anything else here, but it basically only understands 'Pixar', 'Photographic', and 'Anime'. If you want a very specific picture, Flux will do better than 3.5, assuming the picture falls into those categories. 3.5 generally makes _prettier_ pictures, though... or more interesting pictures, whichever.
- PonyXL: SDXL architecture, completely new training set. PonyXL is trained on Danbooru-tagged data, and its derivatives are usually the best option if you want anime, assuming that you can't run Flux or SD 3.5. Or if you want something NSFW; 3.5 and Flux are both safety-tuned, though in the case of 3.5 I suspect that will only last another month at most. Some PonyXL fine-tunes give you photorealistic outputs with anime-style tagging. It's the same architecture as SDXL, but you should treat this as a different base model.
Oh, and:
- Mochi (by Genmo): This is an open-weights video generation model, which you can run in ComfyUI on a 4090 or better. Mostly a novelty, but also quite good, actually!
I'd only note that when you say a model is "terrible": all of this image-generation technology is mind-blowing compared to what you could expect to do on consumer-grade hardware just a few years ago. We've come a very long way in a very short time.
There are two main tools (think ollama): Automatic1111 (Gradio-like UI), which only works with Stable Diffusion models, and ComfyUI (Node-RED-like UI), which supports all models but is harder to set up and learn.
For the best models in general, I think Ideogram 2 and Recraft are the ones to use; recraft.ai lets you create styles based on images you upload, which is very useful since the model is not open.
For anime, NovelAI V3 is still the best one after almost a year; Illustrious is the best among open models.
I'm not even talking about the high-res feature; the returned image was simply better looking, more on-prompt, and free of weird artefacts.
If you're generating images for personal use, there are lots of open weight models that are quite capable.
From the pictures I can kinda assume that this is some transformer/diffusion-based image generator, but that's because I work in tech. If anyone at Flux reads this: please add a simple explanation above the fold that tells visitors something like "FLUX is an AI-based image generator". Explain what you do before you dive into marketing terms and jargon.
When I see a site that is more hype than substance, I assume it's some kind of grift and close the tab.
> Ready to experience the next generation of image creation? Access FLUX1.1 [pro] through our API today.
* "ultra" means 4x resolution at 10s/sample without sacrificing prompt adherence.
* "raw" means the generated images look more like candid photos.
* $0.06 per API call to generate an image.
No fluff, only detailed and specific information. The whole thing is less than 150 words.
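At a flat per-call rate like that, budgeting a batch job is simple arithmetic. A minimal sketch, assuming the quoted $0.06/call price and the ~10s/sample figure from the "ultra" bullet (the function names here are my own, not part of any FLUX SDK):

```python
# Rough cost/time estimate for a batch of image-generation API calls,
# assuming a flat rate of $0.06 per call and ~10 seconds per sample.
PRICE_PER_CALL_USD = 0.06  # from the quoted pricing
SECONDS_PER_SAMPLE = 10    # from the quoted "ultra" latency

def batch_cost(num_images: int, price_per_call: float = PRICE_PER_CALL_USD) -> float:
    """Total cost in USD for generating `num_images` images, rounded to cents."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return round(num_images * price_per_call, 2)

def batch_time_minutes(num_images: int, secs_per_sample: int = SECONDS_PER_SAMPLE) -> float:
    """Wall-clock minutes if the calls run strictly sequentially."""
    return num_images * secs_per_sample / 60

print(batch_cost(100))          # 100 images -> 6.0 (USD)
print(batch_time_minutes(120))  # 120 sequential images -> 20.0 (minutes)
```

In practice you'd issue calls concurrently, so the sequential-time figure is an upper bound; the cost figure is exact as long as the rate really is flat per call.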