- Published on
AI Image Generation 2026 — Flux, Imagen 4, Midjourney v7, Ideogram, Recraft, Firefly, DALL-E, Stable Diffusion: An Honest Comparison
- Authors: Youngju Kim (@fjvbn20031)
Prologue — Two Earthquakes
August 2022. Stable Diffusion 1.4 was released. Before that, image generation AI lived inside the closed betas of OpenAI's DALL-E 2 and Midjourney v3. The moment SD shipped with open weights, the whole category was reshaped. Anyone with a capable consumer GPU could generate unlimited images locally. Front ends such as Automatic1111, ComfyUI, Fooocus, and Forge followed, and LoRA, ControlNet, and IP-Adapter extensions arrived in waves. 2023 was "the year of the SD ecosystem."
Early 2024. A new name appeared: Black Forest Labs. Its founders were the original SD core team (Robin Rombach, Andreas Blattmann, Patrick Esser, Dominik Lorenz), who had left Stability AI to start a new company. In August they announced Flux.1, a three-tier system: open-weight Schnell (Apache 2.0), non-commercial Dev, and API-only Pro. On day one Flux beat SD-XL, and within a year it became the new default for open-weight image models. That was the first earthquake.
Spring 2025. The second earthquake hit. Midjourney v7 launched in April and reset the bar for consumer aesthetic again. That same month, OpenAI introduced gpt-image-1, which displaced DALL-E 3 as the default model inside ChatGPT. In May, Black Forest Labs released Flux.1 Kontext, opening a separate track called "image editing." In June Google's Imagen 4 went GA. Adobe shipped Firefly Image 4 in the same window, and Ideogram pushed its typography lead further with v3.
As of May 2026, the landscape looks like this.
- The open-weight throne. Flux replaced SD-XL/3.5. The most-loaded base model in ComfyUI and Forge is now a Flux derivative. Stability AI has lagged a beat since SD 3.5 Large.
- The consumer aesthetic peak. Midjourney v7, followed by Imagen 4 Ultra. If a designer must pick one image to ship, the answer is still one of these two.
- The solo typography champion. Ideogram v3. If a poster needs legible text, there is almost no other choice.
- The designer tools. Recraft built a category as "the AI that also exports vectors." Firefly hardened the position of "safe-to-use image inside Adobe workflows."
- The developer backends. OpenAI gpt-image-1, Google Imagen 4, Flux Pro 1.1 — the three most-called image APIs in production.
- Lawsuits and licensing. The UK Getty Images v Stability AI ruling in November 2025 separated two questions: whether model weights are themselves an infringing copy (the court said no), and whether outputs that reproduce a trademark infringe (they can). That ruling firmed up the marketing position of Firefly, Imagen, and gpt-image-1 as "license-clean."
This post tries to map that landscape honestly. Which model wins which slice, what local vs cloud actually means in 2026, whether ComfyUI is really dead (it is not), and where the lawsuits land. The same shape as the music post — five axes, one decision tree, an anti-pattern table at the end.
One-line take: in 2026 image generation there is no "single best model." The five axes — typography, consistency, editing, licensing, aesthetic — have split into different tools. Knowing the tool turns an hour-long job into a ten-minute one.
1 · The Arrival of the Flux Era — A New Baseline for Open Weights
1.1 Who Is Black Forest Labs
March 2024. The core researchers at Stability AI — Robin Rombach, Andreas Blattmann, Patrick Esser, Dominik Lorenz — left to start Black Forest Labs. Headquartered in Freiburg, Germany. Seed round in August 2024, around 31 million USD, led by Andreessen Horowitz with General Catalyst, Y Combinator, and MätchVC joining.
These are the original authors of SD 1.x, 2.x, and SD-XL. The people who built the open-weight image generation category started a new company, and their first product was Flux.1.
1.2 The Three Tiers of Flux.1
Flux.1 shipped as three variants of the same architecture trained on the same data.
- Flux.1 Schnell. Apache 2.0 license. 1-4 step inference. The lightest, the most permissive. Commercial use allowed. With NF4 quantization it fits in 6-8 GB of VRAM.
- Flux.1 Dev. Black Forest Labs Non-Commercial License. Weights are public but commercial use is forbidden. For research, learning, and personal projects. 50-step guided inference.
- Flux.1 Pro. Closed. API-only. Highest quality. Hosted on fal.ai, Replicate, Together AI.
The structure is clever. Open the weights to build the ecosystem; recover commercial value through Pro and licensing. In late 2024, Flux.1.1 Pro and Flux.1.1 Pro Ultra (up to 4 megapixels) extended the Pro lane.
1.3 Why It Beat SD-XL
Three technical points explain the jump.
- 12 billion parameters. Roughly 4.6x SD-XL (2.6B) and 50 percent larger than SD 3.5 Large (8B).
- Rectified Flow. A variant of diffusion. Standard diffusion learns a curved path from noise to image; Rectified Flow tries to learn a straight one. A straighter path can be followed in fewer sampling steps, so comparable quality costs less compute.
- MMDiT architecture. The multimodal diffusion transformer introduced in Stable Diffusion 3. Text and image flow through the same transformer blocks together. Prompt adherence jumped sharply over SD-XL.
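The straight-path idea behind Rectified Flow can be shown with a toy sketch in plain Python. This is not Flux's training or sampling code. The three-number "image", the function names, and the time convention (t=0 is noise, t=1 is the image) are all assumptions chosen for illustration, and the velocity function stands in for a perfectly trained model.

```python
import random

def interpolate(noise, image, t):
    """Point on the straight line from noise (t=0) to image (t=1)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, image)]

def target_velocity(noise, image):
    """Training target on a straight path: the constant direction image - noise."""
    return [x - n for n, x in zip(noise, image)]

def euler_sample(noise, velocity_fn, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with equal Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
image = [0.2, -0.5, 0.9]  # stand-in for a real image latent
noise = [random.gauss(0.0, 1.0) for _ in image]

# Idealized model: always returns the true straight-line velocity.
true_v = target_velocity(noise, image)

def velocity_fn(x, t):
    return true_v

one_step = euler_sample(noise, velocity_fn, steps=1)
four_step = euler_sample(noise, velocity_fn, steps=4)
```

With a perfectly straight path, one Euler step and four Euler steps land on the same point. Real models only approximate a straight path, which is why a few steps are still needed in practice.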
In the human-evaluation benchmarks Black Forest Labs published at launch (August 2024), Flux Pro beat Midjourney v6, DALL-E 3, and SD 3 across the board. Hands, in-image text, and anatomy, the classic weak spots, improved the most.
1.4 Flux Kontext — The Editing Answer
May 2025. Black Forest Labs released Flux.1 Kontext. It is an "image + text -> edited image" model, a different task from text-to-image.
What makes Kontext different.
- Multi-input. One or more reference images plus a text instruction. Things like "preserve this face, change the outfit to a black suit" or "blend these two inputs into a unified tone" work natively.
- Local edits. "Change this part, keep the rest." No inpainting mask required. The instruction is in text.
- Multi-turn. Stack edits on the same image — outfit, then background, then lighting, then hair color.
- Three variants. Kontext Pro (API), Kontext Max (highest quality), Kontext Dev (open weights, non-commercial).
Before Kontext, image editing was a stack of ControlNet, IP-Adapter, inpainting masks, and LoRA. Now a single text instruction does most of it.
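The multi-turn pattern described above is essentially a loop that feeds each result back in as the next input. The sketch below shows only that control flow. `apply_edits` and `fake_edit` are hypothetical names, the `bytes` image type is a placeholder, and no real Kontext API is called; a real setup would swap `fake_edit` for whatever hosted or local editing call is in use.

```python
from typing import Callable, List, Tuple

Image = bytes  # placeholder type; a real pipeline would pass PIL images or tensors
EditFn = Callable[[Image, str], Image]

def apply_edits(image: Image, instructions: List[str],
                edit_fn: EditFn) -> Tuple[Image, List[Image]]:
    """Apply text instructions one at a time, feeding each result into the next turn."""
    history = [image]
    for instruction in instructions:
        image = edit_fn(image, instruction)
        history.append(image)
    return image, history

def fake_edit(image: Image, instruction: str) -> Image:
    """Stand-in for a real editing model call; it only records the instruction."""
    return image + b"|" + instruction.encode("utf-8")

final, history = apply_edits(
    b"portrait",
    ["black suit", "city background", "sunset lighting"],
    fake_edit,
)
```

Keeping `history` around makes it cheap to roll back to an earlier turn when a later edit goes wrong.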
1.5 Flux Tools — Auxiliary Models
November 2024. Black Forest Labs released Flux Tools.
- Flux Fill. Inpainting and outpainting. Mask plus text to fill a region.
- Flux Canny. Replacement for Canny-edge ControlNet.
- Flux Depth. Depth-map guidance.
- Flux Redux. Image variation. Generate similar-mood variants of a single input.
These absorbed most of the ControlNet and IP-Adapter ecosystem from the SD 1.5 and XL era.
1.6 The Local Execution Reality
Running Flux Dev on a 4090.
- Unquantized (FP16/BF16). 24 GB VRAM. One image in around 20 seconds.
- FP8 quantization. 12-16 GB. One image in around 15 seconds. Quality drop is negligible.
- NF4 quantization. 6-8 GB. One image in around 25 seconds (slower). Slight quality dip, but a 4060 8 GB can run it.
- Schnell. Four steps is enough. Under 5 seconds per image.
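The VRAM figures above follow roughly from parameter count times bytes per parameter. The back-of-the-envelope sketch below assumes weights only: text encoders, the VAE, activations, and quantization overhead all add to the real footprint, so treat the results as lower bounds. The function name and the bytes-per-parameter table are illustrative assumptions.

```python
# Approximate storage per parameter for each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "nf4": 0.5}

def weight_gb(params_billion, precision):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

FLUX_PARAMS_B = 12  # parameter count quoted in section 1.3

for precision in ("fp16", "fp8", "nf4"):
    print(f"{precision}: ~{weight_gb(FLUX_PARAMS_B, precision):.0f} GB")
# fp16: ~24 GB
# fp8: ~12 GB
# nf4: ~6 GB
```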
ComfyUI, Forge, SwarmUI, and InvokeAI all support Flux natively. By 2025 "local image generation = Flux" became the default.
2 · The Consumer Aesthetic Peak — Midjourney and Imagen
2.1 Midjourney v7
Midjourney is the aesthetic reference for the category. Other models chase the look Midjourney sets. v7 went to alpha in April 2025 and GA in June.
Key changes in v7.
- Personalization. On first use, the model asks you to rate around 200 images. It learns your taste, and the same prompt produces different output for different users.
- Draft Mode. A fast, cheap draft: roughly ten times faster at about half the GPU cost, four images in under thirty seconds. Pick the favorite and upscale to full mode.
- Style Reference v2. A `--sref` code or a reference image holds a consistent style. Much more stable than v6.
- Character Reference (`--cref`). Keeps the same character across many panels. The core of comic and storybook workflows.
In a single line: Midjourney wins on "the aesthetic finish of a single scene." For a poster, an illustration, or a single moodboard frame, its output needs the least correction from a working designer.
Weaknesses.
- Text rendering. v7 is still weak at letters. Below Ideogram.
- Photo-grade realism. Concedes to Imagen 4 Ultra for photographic work.
- No API. Midjourney has no official API. Discord bot plus unofficial wrappers. Unsuitable for production automation.
- Commercial license. Allowed at Pro and above. But the training-data licensing is not advertised as clean.
2.2 Google Imagen 4
Imagen 4 went GA in June 2025, a big step up from Imagen 3 (December 2024). It ships in three tiers.
- Imagen 4 Standard. Fast, general.
- Imagen 4 Ultra. The peak of photo-grade realism. Portraits, landscapes, product shots: head-to-head with Midjourney v7 on photographic work.
- Imagen 4 Fast. Cost-optimized variant.
What stands out.
- Text rendering improved sharply. Imagen 3's weak spot is now usable in v4. Not as accurate as Ideogram, but better than Midjourney.
- Mandatory SynthID watermarking. Every output carries an invisible watermark. Lines up with the broader standardization push for AI provenance.
- Vertex AI integration. The easiest path for enterprise adoption. Inherits SOC 2 and HIPAA compliance.
- Commercial safety. Google offers explicit IP indemnification on outputs, same lane as Firefly.
Weaknesses.
- Creative aesthetic. Strong at photo, average at "the personality of an illustration." Midjourney and Flux still win there.
- Content filters. Enterprise safety thresholds are strict; some legitimate prompts get rejected.
2.3 OpenAI gpt-image-1
In April 2025 OpenAI introduced a new default image model called gpt-image-1 inside ChatGPT. The earlier default, DALL-E 3, became a legacy option.
Where gpt-image-1 sits.
- Conversational editing. "Make this -> change the color -> add this caption" feels natural across turns. Same direction as Flux Kontext but with a chat interface.
- Text rendering. A big step up from DALL-E 3, roughly at Imagen 4's level. Still under Ideogram.
- Realism. A step behind Imagen 4 Ultra. Average aesthetic compared to Midjourney v7.
- API pricing. Output-token based. Roughly 0.02 to 0.19 USD per image depending on quality.
Because it is the model most invoked inside ChatGPT, by raw call volume it may be the category leader in 2026. The accurate framing is "not the highest quality, but the lowest-friction interface."
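For budgeting, the per-image range quoted above turns into simple arithmetic. The sketch below uses only the two endpoints of that range (0.02 and 0.19 USD) as "low" and "high" tiers. The tier names and function names are assumptions, and real billing is token-based, so actual costs vary with size and prompt. Prices are held in integer cents to avoid floating-point rounding surprises.

```python
# Endpoints of the per-image range quoted above, in cents.
PRICE_CENTS = {"low": 2, "high": 19}

def batch_cost_usd(n_images, quality):
    """Estimated cost of a batch at a flat per-image price."""
    return n_images * PRICE_CENTS[quality] / 100

def images_within_budget(budget_usd, quality):
    """How many images a budget covers at a flat per-image price."""
    return round(budget_usd * 100) // PRICE_CENTS[quality]

print(batch_cost_usd(1000, "low"))        # 20.0
print(batch_cost_usd(1000, "high"))       # 190.0
print(images_within_budget(200, "high"))  # 1052
```

The roughly tenfold spread between tiers is why drafting at low quality and re-rendering only the keepers at high quality is the usual pattern.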
2.4 Comparison — Consumer and API Models
| Tool | Aesthetic | Realism | Text Rendering | Editing | License | API |
|---|---|---|---|---|---|---|
| Midjourney v7 | Top | Very high | Weak | --cref consistency | Pro and above | None (unofficial only) |
| Imagen 4 Ultra | High | Top | Decent | Separate (Imagen Edit) | Indemnified | Vertex AI |
| gpt-image-1 | Decent | High | Decent | Strong (chat) | Standard OpenAI | OpenAI API |
| Flux Pro 1.1 | High | Very high | Decent | Kontext (separate) | Commercial (Pro) | fal/Replicate |
| DALL-E 3 | Decent | High | Decent | Weak | Standard OpenAI | OpenAI API (legacy) |
3 · Typography and Designer Tools — Ideogram, Recraft, Firefly
3.1 Ideogram v3 — When Text Has to Appear
Ideogram is a Toronto startup that launched publicly in August 2023. CEO is Mohammad Norouzi, formerly at Google Brain and one of the original Imagen authors. From day one the company picked "text inside images" as its core differentiator.
- Ideogram 1.0 (Feb 2024). Most accurate text rendering in the category.
- Ideogram 2.0 (Aug 2024). Strengthened realism and style control.
- Ideogram 3.0 (Mar 2025). Pushed typography and aesthetic up together. Effectively solo leader for posters, business cards, ad copy, and book covers.
Core features.
- Magic Prompt. Auto-expands prompts to improve results.
- Canvas. Inpainting, outpainting, and mask editing in a single workflow.
- Style Reference. Reference image for consistent style.
- Character Consistency. Added in v3. Hold a character across multiple frames.
Typography accuracy here does not mean "the letters are readable" but "the designer can ship the output as-is." That gap is decisive against Midjourney, Flux, and Imagen.
Weaknesses: realism slightly behind Imagen 4 Ultra, and character consistency slightly behind Midjourney's --cref.
3.2 Recraft — AI for Designers
Recraft picked a single position — "AI for graphic designers." Recraft V3 launched in October 2024 and briefly topped the Hugging Face TTI leaderboard.
Differentiators.
- Vector output. Direct SVG generation. Logos, icons, illustrations as vectors. Nearly unique among AI tools.
- Brand library. Save palette, fonts, and style; apply consistently across outputs.
- Integrated image plus text. Posters treat type as a design element. More design-friendly typesetting than Ideogram.
- Image editing. Inpainting, outpainting, object removal, background change in a unified UI.
- 3D mockups. Auto-mapping images to 3D objects (mugs, books, phones).
After V3, V3 Plus shipped in 2025 with further realism gains. V3.5 is in beta as of May 2026. For a designer, Recraft is the single tool that handles "generate -> edit -> deliver in another format."
Pricing: 50 free credits per day, Basic 12 USD per month, Pro 33 USD per month.
3.3 Adobe Firefly — The Licensing Clarity Story
Adobe Firefly's value prop is one sentence: "trained only on Adobe Stock and public domain." Other models live in a licensing grey zone; Firefly does not.
Firefly Image 4 launched in May 2025 and Image 4 Ultra arrived that autumn.
- Firefly Image 4. General-purpose. Balanced across realism, illustration, and text.
- Firefly Image 4 Ultra. High resolution and detail. Ads, publishing, product design.
Strengths inside the Adobe ecosystem.
- Photoshop Generative Fill. Firefly powers inpainting and outpainting inside Photoshop. Zero-friction adoption in the designer workflow.
- Illustrator Generative Recolor. Auto color variants for vectors.
- Premiere Pro Generative Extend. Lengthens video clips by generating extra frames (a separate Firefly Video model).
- Adobe Express. Integrated UI for non-experts.
- Indemnification. Enterprise customers get IP indemnification on outputs.
Weaknesses.
- Standalone aesthetic. Less of a "wow" than Midjourney v7.
- Content filters. Strong safety thresholds reject many faces, public figures, and certain commercial concepts.
- Price. Effectively free for existing Creative Cloud subscribers, expensive standalone.
3.4 Comparison — Typography and Designer Tools
| Tool | Text Accuracy | Vector | Designer Workflow | License Clarity | Price |
|---|---|---|---|---|---|
| Ideogram v3 | Top | No | Canvas integrated | Decent | Free to 20 USD/mo |
| Recraft V3 | Very high | Yes (SVG) | Brand library | Decent | Free to 33 USD/mo |
| Firefly Image 4 | High | No | Adobe integrated | Top | Included with CC |
4 · Open Source and Local — Stable Diffusion 3.5, SD-XL, HiDream, Janus-Pro
4.1 Where Stable Diffusion Stands
Stability AI, who created the category in 2022, had a rough 2024-2025.
- SD 3 Medium (June 2024). Hit immediate backlash over anatomy issues. License changes (Creator vs Enterprise split) were also controversial.
- SD 3.5 Large/Medium/Large Turbo (October 2024). Addressed SD 3's weak spots. 8B/2.5B/8B parameters. Stability AI Community License (free under 1 million USD annual revenue).
- SD-XL 1.0 (July 2023). Still the most-used base model by sheer volume. The LoRA ecosystem grew up around SD-XL.
In May 2026, SD 3.5 is "still used but not the category leader." Flux is clearly ahead. Stability AI, after a 2024 CEO turnover and financial struggles, has shifted weight toward Stable Audio, Stable Video, and 3D.
4.2 SD-XL — The Power of the Legacy
SD-XL persists for one reason. The LoRA, ControlNet, IP-Adapter, and Textual Inversion ecosystem is enormous. Tens of thousands of SD-XL LoRAs live on Civitai. For specific art styles, characters, and aesthetics, SD-XL still offers the deepest catalog.
When to stay on SD-XL.
- A required LoRA exists only for SD-XL. Anime styles, specific illustrator looks, recurring characters.
- Precise control via ControlNet. Pose, depth, edges.
- Hardware is limited. SD-XL runs comfortably on 8 GB VRAM.
- An existing ComfyUI graph is in production. Don't break what works.
When to move to Flux.
- Starting a new base workflow.
- Prompt adherence matters. Flux is far ahead.
- Commercial licensing must be unambiguous. Flux Schnell.
4.3 HiDream — The 2025 Newcomer
HiDream-I1, released April 2025, is a 17B open-weight model under the MIT license. In some academic benchmarks it edges out Flux Dev.
- Hardware. 24 GB VRAM recommended. NF4 quantization drops to 12 GB.
- Quality. Balanced across realism, text, and consistency. Roughly level with Flux Dev.
- License. MIT — fully commercial. The decisive advantage over Flux Dev (non-commercial).
ComfyUI supports it natively. As of May 2026, HiDream is the "real free alternative to Flux Dev."
4.4 Janus-Pro / Krea — Adjacent Directions
Janus-Pro (DeepSeek, January 2025). A multimodal LLM that also generates images. Text and images flow through the same model. 7B parameters, MIT license. Quality is below Flux, but the paradigm of "LLM as image generator" is worth tracking.
Krea AI. Not a model vendor but a workflow platform. Aggregates multiple models behind one interface. Its real-time canvas (the AI fills in as you sketch) is the differentiator. Krea launched its own model Krea-1 in 2025.
4.5 Comparison — Open Source and Local
| Model | Params | License | Min VRAM | Strength |
|---|---|---|---|---|
| Flux.1 Schnell | 12B | Apache 2.0 | 6 GB (NF4) | Fast, free, commercial OK |
| Flux.1 Dev | 12B | BFL Non-Commercial | 6-24 GB | Top open-weight (non-commercial) |
| HiDream-I1 | 17B | MIT | 12-24 GB | Commercial Flux alternative |
| SD 3.5 Large | 8B | Stability Community | 8-16 GB | Mature catalog |
| SD-XL 1.0 | 2.6B | OpenRAIL++ | 6-8 GB | LoRA ecosystem |
| HiDream Dev | 17B | MIT | 12 GB | Distilled HiDream |
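The table above can be read as a small lookup: filter by available VRAM, then by whether commercial use is needed. The sketch below encodes the table's rows. The `commercial` field is this sketch's own simplification: "conditional" marks licenses with revenue caps or use restrictions (the Stability Community License threshold comes from section 4.1; treating OpenRAIL++ as conditional is an assumption), so check the actual license text before shipping.

```python
# Rows from the comparison table; min_vram_gb uses the low end of each range.
MODELS = [
    {"name": "Flux.1 Schnell", "min_vram_gb": 6, "commercial": "yes"},
    {"name": "Flux.1 Dev", "min_vram_gb": 6, "commercial": "no"},
    {"name": "HiDream-I1", "min_vram_gb": 12, "commercial": "yes"},
    {"name": "SD 3.5 Large", "min_vram_gb": 8, "commercial": "conditional"},
    {"name": "SD-XL 1.0", "min_vram_gb": 6, "commercial": "conditional"},
    {"name": "HiDream Dev", "min_vram_gb": 12, "commercial": "yes"},
]

def candidates(vram_gb, need_commercial):
    """Models that fit in the given VRAM and, if required, allow commercial use outright."""
    return [
        m["name"]
        for m in MODELS
        if m["min_vram_gb"] <= vram_gb
        and (not need_commercial or m["commercial"] == "yes")
    ]

print(candidates(8, need_commercial=True))
# ['Flux.1 Schnell']
print(candidates(24, need_commercial=True))
# ['Flux.1 Schnell', 'HiDream-I1', 'HiDream Dev']
```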
5 · Tools vs Models — Where ComfyUI, Forge, and A1111 Are
5.1 Are the UIs Dead? No.
A common myth in 2025 was that Automatic1111, ComfyUI, Forge, Fooocus, and InvokeAI faded as the category moved to hosted models. Wrong. The category shape just changed.
ComfyUI got bigger in 2025-2026. It is node-based, so new models get new nodes quickly. Flux, HiDream, and every video model (Wan, HunyuanVideo, LTX-Video) land in ComfyUI first. It is now the de-facto standard for AI image and video workflow automation.
Forge UI (Forge / Forge Classic). A fork of Automatic1111. Memory optimization runs Flux on 8 GB GPUs. The UI is friendlier than ComfyUI, so it became the entry point for beginners.
Automatic1111 (A1111). Update cadence slowed in 2025. Flux support arrived later than Forge, and ControlNet lagged. The legacy SD-XL user base stays, but new entrants moved to Forge/ComfyUI.
InvokeAI. Pivoted to commercial SaaS. Targets enterprise workflow solutions.
SwarmUI. Uses ComfyUI as a backend but presents a friendlier UI. Recommended for users who don't want to draw node graphs.
Fooocus. Midjourney-style simple UI. Best onboarding for non-technical users.
5.2 A Sample ComfyUI Graph
Flux Dev plus Flux Kontext plus LoRA plus upscale, in a single graph.
[Load Checkpoint: Flux Dev]
|
[Load LoRA: char-v1] <- [Strength 0.8]
|
[Text Encoder] <- [Prompt: "cyberpunk alley, neon"]
|
[KSampler] <- [Empty Latent 1024x1024]
|
[VAE Decode]
|
[Kontext Edit] <- [Load Kontext] + [Instruction: "make it sunset"]
|
[Upscale 4x ESRGAN]
|
[Save Image]
Build the graph once and you can rerun it with different prompts to generate hundreds of variants. The automation matches direct API use, with the bonus of seeing every intermediate step visually.
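Rerunning one graph with different prompts can be sketched against ComfyUI's HTTP interface, which accepts a graph as JSON on the `/prompt` endpoint. The sketch below covers only a minimal text-to-image chain built from stock ComfyUI nodes and leaves out the LoRA, Kontext, and upscale stages. The checkpoint filename, step count, and CFG value are assumptions, and the `queue` call is left commented out because it needs a ComfyUI server running locally.

```python
import copy
import json
import urllib.request

# API-format graph: node id -> class_type + inputs; links are [source_node_id, output_index].
BASE_WORKFLOW = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux1-dev-fp8.safetensors"}},  # hypothetical filename
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "cyberpunk alley, neon", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # empty negative prompt
          "inputs": {"text": "", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 0, "steps": 20, "cfg": 1.0,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "variant"}},
}

def variant(prompt, seed):
    """Copy the base graph and swap in a new prompt and seed."""
    wf = copy.deepcopy(BASE_WORKFLOW)
    wf["2"]["inputs"]["text"] = prompt
    wf["5"]["inputs"]["seed"] = seed
    return wf

def queue(workflow, host="http://127.0.0.1:8188"):
    """POST one workflow to a running ComfyUI server."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{host}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

prompts = ["cyberpunk alley, neon", "rainy harbor at dawn", "desert market, noon"]
batch = [variant(p, seed) for seed, p in enumerate(prompts)]
# for wf in batch:
#     queue(wf)  # requires a ComfyUI server listening on the host above
```

Copying the base graph before editing keeps the template untouched, so the same dictionary can serve hundreds of variants.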
5.3 New Entrant UIs
- Krea. Real-time canvas. The AI fills in as you sketch. Popular with designers.
- Magnific. Upscaling and detail enhancement specialist. Post-processes other models' outputs.
- Leonardo.AI. SaaS UI plus in-house model plus integrated workflow.
- OpenArt. Hosts ComfyUI workflows on the web. Share node graphs without managing your own server.
5.4 Where to Put Your Workflow
One-line picks.
- One quick shot: Midjourney v7, Ideogram, Imagen 4 (web UI).
- Automation and batch: API (fal.ai, Replicate, OpenAI, Vertex AI) or local ComfyUI.
- Fine control (LoRA, ControlNet): Local ComfyUI or Forge.
- Designer workflow: Recraft, Firefly, Krea.
- Engineering integration: API.
6 · Lawsuits and Licensing — Honestly
6.1 Stability AI vs Getty Images
The most-cited case. Getty Images sued Stability AI in both the UK and US (2023).
UK ruling, November 2025, High Court of Justice.
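- The model is not an infringing copy. The court found that original images are not stored or reproduced inside model weights, so distributing the weights is not secondary copyright infringement in the UK. Getty had dropped its claim over the training process itself during the trial, so the ruling does not decide whether training infringes.
- Trademark infringement is separate. Where outputs partially reproduced the Getty watermark, the court did find trademark infringement.
- Summary. The weights are not an infringing copy; output-level trademark reproduction can still infringe.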
The US case is still pending in May 2026. US law differs and the outcome may differ.
6.2 Other Active Cases
- Andersen v. Stability AI. A class action by an artist group against Stability, Midjourney, and DeviantArt. Some claims dismissed; copyright claims remain alive.
- NYT v. OpenAI. A text-training case, but its precedent will affect image-training case law.
- Disney licensing. Reports in 2025 indicated Disney was negotiating direct licensing deals with several AI companies. Direct major-IP licensing may become standard.
6.3 What Users Should Do
Safer commercial options (May 2026).
- Adobe Firefly. Adobe Stock plus public domain only. Indemnification. The safest.
- Google Imagen 4. Indemnification. License-clean training data marketed explicitly.
- OpenAI gpt-image-1. Standard OpenAI terms. Indemnification only on the Enterprise plan.
- Flux Schnell, self-hosted. Apache 2.0 weights. Outputs belong to the user.
- HiDream-I1. MIT weights. Commercial use OK.
Grey zone.
- Midjourney. Commercial use of outputs allowed at Pro and above, but explicit training-data licensing is not advertised.
- SD-XL plus community LoRA. Many LoRAs have unclear training-data provenance, especially "specific artist style" LoRAs.
- Recraft. License policy is stated, but training-data sources are only partially disclosed.
Risky behavior.
- Famous artist or illustrator names in the prompt. "In the style of [Artist]" output, used commercially, is clearly risky.
- Direct imitation of trademarked characters and IP. Disney characters, game characters, brand logos.
- Selling NFTs or merchandise without explicit license confirmation.
6.4 However the Lawsuits End
Three scenarios.
Scenario A — "training is transformative fair use" prevails. AI training is legalized. Output-level trademark and similarity issues remain separate. The "explicit licensing" marketing edge of Firefly and Imagen narrows.
Scenario B — "training requires licensing" prevails. Stable Diffusion and Midjourney face licensing settlements or forced retraining. Costs jump, subscriptions rise, and Firefly and Imagen pull further ahead.
Scenario C — Settlement and licensing standardize. Like the Disney-AI rumored deals: major-IP licensing becomes the norm, academic and open-source models live in a separate track. The most likely outcome.
7 · The Decision Framework — How to Pick
7.1 Recommendations by Use Case
| Situation | First Choice | Second Choice | Note |
|---|---|---|---|
| Single concept illustration | Midjourney v7 | Flux Pro 1.1 | Aesthetic first |
| Photo-grade portrait or product | Imagen 4 Ultra | Flux Pro | Realism |
| Poster or ad with text | Ideogram v3 | Recraft V3 | Typography accuracy |
| Logo or icon (vector) | Recraft V3 | Adobe Illustrator | Vector output |
| Brand consistency | Firefly Image 4 | Midjourney --sref | Indemnification + workflow |
| Character consistency (comics) | Midjourney --cref | Flux Kontext | Multi-panel |
| Image editing | Flux Kontext | gpt-image-1 | Text-driven |
| Inpainting / outpainting | Photoshop + Firefly | Flux Fill | Workflow |
| API automation | fal.ai + Flux Pro | Vertex AI Imagen 4 | SLA |
| Local / private | Flux Dev (non-commercial) | HiDream-I1 (commercial) | Self-host |
| Free start | Flux Schnell + Forge | SD-XL + Civitai LoRA | 6 GB+ GPU |
| Commercial safety first | Firefly | Imagen 4 | Indemnification |
| Academic / research | SD 3.5 + paper repro | Flux Dev | Verifiability |
7.2 Decision Tree
Start
|
+- Must the image contain text?
| +- Yes -> Ideogram v3 or Recraft V3
| +- No -> next
|
+- Photo-grade realism required?
| +- Yes -> Imagen 4 Ultra or Flux Pro 1.1
| +- No -> next
|
+- Designer workflow (brand, vector)?
| +- Yes -> Recraft or Adobe Firefly
| +- No -> next
|
+- Character or scene consistency required?
| +- Yes -> Midjourney `--cref` or Flux Kontext
| +- No -> next
|
+- License cleanliness top priority?
| +- Yes -> Firefly or Imagen 4 (indemnified)
| +- No -> next
|
+- Local / private execution required?
| +- Yes -> Flux Dev or Schnell, or HiDream-I1
| +- No -> next
|
+- API automation / batch needed?
+- Yes -> fal.ai Flux Pro or OpenAI gpt-image-1
+- No -> Midjourney v7 (single-scene aesthetic)
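The tree above is a first-match rule list, which makes it easy to encode. The sketch below mirrors the questions in the same order. The `Needs` field names and the `pick` function are hypothetical names for illustration, and the recommendations are copied from the tree rather than independently derived.

```python
from dataclasses import dataclass

@dataclass
class Needs:
    text_in_image: bool = False
    photo_realism: bool = False
    designer_workflow: bool = False
    consistency: bool = False
    license_first: bool = False
    local_only: bool = False
    api_batch: bool = False

# Ordered (question, recommendation) pairs; the first "yes" wins, as in the tree.
RULES = [
    ("text_in_image", "Ideogram v3 or Recraft V3"),
    ("photo_realism", "Imagen 4 Ultra or Flux Pro 1.1"),
    ("designer_workflow", "Recraft or Adobe Firefly"),
    ("consistency", "Midjourney --cref or Flux Kontext"),
    ("license_first", "Firefly or Imagen 4 (indemnified)"),
    ("local_only", "Flux Dev or Schnell, or HiDream-I1"),
    ("api_batch", "fal.ai Flux Pro or OpenAI gpt-image-1"),
]
DEFAULT = "Midjourney v7 (single-scene aesthetic)"

def pick(needs: Needs) -> str:
    """Walk the questions in order and return the first matching recommendation."""
    for field_name, recommendation in RULES:
        if getattr(needs, field_name):
            return recommendation
    return DEFAULT

print(pick(Needs(consistency=True, api_batch=True)))
# Midjourney --cref or Flux Kontext
```

Because the order encodes priority, a project with several needs gets the answer for its earliest question; reorder `RULES` if, say, licensing outranks typography for a given team.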
7.3 Budget Guide
| Budget | Recommendation |
|---|---|
| 0 USD/month | Flux Schnell locally with Forge. 6 GB+ GPU. Unlimited. |
| 10 USD/month | Midjourney Basic or Ideogram Basic. One tool. |
| 30 USD/month | Midjourney Standard + Ideogram + ChatGPT Plus. Aesthetic + typography + editing. |
| 60 USD/month | + Recraft Pro or Adobe CC. Full designer stack. |
| 200+ USD/month | API usage (fal.ai Flux Pro + Imagen 4 + gpt-image-1) on top. Production automation. |
Epilogue — Checklist, Anti-Patterns, Next Post
The shock of SD 1.4 in 2022, Flux.1's overtake in 2024, and in 2025 both the Midjourney v7 / Imagen 4 consumer jump and the Flux Kontext / gpt-image-1 editing paradigm shift: the category has never sat still. Music and video shifted the same way. The difference is that images stabilized first. Users now ask "which tool for which job" rather than "which model is best." There is no one-line answer, but the five axes are clear: aesthetic (Midjourney), realism (Imagen), typography (Ideogram), designer workflow (Recraft / Firefly), open weights (Flux / HiDream).
Tool Selection Checklist
- Does the image contain text? If yes, lead with Ideogram or Recraft.
- Commercial use? If yes, Firefly / Imagen indemnification or self-hosted Flux Schnell.
- Single shot or a series? If a series, character consistency (`--cref`, Flux Kontext) is mandatory.
- Editing required? Pick one of Flux Kontext, gpt-image-1, or Photoshop Generative Fill.
- Local feasible? 16 GB+ GPU runs Flux Dev. 24 GB runs HiDream.
- Automation required? Use APIs. Midjourney is unsuitable for automation.
- Vector required? Recraft is nearly alone here.
- Realism or illustration? Realism -> Imagen 4 Ultra. Illustration -> Midjourney v7.
- Multi-turn editing? gpt-image-1 (chat) or Flux Kontext.
- License safety top priority? Firefly first, Imagen second.
Anti-Patterns
| Anti-Pattern | Why It Hurts | Instead |
|---|---|---|
| Shipping the first generation | Average quality is low | Generate 4-8, curate |
| Famous artist names in prompts | Licensing grey zone, lawsuit risk | Abstract descriptions ("late-80s synthwave poster") |
| Automating Midjourney | No official API; unofficial wrappers violate ToS | fal.ai Flux Pro, gpt-image-1, Imagen 4 |
| Staying on SD-XL, ignoring Flux | Prompt-adherence gap compounds | Start with Flux Schnell; keep SD-XL only when a LoRA is required |
| Avoiding ComfyUI as "too complex" | Automation gap compounds | Start with Fooocus / Forge, graduate to ComfyUI |
| Shipping Flux Dev commercially | Violates the Non-Commercial license | Use Flux Schnell, Flux Pro, or HiDream |
| Posters with text via Midjourney | Letters break | Ideogram v3 or Recraft |
| Selling NFTs or merch without license labels | IP risk | Confirm explicit commercial rights on outputs |
| Expecting 4K+ from a single generation | Model outputs are usually 1-2 MP | Upscale with Magnific / Topaz |
| Free-tier for client work | License violations, watermarks | At minimum Pro |
| Single-model dependence | Aesthetic / typography / editing gaps accumulate | Combine 2-3 models (aesthetic + typography + editing) |
Next Post
The next post is "AI Video Generation 2026 — Sora 2, Veo 3, Runway Gen-4, Kling 2, Pika 2, Open-Sora: Where Are We Really?". Same shape — category explosion (the 2024 Sora demos) and maturation (2026's commercial tools), the hardest slice (long consistency, character identity, fingers and physics), open-source options (Open-Sora, Mochi, HunyuanVideo, Wan), real workflows (ads, short-form, concept visuals), and the licensing fight (NYT-OpenAI, Disney licensing). With that post the image / music / video triangle closes.