AI Image Generation 2026 — Flux, Imagen 4, Midjourney v7, Ideogram, Recraft, Firefly, DALL-E, Stable Diffusion: An Honest Comparison

Prologue — Two Earthquakes

August 2022. Stable Diffusion 1.4 was released. Before that, image generation AI lived inside the closed betas of OpenAI's DALL-E 2 and Midjourney v3. The moment SD shipped with open weights, the whole category was reshaped. Anyone with a 4090 could generate unlimited images locally. ComfyUI, Automatic1111, Fooocus, and Forge exploded. LoRA, ControlNet, and IP-Adapter extensions arrived in waves. 2023 was "the year of the SD ecosystem."

Early 2024. A new name appeared: Black Forest Labs. It was the original SD core team — Robin Rombach, Andreas Blattmann, Patrick Esser, Dominik Lorenz — who had left Stability AI to start a new company. In August they announced Flux.1, a three-tier system: open-weight Schnell (Apache 2.0), non-commercial Dev, and API-only Pro. On day one Flux beat SD-XL, and within a year it became the new default for open-weight image models. That was the first earthquake.

Spring 2025. The second earthquake hit. Midjourney v7 launched in April and reset the bar for consumer aesthetic again. That same month, OpenAI replaced DALL-E 3 with gpt-image-1 as the default image model inside ChatGPT. In May, Black Forest Labs released Flux.1 Kontext, opening a separate track called "image editing," and Adobe Firefly Image 4 shipped. In June, Google's Imagen 4 went GA. Ideogram pushed its typography lead further with v3.

As of May 2026, the landscape looks like this.

  • The open-weight throne. Flux replaced SD-XL/3.5. The most-loaded base model in ComfyUI and Forge is now a Flux derivative. Stability AI has lagged a beat since SD 3.5 Large.
  • The consumer aesthetic peak. Midjourney v7, followed by Imagen 4 Ultra. If a designer must pick one image to ship, the answer is still one of these two.
  • The solo typography champion. Ideogram v3. If a poster needs legible text, there is almost no other choice.
  • The designer tools. Recraft built a category as "the AI that also exports vectors." Firefly hardened the position of "safe-to-use image inside Adobe workflows."
  • The developer backends. OpenAI gpt-image-1, Google Imagen 4, Flux Pro 1.1 — the three most-called image APIs in production.
  • Lawsuits and licensing. The UK Getty Images v. Stability AI ruling in November 2025 split the question in two: training on copyrighted images was held lawful, while output-level trademark similarity is a separate problem. That ruling firmed up the marketing position of Firefly, Imagen, and gpt-image-1 as "license-clean."

This post tries to map that landscape honestly. Which model wins which slice, what local vs cloud actually means in 2026, whether ComfyUI is really dead (it is not), and where the lawsuits land. The same shape as the music post — five axes, one decision tree, an anti-pattern table at the end.

One-line take: in 2026 image generation there is no "single best model." The five axes — typography, consistency, editing, licensing, aesthetic — have split into different tools. Knowing the tool turns an hour-long job into a ten-minute one.


1 · The Arrival of the Flux Era — A New Baseline for Open Weights

1.1 Who Is Black Forest Labs

March 2024. The core researchers at Stability AI — Robin Rombach, Andreas Blattmann, Patrick Esser, Dominik Lorenz — left to start Black Forest Labs. Headquartered in Freiburg, Germany. Seed round in August 2024, around 31 million USD, led by Andreessen Horowitz with General Catalyst, Y Combinator, and MätchVC joining.

These are the original authors of SD 1.x, 2.x, and SD-XL. The people who built the open-weight image generation category started a new company, and their first product was Flux.1.

1.2 The Three Tiers of Flux.1

Flux.1 shipped as three variants of the same architecture trained on the same data.

  • Flux.1 Schnell. Apache 2.0 license. 1-4 step inference. The lightest, the most permissive. Commercial use allowed. Quantized, it runs in 6-8 GB of VRAM, so even mid-range cards can host it.
  • Flux.1 Dev. Black Forest Labs Non-Commercial License. Weights are public but commercial use is forbidden. For research, learning, and personal projects. 50-step guided inference.
  • Flux.1 Pro. Closed. API-only. Highest quality. Hosted on fal.ai, Replicate, Together AI.

The structure is clever. Open the weights to build the ecosystem; recover commercial value through Pro and licensing. In 2025, Flux.1.1 Pro and Flux.1.1 Pro Ultra (up to 4 megapixels) extended the Pro lane.

1.3 Why It Beat SD-XL

Three technical points explain the jump.

  1. 12 billion parameters. Roughly 4.6x SD-XL (2.6B) and 50 percent larger than SD 3.5 Large (8B).
  2. Rectified Flow. A variant of diffusion. Standard diffusion learns a curved path from noise to image; Rectified Flow tries to learn a straight one. Fewer sampling steps produce higher quality.
  3. MMDiT architecture. The multimodal diffusion transformer introduced in Stable Diffusion 3. Text and image flow through the same transformer blocks together. Prompt adherence jumped sharply over SD-XL.

In human-evaluation benchmarks at launch (August 2024), Flux Pro beat Midjourney v6, DALL-E 3, and SD 3 across the board. Hands, in-image text, and anatomy — the classic weak spots — improved the most.
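The "straight path" idea behind Rectified Flow can be sketched in a few lines of numpy. This is a toy illustration of the training target, not Flux's actual code: the velocity network regresses a constant vector along the line between noise and data, which is why very few sampling steps can suffice.

```python
# Toy sketch of the Rectified Flow objective (illustration only):
# the model learns a velocity field along the straight line
# between a noise sample x0 and a data sample x1.
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.standard_normal(4)           # "noise" sample
x1 = np.array([1.0, 2.0, 3.0, 4.0])   # "data" sample

def interpolate(x0, x1, t):
    """Point on the straight noise->data path at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1

# The regression target for the velocity network is constant along the path:
velocity_target = x1 - x0

# Because the path is straight, a single Euler step from t=0 with the
# true velocity lands exactly on the data point -- the intuition behind
# few-step sampling.
one_step = x0 + 1.0 * velocity_target
assert np.allclose(one_step, x1)
```

Standard diffusion's curved denoising trajectory needs many small steps to follow; a (re)straightened path collapses that cost, which is how Schnell gets away with 1-4 steps.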

1.4 Flux Kontext — The Editing Answer

May 2025. Black Forest Labs released Flux.1 Kontext. It is an "image + text -> edited image" model. Different from text-to-image.

What makes Kontext different.

  • Multi-input. One or more reference images plus a text instruction. Things like "preserve this face, change the outfit to a black suit" or "blend these two inputs into a unified tone" work natively.
  • Local edits. "Change this part, keep the rest." No inpainting mask required. The instruction is in text.
  • Multi-turn. Stack edits on the same image — outfit, then background, then lighting, then hair color.
  • Three variants. Kontext Pro (API), Kontext Max (highest quality), Kontext Dev (open weights, non-commercial).

Before Kontext, image editing was a stack of ControlNet, IP-Adapter, inpainting masks, and LoRA. Now a single text instruction does most of it.

1.5 Flux Tools — Auxiliary Models

November 2024. Black Forest Labs released Flux Tools.

  • Flux Fill. Inpainting and outpainting. Mask plus text to fill a region.
  • Flux Canny. Replacement for Canny-edge ControlNet.
  • Flux Depth. Depth-map guidance.
  • Flux Redux. Image variation. Generate similar-mood variants of a single input.

These absorbed most of the ControlNet and IP-Adapter ecosystem from the SD 1.5 and XL era.

1.6 The Local Execution Reality

Running Flux Dev locally (reference card: an RTX 4090).

  • Full precision (FP16). 24 GB VRAM. One image in around 20 seconds.
  • FP8 quantization. 12-16 GB. One image in around 15 seconds. Quality drop is negligible.
  • NF4 quantization. 6-8 GB. One image in around 25 seconds (slower). Slight quality dip, but a 4060 8 GB can run it.
  • Schnell. Four steps is enough. Under 5 seconds per image.

ComfyUI, Forge, SwarmUI, and InvokeAI all support Flux natively. By 2025 "local image generation = Flux" became the default.
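The VRAM figures above follow directly from parameter count times bytes per weight. A quick back-of-envelope helper (weights only — activations, the text encoders, and the VAE add several more gigabytes in practice):

```python
# Back-of-envelope VRAM estimate for holding a 12B-parameter model's
# weights at different precisions. Real usage adds activations, text
# encoders, and the VAE, so treat these as lower bounds.
def weight_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Gigabytes needed just for the weights."""
    return n_params * bits_per_param / 8 / 1e9

FLUX_PARAMS = 12e9

fp16 = weight_vram_gb(FLUX_PARAMS, 16)  # 24.0 GB
fp8  = weight_vram_gb(FLUX_PARAMS, 8)   # 12.0 GB
nf4  = weight_vram_gb(FLUX_PARAMS, 4)   #  6.0 GB
print(f"FP16 {fp16:.0f} GB | FP8 {fp8:.0f} GB | NF4 {nf4:.0f} GB")
```

The arithmetic explains the tiers in the list above: halving the bits per weight halves the weight footprint, which is why NF4 brings a 12B model within reach of an 8 GB card.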


2 · The Consumer Aesthetic Peak — Midjourney and Imagen

2.1 Midjourney v7

Midjourney is the aesthetic reference for the category. Other models chase the look Midjourney sets. v7 went to alpha in April 2025 and GA in June.

Key changes in v7.

  • Personalization. On first use, the model asks you to rate around 200 images. It learns your taste, and the same prompt produces different output for different users.
  • Draft Mode. A fast, cheap draft. Roughly one-tenth the token cost, four images in under thirty seconds. Pick the favorite and upscale to full mode.
  • Style Reference v2. A --sref code or a reference image holds a consistent style. Much more stable than v6.
  • Character Reference (--cref). Keeps the same character across many panels. The core of comic and storybook workflows.

In a single line — Midjourney wins on "the aesthetic finish of a single scene." For a poster, an illustration, or a single moodboard frame, its output needs the least retouching from a working designer.

Weaknesses.

  • Text rendering. v7 is still weak at letters. Below Ideogram.
  • Photo-grade realism. Concedes to Imagen 4 Ultra for photographic work.
  • No API. Midjourney has no official API. Discord bot plus unofficial wrappers. Unsuitable for production automation.
  • Commercial license. Allowed at Pro and above. But the training-data licensing is not advertised as clean.

2.2 Google Imagen 4

Imagen 4 went GA in June 2025, a clear step up from Imagen 3 (December 2024). It ships in three variants.

  • Imagen 4 Standard. Fast, general.
  • Imagen 4 Ultra. The peak of photo-grade realism. Portraits, landscapes, product shots — head-to-head with Midjourney v7 Photo.
  • Imagen 4 Fast. Cost-optimized variant.

What stands out.

  • Text rendering improved sharply. Imagen 3's weak spot is now usable in v4. Not as accurate as Ideogram, but better than Midjourney.
  • Mandatory SynthID watermarking. Every output carries an invisible watermark. Lines up with the broader standardization push for AI provenance.
  • Vertex AI integration. The easiest path for enterprise adoption. Inherits SOC 2 and HIPAA compliance.
  • Commercial safety. Google offers explicit IP indemnification on outputs, same lane as Firefly.

Weaknesses.

  • Creative aesthetic. Strong at photo, average at "the personality of an illustration." Midjourney and Flux still win there.
  • Content filters. Enterprise safety thresholds are strict; some legitimate prompts get rejected.

2.3 OpenAI gpt-image-1

In April 2025 OpenAI introduced a new default image model called gpt-image-1 inside ChatGPT. The earlier default, DALL-E 3, moved to backup.

Where gpt-image-1 sits.

  • Conversational editing. "Make this -> change the color -> add this caption" feels natural across turns. Same direction as Flux Kontext but with a chat interface.
  • Text rendering. A big step up from DALL-E 3, roughly at Imagen 4's level. Still under Ideogram.
  • Realism. A step behind Imagen 4 Ultra. Average aesthetic compared to Midjourney v7.
  • API pricing. Output-token based. Roughly 0.02 to 0.19 USD per image depending on quality.

Because it is the model most invoked inside ChatGPT, by raw call volume it may be the category leader in 2026. The accurate framing is "not the highest quality, but the lowest-friction interface."
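For budgeting, per-image cost under output-token pricing reduces to a lookup. The tier costs below are illustrative placeholders spanning the 0.02-0.19 USD range quoted above, not official OpenAI prices:

```python
# Rough per-image cost helper for output-token-priced image APIs.
# Tier costs are illustrative assumptions only, chosen to span the
# 0.02-0.19 USD per-image range cited in the text.
TIER_COST_USD = {
    "low": 0.02,
    "medium": 0.07,
    "high": 0.19,
}

def batch_cost(n_images: int, quality: str) -> float:
    """Estimated USD for a batch at one quality tier."""
    return n_images * TIER_COST_USD[quality]

print(f"{batch_cost(100, 'medium'):.2f}")  # 7.00
```

At these assumed rates, a hundred draft-quality images cost about what two high-quality finals do — the usual argument for drafting cheap and upscaling only the keepers.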

2.4 Comparison — Consumer and API Models

Tool | Aesthetic | Realism | Text Rendering | Editing | License | API
Midjourney v7 | Top | Very high | Weak | --cref consistency | Pro and above | None (unofficial only)
Imagen 4 Ultra | High | Top | Decent | Separate (Imagen Edit) | Indemnified | Vertex AI
gpt-image-1 | Decent | High | Decent | Strong (chat) | Standard OpenAI | OpenAI API
Flux Pro 1.1 | High | Very high | Decent | Kontext (separate) | Commercial (Pro) | fal/Replicate
DALL-E 3 | Decent | High | Decent | Weak | Standard OpenAI | OpenAI API (legacy)

3 · Typography and Designer Tools — Ideogram, Recraft, Firefly

3.1 Ideogram v3 — When Text Has to Appear

Ideogram is a Toronto startup founded in August 2023. CEO is Mohammad Norouzi, formerly at Google Brain and one of the original Imagen authors. From day one the company picked "text inside images" as its core differentiator.

  • Ideogram 1.0 (Feb 2024). Most accurate text rendering in the category.
  • Ideogram 2.0 (Aug 2024). Strengthened realism and style control.
  • Ideogram 3.0 (Mar 2025). Pushed typography and aesthetic up together. Effectively solo leader for posters, business cards, ad copy, and book covers.

Core features.

  • Magic Prompt. Auto-expands prompts to improve results.
  • Canvas. Inpainting, outpainting, and mask editing in a single workflow.
  • Style Reference. Reference image for consistent style.
  • Character Consistency. Added in v3. Hold a character across multiple frames.

Typography accuracy is not "the letters are readable" but "the designer can ship the output as-is." That gap is decisive against Midjourney, Flux, and Imagen.

Weaknesses: realism slightly behind Imagen 4 Ultra, and character consistency slightly behind Midjourney's --cref.

3.2 Recraft — AI for Designers

Recraft picked a single position — "AI for graphic designers." Recraft V3 launched in October 2024 and briefly topped the Hugging Face TTI leaderboard.

Differentiators.

  • Vector output. Direct SVG generation. Logos, icons, illustrations as vectors. Nearly unique among AI tools.
  • Brand library. Save palette, fonts, and style; apply consistently across outputs.
  • Integrated image plus text. Posters treat type as a design element. More design-friendly typesetting than Ideogram.
  • Image editing. Inpainting, outpainting, object removal, background change in a unified UI.
  • 3D mockups. Auto-mapping images to 3D objects (mugs, books, phones).

After V3, V3 Plus shipped in 2025 with further realism gains. V3.5 is in beta as of May 2026. For a designer, Recraft is the single tool that handles "generate -> edit -> deliver in another format."

Pricing: 50 free credits per day, Basic 12 USD per month, Pro 33 USD per month.

3.3 Adobe Firefly — The Licensing Clarity Story

Adobe Firefly's value prop is one sentence: "trained only on Adobe Stock and public domain." Other models live in a licensing grey zone; Firefly does not.

Firefly Image 4 launched in May 2025 and Image 4 Ultra arrived that autumn.

  • Firefly Image 4. General-purpose. Balanced across realism, illustration, and text.
  • Firefly Image 4 Ultra. High resolution and detail. Ads, publishing, product design.

Strengths inside the Adobe ecosystem.

  • Photoshop Generative Fill. Firefly powers inpainting and outpainting inside Photoshop. Zero-friction adoption in the designer workflow.
  • Illustrator Generative Recolor. Auto color variants for vectors.
  • Premiere Pro Generative Extend. Extends video clips with text (a separate Firefly Video model).
  • Adobe Express. Integrated UI for non-experts.
  • Indemnification. Enterprise customers get IP indemnification on outputs.

Weaknesses.

  • Standalone aesthetic. Less of a "wow" than Midjourney v7.
  • Content filters. Strong safety thresholds reject many faces, public figures, and certain commercial concepts.
  • Price. Effectively free for existing Creative Cloud subscribers, expensive standalone.

3.4 Comparison — Typography and Designer Tools

Tool | Text Accuracy | Vector | Designer Workflow | License Clarity | Price
Ideogram v3 | Top | No | Canvas integrated | Decent | Free to 20 USD/mo
Recraft V3 | Very high | Yes (SVG) | Brand library | Decent | Free to 33 USD/mo
Firefly Image 4 | High | No | Adobe integrated | Top | Included with CC

4 · Open Source and Local — Stable Diffusion 3.5, SD-XL, HiDream, Janus-Pro

4.1 Where Stable Diffusion Stands

Stability AI, the company that created the category in 2022, had a rough 2024-2025.

  • SD 3 Medium (June 2024). Hit immediate backlash over anatomy issues. License changes (Creator vs Enterprise split) were also controversial.
  • SD 3.5 Large/Medium/Large Turbo (October 2024). Addressed SD 3's weak spots. 8B/2.5B/8B parameters. Stability AI Community License (free under 1 million USD annual revenue).
  • SD-XL 1.0 (July 2023). Still the most-used base model by cumulative volume, with the LoRA ecosystem that grew up around it.

In May 2026, SD 3.5 is "still used but not the category leader." Flux is clearly ahead. Stability AI, after a 2024 CEO turnover and financial struggles, has shifted weight toward Stable Audio, Stable Video, and 3D.

4.2 SD-XL — The Power of the Legacy

SD-XL persists for one reason. The LoRA, ControlNet, IP-Adapter, and Textual Inversion ecosystem is enormous. Tens of thousands of SD-XL LoRAs live on Civitai. For specific art styles, characters, and aesthetics, SD-XL still offers the deepest catalog.

When to stay on SD-XL.

  • A required LoRA exists only for SD-XL. Anime styles, specific illustrator looks, recurring characters.
  • Precise control via ControlNet. Pose, depth, edges.
  • Hardware is limited. SD-XL runs comfortably on 8 GB VRAM.
  • An existing ComfyUI graph is in production. Don't break what works.

When to move to Flux.

  • Starting a new base workflow.
  • Prompt adherence matters. Flux is far ahead.
  • Commercial licensing must be unambiguous. Flux Schnell.

4.3 HiDream — The 2025 Newcomer

HiDream-I1, released April 2025, is a 17B open-weight model under the MIT license. In some academic benchmarks it edges out Flux Dev.

  • Hardware. 24 GB VRAM recommended. NF4 quantization drops to 12 GB.
  • Quality. Balanced across realism, text, and consistency. Roughly level with Flux Dev.
  • License. MIT — fully commercial. The decisive advantage over Flux Dev (non-commercial).

ComfyUI supports it natively. As of May 2026, HiDream is the "real free alternative to Flux Dev."

4.4 Janus-Pro / Krea — Adjacent Directions

Janus-Pro (DeepSeek, January 2025). A multimodal LLM that also generates images. Text and images flow through the same model. 7B parameters, MIT license. Quality is below Flux, but the paradigm of "LLM as image generator" is worth tracking.

Krea AI. Not a model vendor but a workflow platform. Aggregates multiple models behind one interface. Its real-time canvas (the AI fills in as you sketch) is the differentiator. Krea launched its own model Krea-1 in 2025.

4.5 Comparison — Open Source and Local

Model | Params | License | Min VRAM | Strength
Flux.1 Schnell | 12B | Apache 2.0 | 6 GB (NF4) | Fast, free, commercial OK
Flux.1 Dev | 12B | BFL Non-Commercial | 6-24 GB | Top open-weight (non-commercial)
HiDream-I1 | 17B | MIT | 12-24 GB | Commercial Flux alternative
SD 3.5 Large | 8B | Stability Community | 8-16 GB | Mature catalog
SD-XL 1.0 | 2.6B | OpenRAIL++ | 6-8 GB | LoRA ecosystem
HiDream Dev | 17B | MIT | 12 GB | Distilled HiDream

5 · Tools vs Models — Where ComfyUI, Forge, and A1111 Are

5.1 Are the UIs Dead? No.

A common myth in 2025 was that Automatic1111, ComfyUI, Forge, Fooocus, and InvokeAI faded as the category moved to hosted models. Wrong. The category shape just changed.

ComfyUI got bigger in 2025-2026. Node-based, so new models get new nodes quickly. Flux, HiDream, and every video model (Wan, HunyuanVideo, LTX-Video) lands in ComfyUI first. It is now the de-facto standard for AI image and video workflow automation.

Forge UI (Forge / Forge Classic). A fork of Automatic1111. Memory optimization runs Flux on 8 GB GPUs. The UI is friendlier than ComfyUI, so it became the entry point for beginners.

Automatic1111 (A1111). Update cadence slowed in 2025. Flux support arrived later than Forge, and ControlNet lagged. The legacy SD-XL user base stays, but new entrants moved to Forge/ComfyUI.

InvokeAI. Pivoted to commercial SaaS. Targets enterprise workflow solutions.

SwarmUI. Uses ComfyUI as a backend but presents a friendlier UI. Recommended for users who don't want to draw node graphs.

Fooocus. Midjourney-style simple UI. Best onboarding for non-technical users.

5.2 A Sample ComfyUI Graph

Flux Dev plus Flux Kontext plus LoRA plus upscale, in a single graph.

[LoadCheckpoint Flux Dev]
        |
        +-[Text Encoder] <- [Prompt: "cyberpunk alley, neon"]
        |       |
        |  [KSampler] <- [Empty Latent 1024x1024]
        |       |
        |  [VAE Decode]
        |       |
        +-[LoadKontext] <- [Reference image]
        |       |
        |  [Kontext Edit] <- [Instruction: "make it sunset"]
        |       |
        +-[LoadLora char-v1] <- [Strength 0.8]
        |       |
        +-[Upscale 4x ESRGAN]
        |       |
        +-[Save Image]

Build the graph once and you can rerun it with different prompts to generate hundreds of variants. The automation matches direct API use, with the bonus of seeing every intermediate step visually.
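ComfyUI also exposes the same graphs over HTTP: the server accepts a JSON map of node-id -> {class_type, inputs} at POST /prompt, which is how "hundreds of variants" get scripted. A minimal sketch using stock SD-style node names — a real Flux graph swaps in the Flux-specific loader nodes, so treat this as a shape example:

```python
# Sketch of driving a ComfyUI graph programmatically. The server accepts
# {"prompt": {node_id: {"class_type": ..., "inputs": ...}}} via POST /prompt.
# Node class names below are stock ComfyUI nodes; filenames and a Flux-
# specific loader setup are assumptions for illustration.
import json

def make_prompt(positive: str, seed: int) -> dict:
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "flux1-dev.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "4": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["2", 0], "latent_image": ["3", 0],
                         "seed": seed, "steps": 20, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "5": {"class_type": "VAEDecode",
              "inputs": {"samples": ["4", 0], "vae": ["1", 2]}},
        "6": {"class_type": "SaveImage",
              "inputs": {"images": ["5", 0], "filename_prefix": "flux"}},
    }

# Batch generation is a loop over make_prompt(); each payload would be
# POSTed to the local server, e.g. http://127.0.0.1:8188/prompt.
payload = json.dumps({"prompt": make_prompt("cyberpunk alley, neon", 42)})
```

The `["1", 1]` entries are ComfyUI's edge notation — (source node id, output slot index) — so building the dict is literally wiring the node graph in code.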

5.3 New Entrant UIs

  • Krea. Real-time canvas. The AI fills in as you sketch. Popular with designers.
  • Magnific. Upscaling and detail enhancement specialist. Post-processes other models' outputs.
  • Leonardo.AI. SaaS UI plus in-house model plus integrated workflow.
  • OpenArt. Hosts ComfyUI workflows on the web. Share node graphs without managing your own server.

5.4 Where to Put Your Workflow

One-line picks.

  • One quick shot: Midjourney v7, Ideogram, Imagen 4 (web UI).
  • Automation and batch: API (fal.ai, Replicate, OpenAI, Vertex AI) or local ComfyUI.
  • Fine control (LoRA, ControlNet): Local ComfyUI or Forge.
  • Designer workflow: Recraft, Firefly, Krea.
  • Engineering integration: API.

6 · Lawsuits and Licensing — Honestly

6.1 Getty Images v. Stability AI

The most-cited case. Getty Images sued Stability AI in both the UK and US (2023).

UK ruling, November 2025, High Court of Justice.

  • Training itself is not UK copyright infringement. The court found that original images are not preserved inside model weights.
  • Trademark infringement is separate. Where outputs partially reproduced the Getty watermark, the court did find trademark infringement.
  • Summary. Training is lawful, output-level trademark similarity is not.

The US case is still pending in May 2026. US law differs and the outcome may differ.

6.2 Other Active Cases

  • Andersen v. Stability AI. A class action by an artist group against Stability, Midjourney, and DeviantArt. Some claims dismissed; copyright claims remain alive.
  • NYT v. OpenAI. A text-training case, but its precedent will affect image-training case law.
  • Disney licensing. Reports in 2025 indicated Disney was negotiating direct licensing deals with several AI companies. Direct major-IP licensing may become standard.

6.3 What Users Should Do

Safer commercial options (May 2026).

  1. Adobe Firefly. Adobe Stock plus public domain only. Indemnification. The safest.
  2. Google Imagen 4. Indemnification. License-clean training data marketed explicitly.
  3. OpenAI gpt-image-1. Standard OpenAI terms. Indemnification only on the Enterprise plan.
  4. Flux Schnell, self-hosted. Apache 2.0 weights. Outputs belong to the user.
  5. HiDream-I1. MIT weights. Commercial use OK.

Grey zone.

  • Midjourney. Commercial use of outputs allowed at Pro and above, but explicit training-data licensing is not advertised.
  • SD-XL plus community LoRA. Many LoRAs have unclear training-data provenance, especially "specific artist style" LoRAs.
  • Recraft. License policy is stated, but training-data sources are only partially disclosed.

Risky behavior.

  • Famous artist or illustrator names in the prompt. "In the style of [Artist]" output, used commercially, is clearly risky.
  • Direct imitation of trademarked characters and IP. Disney characters, game characters, brand logos.
  • Selling NFTs or merchandise without explicit license confirmation.

6.4 However the Lawsuits End

Three scenarios.

Scenario A — "training is transformative fair use" prevails. AI training is legalized. Output-level trademark and similarity issues remain separate. The "explicit licensing" marketing edge of Firefly and Imagen narrows.

Scenario B — "training requires licensing" prevails. Stable Diffusion and Midjourney face licensing settlements or forced retraining. Costs jump, subscriptions rise, and Firefly and Imagen pull further ahead.

Scenario C — Settlement and licensing standardize. Like the Disney-AI rumored deals: major-IP licensing becomes the norm, academic and open-source models live in a separate track. The most likely outcome.


7 · The Decision Framework — How to Pick

7.1 Recommendations by Use Case

Situation | First Choice | Second Choice | Note
Single concept illustration | Midjourney v7 | Flux Pro 1.1 | Aesthetic first
Photo-grade portrait or product | Imagen 4 Ultra | Flux Pro | Realism
Poster or ad with text | Ideogram v3 | Recraft V3 | Typography accuracy
Logo or icon (vector) | Recraft V3 | Adobe Illustrator | Vector output
Brand consistency | Firefly Image 4 | Midjourney --sref | Indemnification + workflow
Character consistency (comics) | Midjourney --cref | Flux Kontext | Multi-panel
Image editing | Flux Kontext | gpt-image-1 | Text-driven
Inpainting / outpainting | Photoshop + Firefly | Flux Fill | Workflow
API automation | fal.ai + Flux Pro | Vertex AI Imagen 4 | SLA
Local / private | Flux Dev (non-commercial) | HiDream-I1 (commercial) | Self-host
Free start | Flux Schnell + Forge | SD-XL + Civitai LoRA | 4 GB+ GPU
Commercial safety first | Firefly | Imagen 4 | Indemnification
Academic / research | SD 3.5 + paper repro | Flux Dev | Verifiability

7.2 Decision Tree

Start
 |
 +- Must the image contain text?
 |    +- Yes -> Ideogram v3 or Recraft V3
 |    +- No -> next
 |
 +- Photo-grade realism required?
 |    +- Yes -> Imagen 4 Ultra or Flux Pro 1.1
 |    +- No -> next
 |
 +- Designer workflow (brand, vector)?
 |    +- Yes -> Recraft or Adobe Firefly
 |    +- No -> next
 |
 +- Character or scene consistency required?
 |    +- Yes -> Midjourney `--cref` or Flux Kontext
 |    +- No -> next
 |
 +- License cleanliness top priority?
 |    +- Yes -> Firefly or Imagen 4 (indemnified)
 |    +- No -> next
 |
 +- Local / private execution required?
 |    +- Yes -> Flux Dev or Schnell, or HiDream-I1
 |    +- No -> next
 |
 +- API automation / batch needed?
      +- Yes -> fal.ai Flux Pro or OpenAI gpt-image-1
      +- No  -> Midjourney v7 (single-scene aesthetic)
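The tree above transcribes directly into code. A sketch — the flag names are mine, and the returned strings are simply this post's picks:

```python
# The decision tree, as a function. Flags mirror the questions in order;
# the first True wins, matching the top-down flow of the tree.
def pick_tool(*, needs_text=False, needs_realism=False,
              designer_workflow=False, needs_consistency=False,
              license_first=False, local_only=False, needs_api=False) -> str:
    if needs_text:
        return "Ideogram v3 / Recraft V3"
    if needs_realism:
        return "Imagen 4 Ultra / Flux Pro 1.1"
    if designer_workflow:
        return "Recraft / Adobe Firefly"
    if needs_consistency:
        return "Midjourney --cref / Flux Kontext"
    if license_first:
        return "Firefly / Imagen 4"
    if local_only:
        return "Flux Dev or Schnell / HiDream-I1"
    if needs_api:
        return "fal.ai Flux Pro / gpt-image-1"
    return "Midjourney v7"  # default: single-scene aesthetic

print(pick_tool(needs_text=True))  # Ideogram v3 / Recraft V3
print(pick_tool())                 # Midjourney v7
```

Note the ordering is load-bearing: a poster with text that also needs realism still routes to Ideogram first, exactly as the tree prescribes.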

7.3 Budget Guide

Budget | Recommendation
0 USD/month | Flux Schnell locally with Forge. 4 GB+ GPU. Unlimited.
10 USD/month | Midjourney Basic or Ideogram Basic. One tool.
30 USD/month | Midjourney Standard + Ideogram + ChatGPT Plus. Aesthetic + typography + editing.
60 USD/month | + Recraft Pro or Adobe CC. Full designer stack.
200+ USD/month | API usage (fal.ai Flux Pro + Imagen 4 + gpt-image-1) on top. Production automation.

Epilogue — Checklist, Anti-Patterns, Next Post

The shock of SD 1.4 in 2022, Flux.1's overtake in 2024, the Midjourney v7 / Imagen 4 consumer jump in 2025, and the Flux Kontext / gpt-image-1 editing paradigm shift that followed — the category has never sat still. Music and video shifted the same way. The difference is that images stabilized first. Users now ask "which tool for which job" rather than "which model is best." There is no one-line answer, but the five axes are clear — aesthetic (Midjourney), realism (Imagen), typography (Ideogram), designer workflow (Recraft / Firefly), open weights (Flux / HiDream).

Tool Selection Checklist

  1. Does the image contain text? If yes, lead with Ideogram or Recraft.
  2. Commercial use? If yes, Firefly / Imagen indemnification or self-hosted Flux Schnell.
  3. Single shot or a series? If a series, character consistency (--cref, Flux Kontext) is mandatory.
  4. Editing required? Pick one of Flux Kontext, gpt-image-1, or Photoshop Generative Fill.
  5. Local feasible? 16 GB+ GPU runs Flux Dev. 24 GB runs HiDream.
  6. Automation required? Use APIs. Midjourney is unsuitable for automation.
  7. Vector required? Recraft is nearly alone here.
  8. Realism or illustration? Realism -> Imagen 4 Ultra. Illustration -> Midjourney v7.
  9. Multi-turn editing? gpt-image-1 (chat) or Flux Kontext.
  10. License safety top priority? Firefly first, Imagen second.

Anti-Patterns

Anti-Pattern | Why It Hurts | Instead
Shipping the first generation | Average quality is low | Generate 4-8, curate
Famous artist names in prompts | Licensing grey zone, lawsuit risk | Abstract descriptions ("late-80s synthwave poster")
Automating Midjourney | No official API; unofficial wrappers violate ToS | fal.ai Flux Pro, gpt-image-1, Imagen 4
Staying on SD-XL, ignoring Flux | Prompt-adherence gap compounds | Start with Flux Schnell; keep SD-XL only when a LoRA is required
Avoiding ComfyUI as "too complex" | Automation gap compounds | Start with Fooocus / Forge, graduate to ComfyUI
Shipping Flux Dev commercially | Violates the Non-Commercial license | Use Flux Schnell, Flux Pro, or HiDream
Posters with text via Midjourney | Letters break | Ideogram v3 or Recraft
Selling NFTs or merch without license labels | IP risk | Confirm explicit commercial rights on outputs
Expecting 4K+ from a single generation | Model outputs are usually 1-2 MP | Upscale with Magnific / Topaz
Free-tier for client work | License violations, watermarks | At minimum Pro
Single-model dependence | Aesthetic / typography / editing gaps accumulate | Combine 2-3 models (aesthetic + typography + editing)

Next Post

The next post is "AI Video Generation 2026 — Sora 2, Veo 3, Runway Gen-4, Kling 2, Pika 2, Open-Sora: Where Are We Really?". Same shape — category explosion (the 2024 Sora demos) and maturation (2026's commercial tools), the hardest slice (long consistency, character identity, fingers and physics), open-source options (Open-Sora, Mochi, HunyuanVideo, Wan), real workflows (ads, short-form, concept visuals), and the licensing fight (NYT-OpenAI, Disney licensing). With that post the image / music / video triangle closes.

