AI Image Generation 2026 Deep Dive - Midjourney v7 · DALL·E 4 · Imagen 3 · FLUX · Stable Diffusion 3.5 · Ideogram · Recraft
Prologue — How the Simplicity of 2024 Ended
In spring 2024, when someone said "I want to make a picture with AI," we reached for one of three models. Midjourney v6, DALL·E 3, Stable Diffusion XL. Aesthetic taste went to Midjourney, chat integration went to DALL·E, your own GPU went to SDXL. Three choices, clear answers.
By spring 2026 that simplicity is gone. To answer the same question we now have to draw a decision tree first.
- Photoreal, illustration, poster with text, or vector logo?
- Closed API enough, or do you need open weights and your own GPU?
- Does training-data licensing matter, or just results?
- 5-second wait or 50-millisecond realtime?
- Custom LoRA needed, or out-of-the-box?
This article walks every branch of that tree across the 2026 AI image-generation landscape. The aesthetic standard set by Midjourney v7, OpenAI's gpt-image-1 and DALL·E 4 multimodal integration, Google Imagen 3 and 4 in the enterprise tier, the new baseline drawn by Black Forest Labs' FLUX series in open weights, Stable Diffusion 3.5 returning after Stability AI's restructuring, Ideogram's dominance in text-in-image, the design and vector category opened by Recraft V3, Adobe Firefly 3 chasing on the strength of license-safe training, plus newcomers like Reve Image 1.0, Krea AI, Leonardo, Playground v3. Tooling-wise we cover ComfyUI node graphs, ControlNet and IP-Adapter workflows, inpainting and outpainting, upscalers like Aura SR and 4x-UltraSharp, C2PA provenance watermarks, and finally the legal front — Andersen v Stability AI, Getty v Stability AI, NYT v OpenAI.
Chapter 1 · The 2026 Image-Gen Map — Three Camps, Five Categories
Drawn as one map, the 2026 market shows three camps.
**Camp A — Closed APIs**: Midjourney v7, OpenAI gpt-image-1 / DALL·E 4, Google Imagen 3 / 4, Ideogram 3.0, Recraft V3, Adobe Firefly 3, Reve Image 1.0. Weights are not published; inference runs on the vendor's infrastructure; users pay by token or subscription. Quality ceiling is high, safety filters are strong, tool integrations are smooth.
**Camp B — Open weights**: Black Forest Labs FLUX.1 [schnell] / [dev], Stable Diffusion 3.5 Large / Medium / Turbo, parts of Playground v3, parts of NovelAI. Weights are on HuggingFace and you can download and run on your own GPU. LoRA fine-tuning, ControlNet, IP-Adapter, and ComfyUI node graphs are the weapons of this camp. Civitai serves as the community LoRA hub.
**Camp C — Realtime**: Krea AI, Leonardo.AI's Realtime mode, fal.ai's LCM/Turbo hosting, plus the canvas UIs stacked on top. Target latency is 50 ms, not 5 s. Move a slider and the image follows; draw with the mouse and diffusion lays itself on top instantly.
Five categories cut across the three camps.
1. **Photoreal** — FLUX 1.1 [pro] Ultra, Imagen 3, Reve, Midjourney v7 raw mode
2. **Aesthetic / illustration** — Midjourney v7, Leonardo, NovelAI
3. **Text in image** — Ideogram 3.0, Recraft V3, DALL·E 4
4. **Design / vector** — Recraft V3, Adobe Firefly 3
5. **Editing / compositing** — DALL·E 4 inpainting, FLUX.1 Tools (Fill/Canny/Depth/Redux), Photoshop Generative Fill
The borders between camps blur. Black Forest Labs ships open-weight Dev while also running pro and ultra through APIs. Krea distills FLUX and SD3.5 with LCM and serves them in realtime rather than training its own. But the first questions a user asks — "Can I touch the weights?" "Am I paying tokens or GPU hours?" "Am I waiting 5 seconds or 50 milliseconds?" — still partition the field into the same three camps.
Chapter 2 · Midjourney v7 — The Aesthetic Standard
Midjourney v7 launched in 2025 and by spring 2026 sets the baseline for aesthetic taste, composition, and lighting. v6.1 served as the bridge to v7 in the interim. The Discord bot UX still works, but the home base is now the Midjourney web app (`alpha.midjourney.com`). Gallery, archive, and Rooms collab modes all live there.
Core feature bundle.
- **Image-to-image** — pass one or many images to absorb mood and composition. Weight is tuned with the `--iw` flag (image weight).
- **Style Reference (`--sref`)** — absorb only style and ignore content. Decisive when producing a consistent series in the same visual language. Variants like `sref random` allow improvisation.
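- **Character Reference (`--cref`)** — keep a character or person consistent. Critical for picture books, comics, and illustration series where the same person must show the same face every time. On v7 itself this role is handled by Omni Reference (`--oref`); `--cref` belongs to the v6-era models.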
- **Mood Boards** — register multiple images as one bundle and absorb the aesthetic of the whole bundle.
- **Patchwork** — collaborative canvas mode. Multiple people split regions and generate or edit on the same board in parallel.
- **`--personalize`** — a personal model trained on the images you have liked. The same prompt yields different output per user.
- **`--raw`** — turn off aesthetic correction and get direct output. Closer to Imagen or FLUX when you need photoreal results.
- **Zoom / pan outpainting** — extend the canvas.
- **Inpainting (Vary Region)** — repaint a selected region.
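The flags in the list above combine into a single prompt string. As a minimal sketch, the helper below assembles one; `build_midjourney_prompt` is a hypothetical name, the example URLs are placeholders, and the ordering (image URLs, then text, then flags) is an assumption about the usual prompt layout rather than an official client.

```python
from typing import Optional, Tuple


def build_midjourney_prompt(
    text: str,
    image_urls: Tuple[str, ...] = (),
    image_weight: Optional[float] = None,
    style_ref: Optional[str] = None,
    raw: bool = False,
) -> str:
    """Assemble a prompt: image URLs first, then the text, then the flags."""
    parts = [*image_urls, text.strip()]
    if image_weight is not None:
        # --iw weights the image prompts, so it needs at least one image.
        if not image_urls:
            raise ValueError("--iw requires at least one image prompt")
        parts.append(f"--iw {image_weight:g}")
    if style_ref:
        # --sref borrows style only, not content.
        parts.append(f"--sref {style_ref}")
    if raw:
        # --raw switches off the default aesthetic correction.
        parts.append("--raw")
    return " ".join(parts)


prompt = build_midjourney_prompt(
    "a lighthouse at dusk",
    image_urls=("https://example.com/a.png",),
    image_weight=1.5,
    style_ref="https://example.com/style.png",
    raw=True,
)
print(prompt)
```

Keeping the flags in one helper makes it easier to reuse the same style reference across a whole series, which is the consistency case described above.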
Pricing comes in four tiers: Basic `$10/month`, Standard `$30/month`, Pro `$60/month`, and Mega `$120/month`. Every plan includes fast GPU time, and unlimited Relax mode starts at Standard. Pro and above unlock Stealth (private generations).
Midjourney's strengths narrow to two.
1. **Aesthetic consistency** — quality varies less from run to run on the same prompt than with any competitor. The probability of getting "an okay one" in a batch is highest.
2. **Style library** — five years of accumulated user styles are baked into the model. A single `--sref` line calls them in.
Weaknesses are equally clear.
- Text rendering is weak. Letters inside posters, signs, and book covers are owned by Ideogram and Recraft.
- No API. Hooking it into automation goes through unofficial wrappers or Discord bots.
- Strict safety filters. Refusals are common for people, politicians, and brand logos.
Chapter 3 · DALL·E 4 / OpenAI gpt-image-1 — Multimodal Integration Arrives
OpenAI's image generation crossed a big threshold in March 2025. Instead of ChatGPT calling a separate model (DALL·E 3), GPT-4o gained the ability to **emit images natively**. Text and images flow through the same model and the same token stream. This is **gpt-image-1**. In spring 2025 the feature briefly took over Twitter as the Ghibli-style transfer moment.
OpenAI then raised image quality in stages, and by 2026 the lineup is a dedicated **DALL·E 4** brand alongside the gpt-image-1 multimodal line. Shared features across both.
- **Inpainting** — paint a mask and regenerate only that region.
- **Outpainting** — extend beyond the canvas.
- **Transparency** — PNG output with alpha. Decisive for design composition.
- **Reference images** — bind style or character with one or two inputs.
- **Text rendering** — letters inside posters and signs are clean. The 2024 weak spot is mostly resolved.
The API surface is three calls in the OpenAI SDK: `images.generate`, `images.edit` (inpainting with a mask), and `images.create_variation` (a legacy DALL·E 2 endpoint). Pricing is around `$0.04` for a standard 1024x1024 image, and HD tiers cost more. The Responses API takes image input and output together, which makes multimodal agent workflows feel natural.
Inside ChatGPT you simply say "make me an image of X." The result drops into the chat, and natural-language follow-ups like "redo this part," "make it black and white," and "add this text here" just work. This is less a design tool and more an **interface for refining images through conversation**.
Strengths.
- The smoothest natural-language follow-up editing of any tool.
- Trustworthy text rendering.
- Massive ChatGPT user base means dominant baseline accessibility.
Weaknesses.
- Aesthetic ceiling is lower than Midjourney or FLUX.
- Strict safety filters lead to frequent refusals (people, violence, brands).
- Flat token pricing makes bulk generation expensive.
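To make the bulk-cost weakness concrete, here is a small arithmetic sketch. It assumes the approximate `$0.04` per standard image quoted above and ignores HD tiers; the function names and the `$30` flat monthly budget used for comparison are illustrative assumptions.

```python
from decimal import Decimal


def bulk_cost_usd(images: int, price_per_image: str = "0.04") -> Decimal:
    """Total cost of a batch at a flat per-image price (exact decimal math)."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return Decimal(price_per_image) * images


def break_even_images(flat_fee: str, price_per_image: str = "0.04") -> int:
    """How many per-image calls a flat monthly fee would pay for."""
    return int(Decimal(flat_fee) // Decimal(price_per_image))


batch = bulk_cost_usd(10_000)
threshold = break_even_images("30")
print(f"10,000 images: ${batch}")
print(f"A $30 flat budget covers {threshold} images")
```

Under these assumptions, a ten-thousand-image batch costs hundreds of dollars, which is why bulk jobs tend to move to subscriptions or to open weights on owned GPUs.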
Chapter 4 · Google Imagen 3 / 4 + Veo 2 / ImageFX / Whisk — The Search Company's Answer
Google's image generation runs along two tracks. One is **Imagen** — the model served to enterprise on Vertex AI. The other is the **consumer tooling** (ImageFX, Whisk).
**Imagen 3** shipped at the end of 2024 and **Imagen 4** followed in 2025. Highlights.
- **Realism** — skin, hair, reflections, and shadows are hard to tell apart from a photograph. Quality is aimed squarely at advertising and stock photo markets.
- **Prompt fidelity** — instructions like "a red umbrella on the left, a blue traffic light on the right" are followed reliably.
- **Multilingual prompts** — Korean, Japanese, and Chinese inputs do not destabilize the output.
- **Vertex AI integration** — your Cloud project's IAM, VPC, and logs apply. For enterprises with SOC 2 or HIPAA needs, this is one of the very few options.
**ImageFX** is the free consumer tool at labs.google. Prompt editing is unusually smooth — click a word and synonym candidates appear as chips.
**Whisk** is an experiment released in December 2024. Its input is not text but **three images** (subject, scene, style). "This dog, in this living room, in this artistic style" composes in one shot.
**Veo 2** is video, not image, but it accepts an image made by Imagen as the opening frame and extends it to an 8-second clip. The image-to-video bridge inside one company is natural and one-step.
Pricing is around `$0.04` per Imagen 3 image on Vertex AI. ImageFX is free with limits. Whisk is in free beta.
Chapter 5 · FLUX (Black Forest Labs) — The Stable Diffusion Founders' New Company
In August 2024, core researchers who had built Stable Diffusion at Stability AI launched a new company, **Black Forest Labs** (BFL). It is headquartered in Freiburg, Germany, and raised a roughly `$31M` seed round led by Andreessen Horowitz. The debut model was FLUX.1 in three variants.
- **FLUX.1 [schnell]** — distilled variant that runs in about 4 steps. Apache 2.0 license. Commercial use unrestricted. Weights downloadable from HuggingFace.
- **FLUX.1 [dev]** — 50-step standard variant. Weights published under a non-commercial license. Personal and research use is free; commercial use needs a separate license.
- **FLUX.1 [pro]** — the largest variant. Weights closed. Available through the BFL API and partners like fal.ai, Replicate, Together.ai.
October 2024 brought **FLUX 1.1 [pro]**. Same interface, better quality, faster inference. About `$0.04` per image. In November 2024 two more arrived.
- **FLUX 1.1 [pro] Ultra** — generates directly at up to 4 megapixels (about 2048x2048, loosely called 4K in this article) instead of upscaling a 1024 result afterwards. Decisive for commercial advertising and print.
- **FLUX 1.1 [pro] Ultra raw mode** — no aesthetic correction, closer to photograph.
And the decisive shipping move, **FLUX.1 Tools** (November 2024). Four companion models.
1. **FLUX.1 Fill [dev/pro]** — dedicated inpainting/outpainting. Vastly more consistent than SD1.5/SDXL inpainting models.
2. **FLUX.1 Canny [dev/pro]** — Canny-edge conditioning. The model itself takes it without an external ControlNet.
3. **FLUX.1 Depth [dev/pro]** — depth-map conditioning. Feed a 3D render's depth map straight in.
4. **FLUX.1 Redux [dev/pro]** — recontextualize style and composition from a reference image. IP-Adapter-style work done by the model itself.
Three technical points behind FLUX.
1. **Rectified Flow Transformer** — instead of stochastic denoising in the DDPM family, training learns the straight line between noise and data. Step count goes down and stability goes up.
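2. **Multimodal DiT** — text tokens and image tokens are concatenated and processed by joint attention inside the same transformer, rather than injecting text through separate cross-attention layers. MMDiT architecture similar to SD3.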
3. **16-channel VAE** — latent channels increased from 4 to 16. Fine detail survives.
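The rectified-flow point above can be sketched numerically. This is a toy under stated assumptions, not FLUX's training code: noise sits at `t = 0` and data at `t = 1` (papers differ on the direction), and an oracle velocity stands in for the trained network.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight path: x_t = (1 - t) * noise + t * data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Regression target along that path: the constant velocity data - noise."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(noise, velocity_fn, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with plain Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [0.5, -1.0, 2.0]
noise = [random.gauss(0.0, 1.0) for _ in data]
oracle = target_velocity(noise, data)  # stand-in for a trained network
sample = euler_sample(noise, lambda x, t: oracle, steps=4)
print(sample)
```

Because the path is a straight line, four Euler steps with a perfect velocity already land on the data point. A learned velocity field is only approximately straight, which is one way to read why a distilled variant can get by with about 4 steps.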
Licensing forms a clean three-tier ladder.
| Variant | Weights | Commercial use | Notes |
|---------|---------|----------------|-------|
| schnell | Open | Allowed | Apache 2.0 |
| dev | Open | Separate license needed | Non-commercial free |
| pro / ultra | Closed | Through API | BFL / fal / Replicate |
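
For pipeline code that must refuse the wrong variant, the table above can be encoded as a lookup. This is a sketch that only mirrors the table; the dictionary layout and function name are illustrative, and the actual license texts remain the authority.

```python
# Mirrors the licensing table: one entry per FLUX variant.
FLUX_LICENSES = {
    "schnell": {"weights_open": True, "commercial": "allowed"},
    "dev": {"weights_open": True, "commercial": "separate license"},
    "pro": {"weights_open": False, "commercial": "via API"},
    "ultra": {"weights_open": False, "commercial": "via API"},
}


def can_self_host_commercially(variant: str) -> bool:
    """True only when weights are open and commercial use needs no extra deal."""
    entry = FLUX_LICENSES[variant.lower()]
    return entry["weights_open"] and entry["commercial"] == "allowed"


for name in FLUX_LICENSES:
    print(name, can_self_host_commercially(name))
```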
Dozens of FLUX workflows are catalogued as ComfyUI nodes, and Civitai hosts tens of thousands of LoRAs built on FLUX dev. **In 2026 the open-weight photoreal standard is effectively FLUX.**
Chapter 6 · Stable Diffusion 3.5 — After Stability AI's Restructure
Spring 2024 brought turmoil to Stability AI. Core researchers left for BFL, the CEO changed, and funding troubles became public. A new management team then restructured, and in October 2024 it shipped **Stable Diffusion 3.5**.
Three variants.
- **Stable Diffusion 3.5 Large** — 8.1B parameters. Standard 1024x1024.
- **Stable Diffusion 3.5 Medium** — 2.5B parameters. Same 1024x1024 but lighter.
- **Stable Diffusion 3.5 Large Turbo** — 4-step distilled. Near-realtime speed.
Licensing is the **Stability AI Community License**. Individuals and small businesses under `$1M` in annual revenue have unrestricted commercial use; above that requires a separate enterprise license. This is the follow-up to SD3 Medium (the model criticized at its June 2024 launch for anatomy issues with people), and the human anatomy and finger problems are largely fixed.
Architecture is **MMDiT** (Multimodal Diffusion Transformer). Transformer-based diffusion similar to FLUX. Three text encoders are used together: two CLIP models (CLIP-L and OpenCLIP-G) and T5-XXL.
Strengths.
- Clear licensing all the way up to small business.
- Thick LoRA, ControlNet, and IPAdapter ecosystem (accumulated since the SD 1.5 era).
- Together with SDXL, still a default base model for community fine-tunes that need permissive commercial terms.
Weaknesses.
- Loses to FLUX on photoreal.
- Text rendering is far behind Ideogram and Recraft.
- Aesthetic taste at equivalent weight is behind Midjourney and NovelAI.
That said, in 2026 the choices for **"the open-weight base model that runs on my own GPU"** narrow to two — FLUX.1 [dev] and SD 3.5 Large. SDXL is still alive for LoRA compatibility, but new work bases on those two.
Chapter 7 · Ideogram 2.0 / 3.0 — Undisputed Leader in Text-in-Image
Ideogram started life with a single goal: **text rendering in AI images**. Through Ideogram 2.0 in August 2024 and 3.0 in spring 2025 it holds a dominant lead in that category.
Specialties.
- **Letters inside posters, signs, book covers, and logos** — near-perfect in English. Korean, Japanese, and Chinese trail English but outperform competitors.
- **Magic Prompt** — the model auto-expands a short prompt. Toggleable when it conflicts with intent.
- **Style Reference** — the equivalent of Midjourney's `--sref`.
- **Canvas** — integrated inpainting, outpainting, and magic fill.
Pricing is free with watermark, Basic `$8/month`, Plus `$20/month`, Pro `$60/month`. There is an API at `api.ideogram.ai`. Design agencies and ad agencies use this model for one reason — "precise letters inside a poster" is something no other model approaches.
The big additions in 3.0 are **multi-reference** and **direct high-resolution output**. Keeping tone, character, and typography consistent across a series campaign got dramatically easier.
Chapter 8 · Recraft V3 — A New Category Called Design / Vector
Recraft's V3 launch in autumn 2024 **opened a separate category**. Not "AI image" but **AI design** or **AI vector**. The output is an SVG vector, or a design a designer can take straight into InDesign, Illustrator, or Figma.
Core features.
- **Direct SVG vector output** — paths, not pixels. Infinite zoom without quality loss.
- **Text rendering** — together with Ideogram, the leading pair. Font, kerning, alignment are all instructable.
- **Brand Kit** — register a company's color palette, fonts, and logo, applied consistently to every output.
- **Style library** — 6000+ user-registered styles.
- **Infinite canvas** — free-form layout environment.
In autumn 2024, Recraft V3 took first place on the Artificial Analysis text-to-image leaderboard. For designers and illustrators, that single line was decisive.
Pricing is free (50 credits/day), Basic `$12/month`, Advanced `$33/month`, Pro `$60/month`. The API is at `api.recraft.ai`. Marketing and branding teams now routinely register their brand assets and run Recraft as in-house design support.
Chapter 9 · Adobe Firefly 3 — The Value of License-Safe Training
Adobe Firefly differs from every other model in one respect. **Its training data is only license-clear Adobe Stock imagery plus public domain.** No internet crawl. Andersen and Getty-class lawsuit risk is near zero. This is the strongest reason enterprises pick Firefly.
Firefly 3 features (released in 2024).
- **Style Reference** — absorb tone and composition from a reference image.
- **Structure Reference** — keep the shape of an input image and change only its content.
- **Photoshop integration** — Generative Fill, Generative Expand, and Generative Remove all use the Firefly backend.
- **Illustrator integration** — vector generation and expansion.
- **Premiere integration** — Firefly Video for video generation.
- **Legal indemnification** — Adobe covers legal costs if a Firefly output is challenged on copyright grounds.
Enterprise pricing is negotiated separately. Consumers receive Generative Credits as part of a Creative Cloud subscription.
The aesthetic ceiling is below Midjourney and FLUX, but for companies that need **outputs the legal team will sign off on**, this is essentially the only option.
Chapter 10 · Reve Image 1.0, Krea AI, Leonardo, Playground v3 — The Followers
Followers who have found their seat alongside the headliners.
**Reve Image 1.0** (March 2025) — debut from a young startup. At launch it briefly took first place on the Artificial Analysis text-to-image leaderboard. Strong on photoreal quality and prompt fidelity. API-first, competitive pricing. About `$0.03` per image.
**Krea AI** — flagship of the realtime category. Distills FLUX and SD3.5 with LCM and serves at 50 ms latency. Draw on the canvas with a mouse and diffusion follows instantly. The Realtime / Enhance / Train (your own LoRA) menu is the workflow axis.
**Leonardo.AI** — aimed at games and illustration. Mixes its own models (Phoenix and others) with SDXL fine-tunes. Strong on character consistency and multi-composition. A generous free tier brings in many newcomers.
**Playground v3** — Playground.ai's own model. The v3 release in autumn 2024 raised photo and design quality sharply. Some weights are published for research.
Chapter 11 · ComfyUI / Forge / AUTOMATIC1111 / InvokeAI / Fooocus — Open-Source UIs
Running open-weight models on your own GPU needs a UI. The 2026 landscape.
**ComfyUI** — the standard for node-based workflows. Connect fine-grained nodes (Load Checkpoint, KSampler, VAE Decode, ...) with wires to build a pipeline. The learning curve is steep, but once internalized it gives the most freedom to combine ControlNet, IPAdapter, and LoRA. FLUX, SD3.5, and SDXL are all supported from day one in ComfyUI.
**Forge** — a fork of A1111. The UI matches A1111 but the backend is modernized. Inference speed for SDXL and FLUX is 1.5x to 2x faster than A1111. Repository at `lllyasviel/stable-diffusion-webui-forge`.
**AUTOMATIC1111 (A1111)** — the oldest SD UI. The de facto standard from late 2022, but updates have slowed since 2025. Many SD 1.5 and SDXL workflows still run here.
**InvokeAI** — UI aimed at commercial and enterprise use. Strong on infinite canvas, layer editing, and team collaboration.
**Fooocus** — "beginner mode for ComfyUI." A simple node-free UI on top of the ComfyUI backend. Recommended for newcomers.
Selection matrix.
- **Maximum flexibility** -> ComfyUI
- **A1111 muscle memory** -> Forge
- **Team / enterprise** -> InvokeAI
- **Beginner** -> Fooocus
- **Lots of legacy SD 1.5 LoRAs** -> A1111
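The node-graph idea behind ComfyUI, as described above, reduces to a dependency graph executed in topological order. The sketch below uses Python's standard `graphlib` and is not ComfyUI's API. The node names follow the ones mentioned above, with a few more added for illustration, and the wiring is an assumption about a basic text-to-image graph.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
workflow = {
    "Load Checkpoint": set(),
    "CLIP Text Encode": {"Load Checkpoint"},
    "Empty Latent Image": set(),
    "KSampler": {"Load Checkpoint", "CLIP Text Encode", "Empty Latent Image"},
    "VAE Decode": {"Load Checkpoint", "KSampler"},
    "Save Image": {"VAE Decode"},
}


def execution_order(graph):
    """Return one valid run order: every node after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(" -> ".join(order))
```

Adding a ControlNet or LoRA to the pipeline amounts to adding a node and a few edges, which is where the flexibility of the node-based approach comes from.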
Chapter 12 · ControlNet — Lvmin Zhang's Decisive Move
In February 2023, Stanford's Lvmin Zhang published the **ControlNet** paper that rewired diffusion workflows in one stroke. One-line summary: **"A side network that lets a diffusion model take additional conditions (edges, depth, pose) as input."**
Four canonical ControlNet conditions, plus one companion technique that usually sits beside them in the same workflow.
1. **Canny** — the edge map from a Canny edge detector. Preserves the silhouette of the input.
2. **Depth** — a depth map from MiDaS or ZoeDepth. A depth pass exported from a 3D render also works directly.
3. **OpenPose** — human skeleton and pose. Transfers a dance, yoga, or workout pose exactly.
4. **Tile** — conditions on the image itself so detail can be regenerated tile by tile. Core of 4K upscaling.
5. **IP-Adapter** — not a ControlNet itself but a separate adapter that absorbs style from an input image. Image prompts instead of text prompts.
During 2024 to 2025, FLUX-compatible ControlNet and SD3.5-compatible ControlNet shipped in turn. FLUX even absorbed parts of ControlNet into the model itself with FLUX.1 Tools (Canny/Depth/Redux). The task **"follow this one input image precisely"** was nearly impossible before ControlNet, and it remains the workflow center now.
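The "side network" idea can be illustrated with a toy. The sketch below is not the real architecture: scalar functions stand in for neural blocks, and the two zero-initialised scale factors imitate the zero-initialised projections described in the ControlNet paper (a detail taken from the paper, not from the text above).

```python
def frozen_block(x):
    """Stand-in for one frozen block of the base diffusion model."""
    return [2.0 * v + 1.0 for v in x]


def trainable_copy(x):
    """Stand-in for the trainable copy of that block (own weights)."""
    return [1.5 * v - 0.5 for v in x]


def controlled_block(x, condition, zero_in=0.0, zero_out=0.0):
    """Base output plus a residual computed by the control branch.

    zero_in and zero_out both start at 0.0, so an untrained control
    branch leaves the base model's output unchanged.
    """
    conditioned = [v + zero_in * c for v, c in zip(x, condition)]
    residual = [zero_out * r for r in trainable_copy(conditioned)]
    return [b + r for b, r in zip(frozen_block(x), residual)]


x = [0.1, -0.4, 0.7]
edges = [1.0, 0.0, 1.0]  # e.g. a flattened Canny edge map
untrained = controlled_block(x, edges)
steered = controlled_block(x, edges, zero_in=0.5, zero_out=0.2)
print(untrained, steered)
```

The design choice worth noting is that training starts from the unmodified base model and only gradually lets the condition steer the output, so the pretrained weights are never damaged.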
Chapter 13 · LoRA Fine-Tuning — How to Bake Your Own Character into the Model
**LoRA** (Low-Rank Adaptation) is a technique that freezes the base weights and trains only small low-rank matrices added to selected layers. Instead of retraining the full diffusion model (several GB), you train one LoRA adapter (typically 50 MB to 200 MB). The upshot: you can **bake your own character, your own style, your own product** into the model.
Training tools.
- **kohya_ss** — the standard LoRA training GUI. Supports SD 1.5, SDXL, SD3, and FLUX. Repository `bmaltais/kohya_ss`.
- **OneTrainer** — alternative to kohya_ss with a more intuitive UI.
- **AI-Toolkit (ostris)** — specialized for FLUX LoRA. Quickly became the standard tool of the FLUX era.
Data prep.
1. Collect 10 to 50 images of the target.
2. Place a caption (`txt` file) next to each image. BLIP auto-caption or hand-written.
3. Unify a trigger token (such as `sks_dog` or `myface`) at the start of each caption.
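Step 3 above is easy to automate. The helper below is a minimal sketch: `prefix_captions` is a hypothetical name, the accepted image extensions are an assumption, and it presumes the one-caption-`txt`-per-image layout described in step 2.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def prefix_captions(dataset_dir, trigger):
    """Make every caption start with the trigger token; return files changed."""
    updated = 0
    for image in sorted(Path(dataset_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_file = image.with_suffix(".txt")
        caption = ""
        if caption_file.exists():
            caption = caption_file.read_text(encoding="utf-8").strip()
        if caption.startswith(trigger):
            continue  # already unified, leave the file untouched
        # A missing or empty caption becomes just the trigger token.
        text = f"{trigger}, {caption}" if caption else trigger
        caption_file.write_text(text + "\n", encoding="utf-8")
        updated += 1
    return updated
```

Calling `prefix_captions("dataset/", "sks_dog")` a second time changes nothing, so it is safe to rerun after adding new images.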
Core training parameters.
- **rank** — the LoRA dimension. Usually `4` to `64`. Higher means more expressiveness and bigger files.
- **steps** — about 1000 to 3000. Too long means overfit.
- **learning_rate** — around `1e-4`.
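Why rank drives both expressiveness and file size follows from the LoRA update itself, `W' = W + (alpha / rank) * B @ A`. The sketch below works on plain Python lists; the `alpha / rank` scaling follows the LoRA paper, and the 4096-wide layer is an illustrative assumption.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def apply_lora(W, A, B, alpha, rank):
    """Merge the adapter: W' = W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_params(d_out, d_in, rank):
    """Trainable parameters: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


W = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0], [0.0]]   # d_out x rank
A = [[0.5, 0.5]]     # rank x d_in
merged = apply_lora(W, A, B, alpha=2.0, rank=1)
print(merged)
print(lora_params(4096, 4096, 16), "vs", 4096 * 4096)
```

For one 4096x4096 layer, rank 16 trains 131,072 values instead of 16,777,216, a 128-fold reduction. Doubling the rank doubles the adapter size.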
Trained adapters end up on Civitai (`civitai.com`). By 2026 Civitai hosts more than 300k LoRAs. Stacking two or three LoRAs on top of the same SDXL or FLUX base and blending their weights (in A1111-style prompts, the `<lora:name:weight>` tag sets each adapter's contribution) is now everyday workflow.
Chapter 14 · Inpainting / Outpainting Workflows
Repainting part of an image (inpainting) and extending beyond the canvas (outpainting) are the most-used editing workflows in 2026.
**Inpainting scenarios**.
1. Change only the clothes on a portrait — mask the clothing region and supply a new prompt.
2. Remove a person from a landscape — mask the person, prompt to match the background.
3. Replace only the background of a product photo — invert the product mask, prompt the new background.
4. Add text — mask an empty area, prompt the text such as "WELCOME."
**Outpainting scenarios**.
1. Portrait crop -> wide landscape banner.
2. 4:3 -> 16:9.
3. Same person and composition with the camera zoomed out.
Tool mapping.
- DALL·E 4: paint a mask in the ChatGPT canvas.
- FLUX.1 Fill: mask node in ComfyUI.
- Photoshop Generative Fill: Firefly backend.
- Midjourney: Vary Region (inpaint), Zoom Out / Pan (outpaint).
- Stable Diffusion 3.5: inpaint tab in A1111/Forge.
Quality hinges on **mask-edge feathering** and **context padding** (how much around the mask the model sees).
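Both knobs can be sketched in a few lines. The functions below are illustrative assumptions, not any tool's implementation: context padding grows the mask's bounding box (clamped to the image), and feathering is shown as a one-dimensional box blur over a row of mask values.

```python
def padded_crop_box(mask_box, padding, width, height):
    """Grow (left, top, right, bottom) by `padding`, clamped to the image."""
    left, top, right, bottom = mask_box
    return (
        max(0, left - padding),
        max(0, top - padding),
        min(width, right + padding),
        min(height, bottom + padding),
    )


def feather_row(mask_row, radius):
    """Box-blur a row of 0/1 mask values so the edge becomes a ramp."""
    n = len(mask_row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(mask_row[lo:hi]) / (hi - lo))
    return out


box = padded_crop_box((100, 120, 300, 360), padding=64, width=512, height=384)
soft = feather_row([0, 0, 0, 1, 1, 1], radius=1)
print(box)
print(soft)
```

A larger padding gives the model more surrounding context to match lighting and texture, while a wider feather radius hides the seam at the cost of touching more of the original pixels.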
Chapter 15 · Upscalers — 4x-UltraSharp, ESRGAN, Aura SR
Stretching a generated 1024 image to 4K is a separate model's job. Standard candidates.
**4x-UltraSharp** — the most-downloaded ESRGAN-based upscaler on Civitai. Routine for 4x upscaling of SD 1.5 and SDXL output.
**Real-ESRGAN** — an ESRGAN successor trained on synthetic degradations so that it handles real-world photographs. Repository `xinntao/Real-ESRGAN`.
**ESPCN** — fast but lower quality. Used for realtime video.
**Aura SR** — a GAN-based super-resolution model released by fal.ai in 2024. It upscales 4x per pass and can be chained for larger factors such as 16x.
**SUPIR** — diffusion-based SR. Very slow but dominant in quality. Best for 4K print of human faces.
Typical workflow is **generate (1024) -> upscale (2048 to 4096) -> detailer (face/hands)**. Wired together in a ComfyUI graph.
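The size arithmetic of the upscale stage can be sketched as follows. Processing the output in overlapping tiles is an assumption here (a common way to keep GPU memory bounded), and the tile size and overlap values are illustrative.

```python
import math


def upscaled_size(width, height, factor):
    """Output resolution after an integer-factor upscale."""
    return width * factor, height * factor


def tile_grid(width, height, tile, overlap):
    """Columns and rows of overlapping square tiles needed to cover an image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols, rows


out_w, out_h = upscaled_size(1024, 1024, factor=4)
cols, rows = tile_grid(out_w, out_h, tile=1024, overlap=128)
print(out_w, out_h, cols, rows)
```

With these numbers a 4096x4096 result needs a 5x5 grid, so a detailer pass over faces and hands runs on 25 tiles rather than on one oversized tensor.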
Chapter 16 · Image-to-Video Bridges — Kling 1.5, Hailuo
By 2026 image generation and video generation increasingly live in the same workflow. The pattern: make the first frame as an image, hand it to a video model.
- **Kling 1.5** (Kuaishou) — extends one image to a 5 to 10 second clip. Strong motion consistency.
- **Hailuo** (MiniMax) — Chinese model in the same category. Competitive pricing.
- **Runway Gen-3 / Gen-4** — Image-to-Video mode. Camera motion is given in natural language.
- **Sora 2** (OpenAI) — a separate article's topic, but it accepts image input and extends to video.
- **Veo 2 / 3** (Google) — accepts an Imagen-generated image as the opening frame.
A canonical pipeline.
1. Generate the first 4K frame with FLUX 1.1 Pro Ultra.
2. Pass that frame to Kling 1.5 with a `motion_prompt` like "camera zooms in slowly."
3. Run the resulting clip through Topaz Video AI for 60fps interpolation and 4K upscaling.
Chapter 17 · C2PA + Watermarks — The Standard for Proving Provenance
The technical standard for proving the provenance of a generated image is **C2PA** (Coalition for Content Provenance and Authenticity). Adobe, Microsoft, Intel, BBC, OpenAI, and others are members. Tamper-resistant metadata is embedded into the image describing where it was made and which model or tool made it.
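The tamper-resistance idea can be shown with the standard library. This is a simplified stand-in, not the C2PA format: real manifests are signed with certificate-based digital signatures and embedded in the file, whereas this sketch uses an HMAC with a shared demo key, and the field names are illustrative.

```python
import hashlib
import hmac
import json


def make_manifest(image_bytes, generator, signing_key):
    """Bind a claim about the generator to a hash of the exact image bytes."""
    claim = {
        "generator": generator,
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(image_bytes, manifest, signing_key):
    """Check the signature first, then that the image still matches the claim."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    actual = hashlib.sha256(image_bytes).hexdigest()
    return manifest["claim"]["image_sha256"] == actual


key = b"demo-key"  # placeholder; real systems use private keys and certificates
image = b"\x89PNG fake image bytes"
manifest = make_manifest(image, "example-model", key)
print(verify_manifest(image, manifest, key))
```

Editing either the pixels or the claim breaks verification, which is the property the standard relies on.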
By 2026 the tools that attach C2PA automatically include.
- DALL·E 4 and gpt-image-1 — attached by default.
- Adobe Firefly 3 — attached by default.
- Photoshop 25/26 — also records the edit history.
- BBC, NYT (subset) — C2PA verification on article photographs.
Separately, an **invisible watermark** standard exists.
- **SynthID** (Google DeepMind) — embedded at pixel level into Imagen output. Invisible to humans, detectable only by the SynthID verifier.
- **Stable Signature** (Meta) — watermark for SD-model output. Fine noise patterns.
Legally and policy-wise, the EU AI Act imposes labeling obligations on synthetic images (rolled out in phases from 2026). Korea and Japan are in the guideline stage.
Chapter 18 · Legal Front — Andersen, Getty, NYT
AI image generation has lived in the middle of major legal disputes since 2023. The 2026 picture of major cases.
**Andersen v Stability AI** (Northern District of California) — illustrators filed a class action protesting that their art ended up in LAION training data. During 2024 and 2025 some claims moved into substantive proceedings. Likely to become the first U.S. precedent that draws the line between "output infringement" and "infringement of the model weights themselves."
**Getty Images v Stability AI** (UK and U.S. in parallel) — Getty pressed claims for damages with evidence that its watermarked images were trained on (the watermark persisted in SD output). The UK side reached trial first, in 2025. Getty dropped its primary training-infringement claims during that trial, and the High Court's November 2025 judgment rejected the argument that the model weights are themselves an infringing copy while finding limited trademark infringement over reproduced watermarks.
**New York Times v OpenAI** — text-centric rather than images, but turns on the same "is training on public internet data fair use" question, and every generative AI camp is watching. Filed in December 2023 and in discovery as of 2026.
**Individual artists vs Midjourney and Runway** — individual claims accumulate.
Issues at a glance.
- **Is training fair use?** — the central U.S.-law question, decided under the four-factor fair-use test, in which transformative use carries heavy weight.
- **Do rights in the original remain in the output?** — established doctrine is that style is not copyrightable, but the identifiability problem in training data is separate.
- **Are the model weights themselves infringing?** — a question never tested before.
- **Is the user liable?** — when a user explicitly prompts for infringement.
Until these fronts settle, enterprises prefer **license-safe models** (Firefly, Imagen Vertex enterprise tier, LoRAs trained on their own data). Adobe's indemnification offer is powerful for that reason.
Chapter 19 · Korean Services — Naver HyperCLOVA X Image, NCsoft VARCO, Kakao Karlo
The Korean side of the image-gen landscape.
**Naver HyperCLOVA X Image** — the image-output model inside Naver's HyperCLOVA X line of large models. Strong on Korean-language prompts integrated with search, shopping, and blogs. Accessed via CLOVA Studio API.
**NCsoft VARCO** — NCsoft's large LLM and image lineup. Text, image, and audio are unified in VARCO Studio. As befits a game company, it is strong on character, illustration, and lore-creation scenarios.
**Kakao Karlo** — Kakao Brain's open-source image model. After its 2023 release, Korean-prompt support was a strong point, and follow-up versions ship integrated into Kakao services.
**LG AI Research Exaone Vision** — multimodal image input and output in LG's Exaone line.
What these share is (1) **Korean-prompt fidelity**, (2) training on **K-content style** (K-pop, K-drama, webtoon aesthetics), (3) **friendliness with domestic cloud and compliance**. They are the priority candidates in public, finance, and telecom where multinational models are hard to adopt.
Chapter 20 · Japanese Services — Rinna AI Lab, NTT-AT, Picsart Japan
The Japanese side.
**Rinna AI Lab** — leader of Japanese-language LLMs. Lineup of Japanese text-to-image and image-to-text models. Strong on character, animation, and manga tones.
**NTT-AT generative tools** — NTT Group's enterprise generative AI services. Designed to integrate with Japanese corporate intranets.
**Picsart Japan** — Picsart's Japanese subsidiary expanding with localized UI and market-specific features.
**Sakana AI** — Tokyo-based startup. Less standalone image generation, more meta techniques like model merging and evolutionary training. Has released several Japanese-language-specialized SD merge models.
**Stable Diffusion Japanese merge models** — SDXL-based fine-tunes and merges such as Animagine and Pony Diffusion (a separate lineage), together with Japanese merge models, dominate illustration and anime tones. Many are hosted on Civitai.
The Japanese market exhibits (1) **a very high bar for anime and manga aesthetics**, beyond what generic models supply, (2) **strong user awareness about copyright** which boosts the popularity of license-safe training, (3) **decisive demand for local Japanese-language prompts**. Japanese merge models, LoRAs, and Japanese-caption datasets have congealed into their own ecosystem.
Chapter 21 · Selection Matrix — What to Use When
All the tools surveyed so far, in one table.
| Situation | First choice | Second choice | Note |
|-----------|------|------|------|
| Aesthetic illustration series | Midjourney v7 | Leonardo | sref/cref |
| Photoreal advertising | FLUX 1.1 Pro Ultra | Imagen 3 | direct 4K |
| Text inside a poster | Ideogram 3.0 | Recraft V3 | Magic Prompt |
| Logo / vector design | Recraft V3 | Adobe Illustrator AI | SVG output |
| Enterprise license safety | Adobe Firefly 3 | Imagen Vertex | indemnification |
| Photoreal on own GPU | FLUX.1 [dev] | SD 3.5 Large | 16GB+ VRAM |
| Illustration on own GPU | Pony / Animagine | SD 3.5 Large | SDXL base |
| Conversational inside ChatGPT | gpt-image-1 / DALL·E 4 | - | inpainting |
| Compositing inside Photoshop | Firefly Generative Fill | - | C2PA |
| Character-consistent series | Midjourney cref | LoRA(FLUX) | - |
| Inpaint / outpaint | FLUX.1 Fill | DALL·E 4 | - |
| Image-to-video bridge | FLUX -> Kling 1.5 | Imagen -> Veo 2 | - |
| Realtime canvas | Krea AI | Leonardo Realtime | LCM |
| Korean-prompt first | Naver HyperCLOVA X | Imagen 3 | - |
| Japanese / anime | Rinna / Animagine | NovelAI | - |
Decision tree.
1. **Does the output contain precise text?** -> If yes, Ideogram or Recraft. If no, continue.
2. **Should it look like a photograph?** -> Photoreal: FLUX/Imagen/Reve. Illustration: Midjourney/Leonardo.
3. **Will you run it on your own GPU?** -> Yes: FLUX.1 [dev] or SD 3.5. No: API.
4. **Do you need legal indemnification?** -> Yes: Adobe Firefly.
5. **Will you bake in your own character / product?** -> LoRA training (kohya_ss / ai-toolkit).
6. **Do you need realtime interaction?** -> Krea AI / Leonardo Realtime.
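The six questions above can be encoded as a small function. This is one possible reading of the tree, stated as an assumption: a yes on question 1 replaces the photoreal-versus-illustration pick from question 2, and the remaining questions each add a recommendation independently. The `Needs` and `recommend` names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Needs:
    precise_text: bool = False
    photoreal: bool = False
    own_gpu: bool = False
    indemnification: bool = False
    custom_subject: bool = False
    realtime: bool = False


def recommend(needs: Needs) -> List[str]:
    """Walk the six questions in order and collect every branch that applies."""
    picks = []
    if needs.precise_text:
        picks.append("Ideogram / Recraft")
    elif needs.photoreal:
        picks.append("FLUX / Imagen / Reve")
    else:
        picks.append("Midjourney / Leonardo")
    if needs.own_gpu:
        picks.append("FLUX.1 [dev] / SD 3.5")
    if needs.indemnification:
        picks.append("Adobe Firefly")
    if needs.custom_subject:
        picks.append("LoRA training (kohya_ss / ai-toolkit)")
    if needs.realtime:
        picks.append("Krea AI / Leonardo Realtime")
    return picks


print(recommend(Needs(photoreal=True, own_gpu=True)))
```

Returning a list rather than a single winner reflects that real projects often sit on several branches at once, for example a photoreal look plus a custom product LoRA.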
Chapter 22 · Conclusion — One Map, Five Branches
Spring 2026, the AI image-generation landscape compressed to one paragraph:
**Aesthetic taste goes to Midjourney v7**, **photoreal realism goes to FLUX 1.1 Pro Ultra and Imagen 3**, **text and vector go to Ideogram and Recraft**, **editing and compositing go to DALL·E 4, FLUX Tools, and Photoshop Generative Fill**, **the base on your own GPU is FLUX.1 [dev] and SD 3.5 Large**, **legal safety is Adobe Firefly 3**, **realtime is Krea AI**. ComfyUI node graphs tie it together, ControlNet, LoRA, and IPAdapter are the building blocks, Aura SR and 4x-UltraSharp finish the job, and C2PA proves provenance.
The "one model" era of two years ago is over. The 2026 answer is **"which branch are you on"**, and drawing that branch correctly is half the workflow.
References
- Midjourney docs: https://docs.midjourney.com/
- Midjourney web app: https://alpha.midjourney.com/
- OpenAI image guide: https://platform.openai.com/docs/guides/images
- OpenAI gpt-image-1 announcement: https://openai.com/index/introducing-4o-image-generation/
- Google Imagen 3 (Vertex AI): https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
- Google ImageFX (labs): https://labs.google/fx/tools/image-fx
- Google Whisk: https://labs.google/whisk
- Black Forest Labs FLUX: https://blackforestlabs.ai/
- FLUX on HuggingFace: https://huggingface.co/black-forest-labs
- FLUX.1 Tools: https://blackforestlabs.ai/flux-1-tools/
- Stable Diffusion 3.5 (Stability AI): https://stability.ai/news/introducing-stable-diffusion-3-5
- Stability AI license: https://stability.ai/community-license-agreement
- Ideogram: https://ideogram.ai/
- Recraft: https://www.recraft.ai/
- Adobe Firefly: https://www.adobe.com/products/firefly.html
- Reve Image: https://reve.art/
- Krea AI: https://www.krea.ai/
- Leonardo AI: https://leonardo.ai/
- Playground AI: https://playground.com/
- ComfyUI: https://github.com/comfyanonymous/ComfyUI
- AUTOMATIC1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Forge (lllyasviel): https://github.com/lllyasviel/stable-diffusion-webui-forge
- InvokeAI: https://github.com/invoke-ai/InvokeAI
- Fooocus: https://github.com/lllyasviel/Fooocus
- ControlNet paper (Lvmin Zhang): https://arxiv.org/abs/2302.05543
- kohya_ss LoRA training: https://github.com/bmaltais/kohya_ss
- AI-Toolkit (ostris): https://github.com/ostris/ai-toolkit
- Civitai (LoRA hub): https://civitai.com/
- 4x-UltraSharp: https://openmodeldb.info/models/4x-UltraSharp
- Real-ESRGAN: https://github.com/xinntao/Real-ESRGAN
- Aura SR (fal): https://fal.ai/models/fal-ai/aura-sr
- C2PA standard: https://c2pa.org/
- SynthID (DeepMind): https://deepmind.google/technologies/synthid/
- Andersen v Stability AI (NDCA): https://www.courtlistener.com/docket/66732129/andersen-v-stability-ai-ltd/
- Getty Images v Stability AI: https://www.gettyimages.com/news/press-releases/
- NYT v OpenAI: https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
- Artificial Analysis image benchmark: https://artificialanalysis.ai/text-to-image
- Naver CLOVA Studio: https://www.ncloud.com/product/aiService/clovaStudio
- Kakao Brain Karlo: https://github.com/kakaobrain/karlo
- Sakana AI: https://sakana.ai/