
AI Music Generation 2026 Deep Dive: Suno v4 · Udio · Stable Audio 2 · MusicGen · AIVA · Mubert · Soundraw

Introduction — Why 2026 is an Inflection Point for AI Music

When Suno v3 and the Udio beta started producing two-minute vocal tracks from a single text prompt in spring 2024, the music industry reacted seriously for the first time. The RIAA filed copyright infringement lawsuits against both companies that June; two months earlier, in April, Stability AI had released Stable Audio 2.0, a model capable of generating three-minute full tracks. Suno's partnership with Microsoft Copilot exposed music generation to mainstream consumers, and by spring 2026 Suno v4 / v4.5 had shipped Cover, Stems, Remaster, Personas, and Lyrics, cementing its position as category leader.

The landscape is far from monolithic, however. After settlements with Universal and Warner, Udio maintains a distinct aesthetic identity; Meta's MusicGen continues to offer an open option that researchers can fine-tune freely through the `audiocraft` library. AIVA specializes in orchestral composition; Mubert has captured the BGM market with its API and generative streaming; and Soundraw, headquartered in Tokyo, offers structure-controlled royalty-free music tailored to the Japanese content market. Alongside them, Adobe Project Music GenAI Control, Google MusicFX DJ, and Riffusion (Beat-N) are introducing interactive music generation as a new use model.

This article organizes every axis a producer, video creator, developer, or music enthusiast needs to know to decide "how to use AI music generation" in 2026. It covers model architectures, feature differences, licensing, pricing, workflows, legal risk, and Korean/Japanese context as concretely as possible.

1. The 2026 AI Music Generation Map — Four Categories

The full landscape of AI music tools becomes clear when divided into four categories.

| Category | Core use case | Representative products |
|---|---|---|
| Full-song generation (with vocals) | Text → full track with vocals + instrumental | Suno v4.5, Udio v2, Riffusion |
| Instrumental / sound | BGM, game/video music, sound effects | Stable Audio 2.0, Mubert, Soundraw, AIVA |
| Open / research models | Self-hosted, fine-tunable | MusicGen 3.3B, AudioLM, NaturalSpeech 3, OpenMusic |
| Interactive / DJ | Real-time control, live | MusicFX DJ, Lyria RealTime, Project Music GenAI Control |

These four categories serve different use models. Full-song generators are "tools that turn zero into one"; instrumental/sound tools are "parts suppliers for content production"; open models are "the foundation for research and customization"; and interactive tools are "a new use model for live consumption." True competitive advantage in 2026 comes from the ability to pick the right tool from each category and weave them into a coherent workflow.

2. Suno v4 / v4.5 — Category Leader and the Microsoft Copilot Partnership

Suno is a Cambridge, Massachusetts company founded in 2022. Since releasing v3 in March 2024 it has been the fastest-evolving text-to-song tool, and as of spring 2026 it leads the market in share. v4.0 shipped in November 2024, and v4.5 followed in May 2025 with additional features.

2.1 The Microsoft Copilot Partnership

In December 2023, Suno announced an official partnership with Microsoft Copilot. Copilot users can request music generation in natural language, and Suno generates the track and returns the result. This is the largest distribution channel exposing AI music to mainstream consumers, and it caused an explosive short-term increase in Suno's free-plan user base.

2.2 Key v4.5 Features

- **Extended song length**: 4 minutes by default, up to 8 minutes via Extend. The default alone is about double the two-minute limit of the v3 era (2024).

- **Cover**: Keep the melody and chord structure of an existing track while generating a new vocal timbre, style, and lyrics.

- **Stems**: Download separated tracks for vocals, bass, drums, melody, and other instruments. You can pull them into a DAW for post-production.

- **Remaster**: Regenerate an existing output at higher fidelity. Adjusts loudness, bass response, and vocal clarity.

- **Lyrics**: Built-in lyric generator. Topic, mood, and verse structure can be specified.

- **Personas**: Save the vocal character and style of a track you have generated as a persona, then reuse it in new songs.

2.3 Pricing and Licensing

| Plan | Price | Monthly credits | Commercial use |
|---|---|---|---|
| Free | 0 USD | 50 credits/day (~10 songs) | Not allowed |
| Pro | About 10 USD/month | 2,500 credits/month (~500 songs) | Allowed |
| Premier | About 30 USD/month | 10,000 credits/month (~2,000 songs) | Allowed |

Pro and above grant commercial use rights on outputs. With the RIAA lawsuit still ongoing, however, "100 percent safe" is difficult to advertise.

2.4 Strengths and Weaknesses

- Strengths: The most natural vocals in English-language pop, rock, EDM, and folk. The UI/UX is intuitive and the entry barrier is low.

- Weaknesses: Korean/Japanese lyrics still produce awkward pronunciation and prosody. Complex genres like jazz improvisation and classical orchestration remain weak. Coherence breaks down beyond four minutes.

3. Udio v1.5 / v2 — Aesthetic Differentiation by Uncharted Labs

Udio is the product of Uncharted Labs, founded in December 2023 by researchers from Google DeepMind. CEO David Ding leads the team; Andreessen Horowitz led the seed round (about 10M USD in April 2024). Investors reportedly include Instagram co-founder Mike Krieger as well as music-industry figures will.i.am and Common.

3.1 v2 Features

- **Full-song length**: 1:30 generation by default, extendable to a maximum of 15 minutes via Extend. The ability to produce a longer single output than Suno is a key differentiator.

- **Audio Inpainting**: Regenerate a specific section of an existing track. You can swap out a single line of vocals or a single bar of drums.

- **Stem Separation**: Split vocals from instruments. WAV downloads are DAW-compatible.

- **Genre / Lyrics Style Tags**: Finer genre-tag control with phrases like `style of jazz` or `style of bossa nova`.

3.2 Licensing Settlements

Universal Music Group settled with Udio on October 29, 2025, and Warner followed on November 25. Kobalt and Merlin Network subsequently signed licensing agreements as well. As of May 2026 the only major actively litigating against Udio is Sony. As part of the settlements, Universal and Warner are reportedly slated to participate in launching a joint AI music platform.

3.3 Aesthetic Characteristics

Where Suno is generally considered more "pop-leaning and polished," Udio tends to feel more like "a producer-edited track." It scores better in hip-hop, R&B, Latin, and electronic music. Its vocals are slightly rougher than Suno's, which becomes an asset in certain genres.

4. Stable Audio 2.0 — Stability AI's 3-Minute Track Model

Stability AI, the company best known for Stable Diffusion, released Stable Audio 2.0 in April 2024. It offers full-track generation (up to three minutes of 44.1kHz stereo), audio-to-audio transformation, and the ARC (Audio Research Collective) license model.

4.1 Model Architecture

Stable Audio 2.0 is a latent diffusion model: it carries the approach behind image models such as Stable Diffusion over to audio. The model consists of a text encoder, an autoencoder that compresses audio into a compact latent space, and a diffusion transformer (DiT) that generates in that latent space. It was trained on a licensed AudioSparx dataset of more than 800,000 audio files with accompanying text metadata.
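Why the autoencoder matters is easiest to see with arithmetic. The sketch below counts the raw values in a three-minute 44.1kHz stereo track and compares them with a latent sequence, assuming a hypothetical autoencoder that emits one 64-channel latent frame per 2,048 audio samples. Both of those settings are illustrative assumptions, not Stable Audio 2.0's published configuration.

```python
import math

SAMPLE_RATE = 44_100  # Hz, as stated for Stable Audio 2.0
CHANNELS = 2          # stereo

# Hypothetical autoencoder settings (assumptions for illustration only).
DOWNSAMPLE = 2048     # audio samples per latent frame
LATENT_CHANNELS = 64  # values per latent frame


def raw_values(duration_s: float) -> int:
    """Number of PCM values a model would face without an autoencoder."""
    return int(duration_s * SAMPLE_RATE) * CHANNELS


def latent_values(duration_s: float) -> int:
    """Number of values in the compressed latent sequence the diffusion model works on."""
    frames = math.ceil(duration_s * SAMPLE_RATE / DOWNSAMPLE)
    return frames * LATENT_CHANNELS


if __name__ == "__main__":
    raw = raw_values(180)
    latent = latent_values(180)
    print(f"raw: {raw:,} values, latent: {latent:,} values, ratio ~{raw / latent:.0f}x")
```

Under these assumptions a three-minute track shrinks from 15,876,000 raw values to 3,876 latent frames (248,064 values), roughly 64 times fewer, so the diffusion transformer attends over a few thousand positions instead of millions of samples.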

4.2 Core Features

- **Text-to-Audio**: Generate a three-minute full track from a text prompt.

- **Audio-to-Audio**: Transform uploaded audio with a text prompt. For example, upload a vocal line, send it to the latent space, and re-synthesize it as a new genre.

- **Sound Effects**: Generate non-musical sounds (rain, footsteps, explosions, etc.). Useful in game sound design.

- **Stable Audio Open**: The open-weights version, specialized for sound effects, short samples, and loops of up to 47 seconds.
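Audio-to-audio in latent diffusion models is commonly implemented SDEdit-style: encode the uploaded audio, add noise up to an intermediate step chosen by a "strength" setting, then let the text-conditioned model denoise from there. The sketch below shows only that noising step with a cosine schedule. The schedule is an assumption (Stable Audio 2.0's exact one is not stated here), and the denoising loop, which needs the trained model, is omitted.

```python
import numpy as np


def cosine_alpha_bar(t: float) -> float:
    """Fraction of signal power kept at diffusion time t in [0, 1] (cosine schedule)."""
    return float(np.cos(t * np.pi / 2) ** 2)


def noise_for_audio_to_audio(z0: np.ndarray, strength: float, rng: np.random.Generator) -> np.ndarray:
    """Partially noise an input latent; strength 0 keeps it, strength 1 is almost pure noise."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    a = cosine_alpha_bar(strength)
    eps = rng.standard_normal(z0.shape)
    # Variance-preserving mix: sqrt(a) of the input plus sqrt(1 - a) of Gaussian noise.
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal(10_000)  # stand-in for an encoded audio latent
    for s in (0.2, 0.5, 0.8):
        zt = noise_for_audio_to_audio(z0, s, rng)
        print(s, round(float(np.corrcoef(z0, zt)[0, 1]), 2))
```

A higher strength lets the prompt reshape more of the original, while a lower strength preserves more of the uploaded audio's structure.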

4.3 Licensing and Pricing

- **Personal use**: Stable Audio free tier — up to 20 songs/month.

- **Commercial use**: Pro plan about 12 USD/month — 500 songs/month. Commercial rights on outputs.

- **API**: Separately priced — usage-based billing at roughly 0.05 USD per second.

- **ARC license**: Audio Research Collective — a license model that shares revenue with training-data providers.
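The per-second API rate and the Pro plan are easy to compare with a little arithmetic. The sketch uses only the approximate prices quoted above; billing granularity, taxes, and current rates are not modeled.

```python
API_USD_PER_SECOND = 0.05  # "roughly 0.05 USD per second" (approximate, from the text)
PRO_USD_PER_MONTH = 12.0   # "about 12 USD/month"
PRO_TRACKS_PER_MONTH = 500


def api_cost(duration_s: float, n_tracks: int = 1) -> float:
    """Approximate API spend for n tracks of the given length."""
    return API_USD_PER_SECOND * duration_s * n_tracks


def pro_cost_per_track() -> float:
    """Effective Pro-plan price per track if the full monthly quota is used."""
    return PRO_USD_PER_MONTH / PRO_TRACKS_PER_MONTH


def api_seconds_for_pro_fee() -> float:
    """How many seconds of API audio the monthly Pro fee would buy."""
    return PRO_USD_PER_MONTH / API_USD_PER_SECOND


if __name__ == "__main__":
    print(f"one 3-minute track via API: {api_cost(180):.2f} USD")
    print(f"Pro plan per track:         {pro_cost_per_track():.3f} USD")
    print(f"Pro fee buys {api_seconds_for_pro_fee():.0f} s of API audio")
```

At these rates one three-minute track through the API costs about 9 USD, while the Pro plan works out to about 0.024 USD per track; the monthly Pro fee would buy only about four minutes of API audio. The API's advantage is programmatic access, not price.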

5. Meta MusicGen 3.3B — The Open-Source Standard

MusicGen, released by Meta's Fundamental AI Research (FAIR) team in June 2023, has become the open-source standard for music generation. The initial release already included 300M, 1.5B, and 3.3B-parameter checkpoints along with MusicGen-Melody, which supports melody-conditioned generation; stereo variants (MusicGen-Stereo) were added later.

5.1 Model Lineup

| Model | Parameters | Characteristics | Recommended GPU |
|---|---|---|---|
| musicgen-small | 300M | Fastest, lowest quality | RTX 3060 12GB |
| musicgen-medium | 1.5B | Balanced | RTX 4070 |
| musicgen-large | 3.3B | Best quality | RTX 4090 24GB |
| musicgen-melody | 1.5B | Melody-conditioned | RTX 4070 |
| musicgen-stereo | 1.5B / 3.3B | Stereo output | RTX 4080 |

5.2 Usage

It is accessed via Meta's `audiocraft` library. After installation you can call it from a Python script or use the interface integrated into Hugging Face Transformers.

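Using MusicGen via audiocraft

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Download (on first use) and load the 3.3B-parameter checkpoint.
model = MusicGen.get_pretrained('facebook/musicgen-large')
# Length of each generated clip, in seconds.
model.set_generation_params(duration=30)

descriptions = ['80s pop track with bassy drums and synth']
# Returns a tensor of shape [batch, channels, samples], one item per description.
wav = model.generate(descriptions)

for idx, one_wav in enumerate(wav):
    # Writes 0.wav, 1.wav, ... at the model's native sample rate.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate)
```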

5.3 License — CC-BY-NC

MusicGen's model weights are released under CC-BY-NC 4.0, which prohibits commercial use; the audiocraft code itself is MIT-licensed. Commercial use of MusicGen outputs should therefore be treated as requiring separate permission. According to the MusicGen paper, the training data was about 20,000 hours of licensed music, including an internal dataset and music from Shutterstock and Pond5.

6. AIVA — Strength in Orchestral Composition

AIVA (Artificial Intelligence Virtual Artist) is a Luxembourg company founded in 2016 and one of the oldest AI music generators. It was recognized as the first AI composer registered with SACEM. As of 2026 it specializes in orchestral, cinematic, and game music composition.

6.1 Features

- **Style selection**: 30+ presets including Cinematic, Modern Cinematic, Tango, Sea Shanty, Symphonic, Electronic, Pop, Rock, and Folk.

- **MIDI editing**: Edit the MIDI of generated pieces directly and re-render.

- **Influence Mode**: Generate new music inspired by uploaded music (MIDI or audio).

- **Step Time / Pencil Tool**: Draw chord progressions and melodies directly while AIVA fills in the rest.

6.2 Pricing and Licensing

| Plan | Price | Monthly generations | Commercial use |
|---|---|---|---|
| Free | 0 USD | 3 songs/month (MP3 only) | Not allowed (personal non-commercial) |
| Standard | About 15 USD/month | 15 songs/month (MP3, MIDI) | Allowed (AIVA credit required) |
| Pro | About 49 USD/month | 300 songs/month (all formats) | Full ownership |

The Pro plan grants full copyright ownership of outputs (royalty-free), which is a major reason it is frequently used in films, advertising, and games.

7. Mubert — API and Generative Streaming

Mubert started in Russia in 2016 and relocated its headquarters to the United States. By 2026 it has carved out a niche in the API and streaming-music market. While other tools focus on "generate a single song," Mubert created a different use model: "music that streams infinitely."

7.1 Use Models

- **Mubert Studio**: Generate tracks from a text prompt (similar to the other tools).

- **Mubert Render**: Auto-generate BGM matched to a video's length.

- **Mubert API**: Integrate into apps, games, or the web. Infinite BGM streams personalized by user, mood, or context.

- **Mubert Streaming**: Spotify-style live streams. The AI keeps producing new tracks endlessly.

7.2 Pricing

| Plan | Price | Use case |
|---|---|---|
| Free | 0 USD | 25 songs/month, non-commercial |
| Creator | About 14 USD/month | Content creators, unlimited downloads |
| Pro | About 39 USD/month | Commercial use, longer tracks |
| Business / API | Custom quote | API integration, white label |

Mubert primarily targets the NFT music, metaverse BGM, and dynamic game soundtrack markets.

8. Soundraw — Structure-Controlled Music Generation from Tokyo

Soundraw is a Tokyo-based company whose AI music generator reflects the context of the Japanese music-content industry. Founded in 2020, its biggest differentiator versus other tools is "structural control." Users can directly adjust song length, the placement of intro/verse/chorus/bridge/outro, and the energy of each section.

8.1 The Structural Control Interface

A generated song is shown on a timeline, and users adjust each section's energy with clicks. You can make the chorus more explosive or end the outro abruptly instead of fading. This is especially useful for video-editing use cases where the music must be cut precisely to the visual edit.
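Cutting music to an edit usually comes down to landing the ending on a bar line. Soundraw's internals are not documented here, so the helper below is a hypothetical illustration of the arithmetic only: it picks the nearest whole number of bars for a target duration and the slight tempo change that makes them fit exactly, assuming a constant tempo and 4/4 time by default.

```python
def fit_to_cut(target_s: float, bpm: float, beats_per_bar: int = 4) -> tuple[int, float]:
    """Return (bars, adjusted_bpm) so a whole number of bars lasts exactly target_s seconds."""
    if target_s <= 0 or bpm <= 0:
        raise ValueError("target_s and bpm must be positive")
    bar_s = beats_per_bar * 60.0 / bpm        # seconds per bar at the original tempo
    bars = max(1, round(target_s / bar_s))    # nearest whole number of bars
    adjusted_bpm = bars * beats_per_bar * 60.0 / target_s
    return bars, adjusted_bpm


if __name__ == "__main__":
    print(fit_to_cut(30, 120))  # a 30 s cut at 120 BPM
    print(fit_to_cut(61, 100))  # a 61 s cut at 100 BPM
```

A 30-second cut at 120 BPM is exactly 15 bars, so the tempo stays at 120; a 61-second cut at 100 BPM rounds to 25 bars and needs about 98.4 BPM to end precisely on the cut.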

8.2 Pricing and Licensing

| Plan | Price | Downloads | Commercial use |
|---|---|---|---|
| Free | 0 USD | Preview only | Not allowed |
| Creator | About 17 USD/month | Unlimited | Allowed (perpetual) |
| Artist | About 30 USD/month | Unlimited | Allowed + distribution rights |

Soundraw offers a perpetual royalty-free license. Once downloaded, a track can be used in perpetuity even after canceling the subscription. It is particularly popular with Japanese YouTube creators and video production studios.

9. Boomy / Anthemic / Riffusion — Smaller Entrants

9.1 Boomy

Boomy is a California company founded in 2018 that grew on the concept of "make a song in 30 seconds and distribute it to Spotify." In 2022 it briefly made headlines as the source of 10 percent of new uploads to Spotify. After Spotify removed a large number of Boomy tracks on suspicion of artificial streaming in spring 2023 it pulled back somewhat, but as of 2026 it is still widely used as a free, easy-entry tool.

9.2 Anthemic

Anthemic is a relatively new entrant, launched in 2025, that focuses on integrating vocals and lyrics. Despite its small team it stands out for expressive vocals, and its "turn my humming into a full song" use model has drawn attention.

9.3 Riffusion (Beat-N)

Riffusion, released as an open-source project in December 2022, started simply — convert audio into a spectrogram (a frequency image), generate the image with Stable Diffusion, then convert it back to audio. It incorporated in 2024 as ProducerAI, and in February 2026 Google acquired it and integrated it into Lyria 3. The open-source demo under the Riffusion name remains accessible on GitHub.
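The last step, turning a generated spectrogram image back into audio, is the subtle part: the image stores only magnitudes, so the phase must be estimated. The classic method is Griffin-Lim, which alternates between the time and frequency domains and keeps only the phase from each round trip. The NumPy sketch below uses a plain linear-frequency STFT with assumed frame sizes; Riffusion's own pipeline (mel scaling, its specific inverter) may differ.

```python
import numpy as np

N_FFT, HOP = 512, 128          # assumed frame size and hop, in samples
WINDOW = np.hanning(N_FFT)


def stft(x: np.ndarray) -> np.ndarray:
    """Hann-windowed frames -> complex spectra, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray) -> np.ndarray:
    """Least-squares inverse STFT via windowed overlap-add."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    length = N_FFT + HOP * (len(frames) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame * WINDOW
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    norm[norm < 1e-8] = 1.0    # avoid dividing by ~0 where the window vanishes
    return out / norm


def griffin_lim(magnitude: np.ndarray, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Estimate a signal whose STFT magnitude matches `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(magnitude * phase)                          # back to the time domain
        phase = np.exp(1j * np.angle(stft(x)))                # keep only the new phase
    return istft(magnitude * phase)


if __name__ == "__main__":
    t = np.arange(N_FFT + HOP * 63) / 16_000
    x = np.sin(2 * np.pi * 440 * t)                           # a 440 Hz test tone
    mag = np.abs(stft(x))                                     # what a spectrogram image keeps
    y = griffin_lim(mag)
    rel_err = np.linalg.norm(np.abs(stft(y)) - mag) / np.linalg.norm(mag)
    print(f"relative magnitude error after Griffin-Lim: {rel_err:.3f}")
```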

10. Google MusicLM and MusicFX DJ — Interactive Music

Google DeepMind's MusicLM was published as a paper in January 2023 and released to a limited audience via AI Test Kitchen in May 2023. It later evolved under the name Lyria, and as of 2026 the lineup looks like this.

10.1 The MusicLM Lineage

- **MusicLM** (2023): Google's first text-to-music model, built on AudioLM.

- **Melody conditioning (MusicLM)**: Generates a full track that follows a user-hummed or whistled melody.

- **Lyria 1/2/3**: Progressively longer outputs and higher quality. 48kHz stereo.

- **Lyria RealTime**: Real-time streaming music control.

10.2 MusicFX DJ

Released by Google Labs in December 2024, MusicFX DJ introduces a new use model for interactive music generation. The user controls several prompt sliders, and the music transforms instantly as their values are adjusted in real time: raising the "Jazz" slider strengthens jazz elements, while lowering "Drums" makes the drums disappear. It brings the way a DJ blends tracks in a live set to AI music.
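Google has not published how MusicFX DJ maps sliders to the model, so the following is only one plausible mental model: treat each slider as a weight, normalize the weights, and blend per-prompt conditioning embeddings. The embeddings here are made-up two-dimensional vectors and `blend_prompts` is a hypothetical name; a real system would use a learned text encoder.

```python
import numpy as np


def blend_prompts(embeddings: dict[str, np.ndarray], sliders: dict[str, float]) -> np.ndarray:
    """Weighted average of prompt embeddings; a slider at 0 removes that prompt."""
    active = [name for name, value in sliders.items() if value > 0]
    if not active:
        raise ValueError("at least one slider must be above zero")
    weights = np.array([sliders[name] for name in active], dtype=float)
    weights /= weights.sum()                      # normalize so the weights sum to 1
    stacked = np.stack([embeddings[name] for name in active])
    return weights @ stacked                      # shape: (embedding_dim,)


if __name__ == "__main__":
    emb = {"jazz": np.array([1.0, 0.0]), "drums": np.array([0.0, 1.0])}
    print(blend_prompts(emb, {"jazz": 3.0, "drums": 1.0}))  # jazz-heavy mix
    print(blend_prompts(emb, {"jazz": 1.0, "drums": 0.0}))  # drums slider pulled down
```

In this mental model, recomputing the blend whenever a slider moves and feeding it to a streaming generator is what would make each change audible in real time.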

10.3 Lyria RealTime

Lyria RealTime deserves a separate mention. Instead of "generate a single song," it is a model for "controlling streaming audio live" and is accessed via the Gemini API. It can produce endless music while adjusting style, tempo, and mood in real time, with live streaming, game BGM, and interactive installations as the primary use cases.

11. Adobe Project Music GenAI Control

Adobe unveiled Project Music GenAI Control as a research prototype in February 2024. It was developed by Adobe Research in collaboration with the University of California, San Diego and Carnegie Mellon University. The core idea is "edit audio like text."

11.1 Core Features (Prototype)

- Generate songs from a text prompt.

- Directly adjust intensity, structure, and repetition patterns of generated music.

- Integrated post-processing such as beat matching, audio extension, and transformation.

- Planned integration with Adobe products like Premiere Pro and After Effects.

As of May 2026 it has not been released as an official product, but it is expected to be integrated as the core technology behind Adobe Firefly's music-generation features.

12. Open Models — AudioLM / AudioCraft / NaturalSpeech 3 / OpenMusic / F5-TTS

On the research side, the following models are the standards as of 2026.

12.1 AudioLM (Google)

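Released in September 2022, AudioLM was one of the first models to treat audio generation as language modeling. It converts audio into discrete tokens (semantic tokens from w2v-BERT and acoustic tokens from the SoundStream codec) and uses transformers to predict the next token. This approach became the foundation of MusicLM and influenced later token-based generators such as MusicGen in AudioCraft.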
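SoundStream, like Meta's EnCodec, produces its acoustic tokens with residual vector quantization (RVQ): a first codebook approximates each frame's embedding, the next codebook approximates what is left over, and so on, giving one token per stage. The toy sketch below uses random stand-in codebooks rather than trained ones and puts an all-zero codeword in each, so every extra stage can only reduce the error; real codecs learn their codebooks.

```python
import numpy as np


def rvq_encode(x: np.ndarray, codebooks: list[np.ndarray]) -> list[int]:
    """Return one token (codeword index) per quantizer stage."""
    residual = x.copy()
    tokens = []
    for cb in codebooks:                                     # cb: (codebook_size, dim)
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]                        # pass the leftover to the next stage
    return tokens


def rvq_decode(tokens: list[int], codebooks: list[np.ndarray]) -> np.ndarray:
    """Reconstruct by summing the selected codewords."""
    return np.sum([cb[i] for cb, i in zip(codebooks, tokens)], axis=0)


def make_codebooks(n_stages: int, size: int, dim: int, seed: int = 0) -> list[np.ndarray]:
    """Random stand-in codebooks with shrinking scale; row 0 is the zero vector."""
    rng = np.random.default_rng(seed)
    books = []
    for k in range(n_stages):
        cb = rng.standard_normal((size, dim)) * 0.5 ** k
        cb[0] = 0.0
        books.append(cb)
    return books


if __name__ == "__main__":
    books = make_codebooks(n_stages=4, size=64, dim=8)
    frame = np.random.default_rng(1).standard_normal(8)     # stand-in for one encoder frame
    tokens = rvq_encode(frame, books)
    for k in range(1, len(books) + 1):
        err = np.linalg.norm(frame - rvq_decode(tokens[:k], books[:k]))
        print(k, tokens[:k], round(float(err), 3))
```

The resulting token sequences are what an AudioLM-style transformer learns to predict.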

12.2 AudioCraft (Meta)

Released by Meta in August 2023, AudioCraft is a unified framework for music and sound generation. It bundles MusicGen, AudioGen, and EnCodec together. Built on PyTorch, it is freely accessible on GitHub.

12.3 NaturalSpeech 3 (Microsoft)

NaturalSpeech 3 is a speech synthesis model whose ideas also carry over to music. Its factorized codec (FACodec) decomposes speech into separate subspaces for content, prosody, timbre, and acoustic details and models each of them, a factorization that applies directly to singing voice synthesis.

12.4 OpenMusic

OpenMusic is an open-source text-to-music model released in 2024. It descends from MusicGen but pursues similar quality with a smaller model. A key feature is the ability to run inference even on a CPU.

12.5 F5-TTS

F5-TTS is a voice synthesis model released in October 2024 capable of cloning an arbitrary timbre from a voice sample of only a few seconds. Combined with a music generator it allows precise control over vocal timbre. Its code is MIT-licensed, while the pretrained weights are released under CC-BY-NC.

13. Lyric Generation — Suno Lyrics vs Udio vs ChatGPT

AI music tools usually embed their own lyric generators, but the pattern of writing lyrics directly in ChatGPT or Claude and feeding them to a music tool is also common.

13.1 Lyric Generation Options Compared

| Option | Strengths | Weaknesses |
|---|---|---|
| Suno Lyrics | Aligned to song structure (verse/chorus/bridge) | Vocal prosody handled by the model itself |
| Udio Lyrics | More flexible lyric styles | Slightly less consistent prosody |
| ChatGPT (GPT-4 / GPT-5) | Strongest general songwriting | Unaware of musical prosody |
| Claude 4 / Opus 4 | Poetic, metaphorical expression | Unaware of musical prosody |
| Human lyricist | Emotional depth | Time/cost |

A practical workflow typically looks like this:

1. Generate a lyric draft in ChatGPT (specifying topic, mood, and prosody patterns).

2. A human refines to fit vocal prosody.

3. Paste the lyrics into Suno/Udio to generate the music.

4. If unsatisfied, return to step 1.

13.2 The Specificity of Korean-Language Lyrics

Korean lyrics differ from English in syllable structure. In English, melodic stresses fall on stressed syllables, but in Korean syllables are evenly weighted. Models trained on English data therefore pronounce Korean awkwardly. As of 2026, Suno v4.5 imitates Korean syllabic pronunciation to a degree, but handling of final consonants and liaison remains awkward.
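Because final consonants (batchim) and liaison are where English-trained models stumble, it can help to check lyrics for them before generating. Precomposed Hangul syllables encode their parts arithmetically in Unicode (code = 0xAC00 + (initial × 21 + medial) × 28 + final), so a few lines of Python can count them. The liaison rule below is deliberately simplified (it skips ㅇ and ㅎ finals and ignores many other phonological rules), and `batchim_report` is a hypothetical helper name.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
IEUNG_INITIAL = 11   # index of ㅇ among the 19 initial consonants
IEUNG_FINAL = 21     # index of ㅇ among the 28 final slots (0 = no final)
HIEUT_FINAL = 27     # index of ㅎ among the final slots


def is_hangul(ch: str) -> bool:
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST


def decompose(ch: str) -> tuple[int, int, int]:
    """Split a precomposed syllable into (initial, medial, final) indices."""
    code = ord(ch) - HANGUL_FIRST
    return code // 588, (code % 588) // 28, code % 28   # 588 = 21 medials * 28 finals


def batchim_report(line: str) -> dict[str, int]:
    """Count syllables, syllables with a final consonant, and simple liaison sites."""
    syllables = [c for c in line if is_hangul(c)]
    with_final = sum(1 for c in syllables if decompose(c)[2] != 0)
    liaison = 0
    for a, b in zip(line, line[1:]):                     # adjacent characters only
        if is_hangul(a) and is_hangul(b):
            final = decompose(a)[2]
            if final not in (0, IEUNG_FINAL, HIEUT_FINAL) and decompose(b)[0] == IEUNG_INITIAL:
                liaison += 1                              # e.g. 음악 is sung as 으막
    return {"syllables": len(syllables), "with_final": with_final, "liaison": liaison}


if __name__ == "__main__":
    for lyric in ["사랑해", "음악이 흘러"]:
        print(lyric, batchim_report(lyric))
```

Lines with many final consonants or liaison sites are good candidates to check by ear after generation, or to rephrase beforehand.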

14. Stem Separation — Demucs / Spleeter / Stable Audio Tools

These tools separate vocals, drums, bass, and melody, whether from songs made by AI music generators or from existing tracks.

14.1 Demucs v4 (Meta)

Demucs is Meta's open-source stem-separation model, the most widely used as of 2026. v4 is Hybrid Transformer Demucs (HT-Demucs), combining transformers and convolutions. Both 4-stem (vocals, drums, bass, others) and 6-stem (+ piano, guitar) models exist.

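Demucs install and use

```shell
pip install demucs

# 4-stem separation (default htdemucs model: vocals, drums, bass, other)
demucs my_song.mp3

# 6-stem separation (piano and guitar split out) via the htdemucs_6s model
demucs -n htdemucs_6s my_song.mp3

# Stems are written to separated/<model name>/my_song/ by default
```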

14.2 Spleeter (Deezer)

Released by Deezer in 2019, Spleeter's strength is speed. It offers 2-stem (vocals/accompaniment), 4-stem, and 5-stem models. The quality is slightly lower than Demucs but it processes in real time on a CPU.

14.3 Stable Audio Tools

A set of open-source tools released by Stability AI for training and running the latent-space models behind Stable Audio. It is geared toward generation, track extension, and audio-to-audio transformation rather than dedicated stem separation.

14.4 Commercial Options — LALAL.AI / Moises AI / UVR

- **LALAL.AI**: One pack at around 10 USD separates a single track. Up to 10 stems including vocals, drums, bass, guitar, piano, and synths.

- **Moises AI**: About 4 USD/month for unlimited separation with a mobile app.

- **UVR (Ultimate Vocal Remover)**: Open-source GUI tool. Integrates multiple models.

15. MIDI Generation — Magenta and the Anticipatory Music Transformer

There are also tools that generate MIDI (score data) rather than audio. The advantage is that composers can edit them directly in a DAW.

15.1 Magenta (Google)

Magenta is a Google Brain research project on music and machine learning that began in 2016. As of 2026 it ships Magenta Studio, an Ableton Live plugin. Its capabilities include:

- **Continue**: Automatically continue an input MIDI clip.

- **Generate**: Create a new melody on an empty clip.

- **Interpolate**: Morph between two MIDI clips.

- **Drumify**: Add drum patterns to an input rhythm.
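Interpolate is built on Magenta's MusicVAE, which encodes each clip as a latent vector; morphing means walking between the two vectors and decoding each intermediate point back to MIDI. The sketch below shows only that walk, using spherical linear interpolation (slerp), a common choice for Gaussian latent spaces. Whether Magenta Studio uses slerp or plain linear interpolation is an assumption here, and the encoder and decoder are omitted.

```python
import numpy as np


def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation from a (t=0) to b (t=1)."""
    cos_omega = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between the two latents
    if np.isclose(np.sin(omega), 0.0):
        return (1.0 - t) * a + t * b                   # (anti)parallel: fall back to lerp
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def morph_path(z_start: np.ndarray, z_end: np.ndarray, steps: int) -> list[np.ndarray]:
    """Latent vectors for `steps` evenly spaced clips, endpoints included."""
    return [slerp(z_start, z_end, t) for t in np.linspace(0.0, 1.0, steps)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_a, z_b = rng.standard_normal(256), rng.standard_normal(256)  # stand-ins for encoded clips
    path = morph_path(z_a, z_b, steps=5)
    print([round(float(np.linalg.norm(z)), 1) for z in path])
```

Unlike a straight line, slerp keeps intermediate vectors at a typical norm for the latent space, which tends to keep decoded in-between clips plausible.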

15.2 Anticipatory Music Transformer (Stanford)

Released by Stanford CRFM in 2023, this model applies a transformer to sequences of MIDI events. Its hallmark is flexible conditional generation ("anticipation"): the user can "pin" specific notes at future time points and the model progresses naturally toward them.
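The anticipation idea shows up in how the training sequence is built: control notes (the pinned future notes) are interleaved into the event stream a fixed interval δ ahead of their actual time, so the model sees them before it has to generate the music leading up to them. The sketch below is a simplified, hypothetical version of that interleaving; the paper's actual tokenization differs in detail.

```python
Event = tuple[float, int]  # (onset time in seconds, MIDI pitch)


def anticipatory_interleave(events: list[Event], controls: list[Event], delta: float) -> list[tuple[str, float, int]]:
    """Insert each control before the first event whose time is >= control_time - delta."""
    events, controls = sorted(events), sorted(controls)
    out: list[tuple[str, float, int]] = []
    ci = 0
    for t, pitch in events:
        # Emit every control whose anticipation point has been reached.
        while ci < len(controls) and controls[ci][0] - delta <= t:
            out.append(("control", *controls[ci]))
            ci += 1
        out.append(("event", t, pitch))
    out.extend(("control", *c) for c in controls[ci:])  # controls beyond the last event
    return out


if __name__ == "__main__":
    melody = [(0.0, 60), (1.0, 62), (2.0, 64), (3.0, 65), (4.0, 67)]
    pinned = [(4.0, 72)]  # the user pins a C5 at t = 4 s
    for token in anticipatory_interleave(melody, pinned, delta=2.0):
        print(token)
```

With δ = 2 s, the pinned note at 4 s appears in the sequence just before the event at 2 s, giving the model two seconds of music in which to steer toward it.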

16. Music-Targeted Voice Cloning — RVC / So-Vits-SVC

These tools replace a vocal track's timbre with someone else's and are used for covers and virtual singers.

16.1 RVC (Retrieval-based Voice Conversion)

RVC is an open-source voice-conversion model released in 2023. With about ten minutes of vocal samples as training data you can sing other songs in that timbre. It exploded in popularity in the Korean and Japanese V-tuber communities.

16.2 So-Vits-SVC

So-Vits-SVC stands for SoftVC VITS Singing Voice Conversion and is a predecessor of RVC. In addition to timbre conversion, it supports pitch adjustment and vibrato control.

16.3 Legal Gray Area

Cloning a famous singer's timbre with RVC or So-Vits-SVC is a legal gray area. "Heart on My Sleeve" (2023), an original track whose AI-cloned vocals imitated Drake and The Weeknd, sparked legislative debate in the US and UK around voice rights and publicity rights.

17. Legal Issues — RIAA v Suno/Udio

17.1 Background of the Suit

On June 24, 2024, the Recording Industry Association of America (RIAA) announced copyright infringement lawsuits, brought on behalf of Universal, Warner, and Sony, against Suno (US District Court for the District of Massachusetts) and Udio (US District Court for the Southern District of New York). The core claim: the major labels' copyrighted recordings were used as training data without permission.

17.2 Both Sides' Arguments

- **RIAA side**: Suno and Udio scraped major label catalogs without permission to train their models. As evidence, they submitted examples of the models nearly reproducing specific songs.

- **Suno/Udio side**: The training acts qualify as transformative fair use. They cite precedents such as Google Books.

17.3 The Settlement Wave (2025-2026)

- October 29, 2025: Universal Music Group settles with Udio.

- November 25, 2025: Warner Music Group settles with Udio.

- January 2026: Kobalt Music Group settles with Udio.

- March 2026: Merlin Network (the indie label coalition) settles with both Suno and Udio.

As of May 2026, the Suno suit is still ongoing, with a summary judgment hearing scheduled for July 2026.

17.4 Sony Music's AI Opt-Out Notices

In May 2024, Sony Music sent opt-out notices to 700+ AI companies stating "do not use our catalog as training data." Those notices prompted the entire AI music industry to re-examine training-data provenance.

17.5 The Limits of Fair Use

Under US copyright law, fair use is decided by weighing four statutory factors, and the first of them turns largely on whether a use is "transformative," that is, whether it serves a different purpose or adds new expression compared with the original. The fact that AI music generators can nearly reproduce specific songs makes that argument harder to sustain. As of 2026, US courts have no consistent precedent, and the industry has chosen the path of licensing settlements.

18. Korean Services — SKT MetaSpace Music and Naver Clova Music

18.1 SK Telecom MetaSpace Music

SK Telecom released the MetaSpace Music beta in 2024 as part of its metaverse strategy. A text-to-music model with strength in Korean-language lyric handling, it is being used as of 2026 inside ifland (SKT's metaverse platform) for user-generated music.

18.2 Naver Clova Music

One of Naver's Clova AI lineup, it specializes in BGM generation and vocal synthesis. The Korean vocal synthesis model trained on Naver's own singer vocal data is its key differentiator.

18.3 AI Adoption in the K-pop Industry

SM Entertainment unveiled virtual artist nævis in 2024; HYBE invested in AI music tool development through its 2024 US subsidiary MIN Music; JYP has publicly stated that it uses AI for vocal guides and demo creation.

19. Japanese Services — Sound Catalyst and Vocaloid AI

19.1 NTT Sound Catalyst

Part of the NTT Group's music AI lineup, it specializes as of 2025 in real-time music generation for live performance. A demo that used crowd reaction at Tokyo Dome as input to drive BGM dynamically generated attention.

19.2 Yamaha Vocaloid AI Yui / Aoi

Yamaha released the new libraries "Yui" and "Aoi" in Vocaloid 6. Both are machine-learning-based vocal synthesis libraries, unlike the concatenative, sample-based synthesis of earlier Vocaloid versions, and they produce the most natural results for Japanese lyrics.

19.3 Synthesizer V (Dreamtonics)

Synthesizer V is an AI vocal synthesis tool developed by Dreamtonics in Tokyo. As of 2026, SynthV Studio Pro supports Japanese, English, Mandarin, and Korean vocal synthesis. It is the most widely used in Japanese content production.

19.4 AI Use Among Japanese Producers

Producers of J-pop artists like Daichi Miura have stated in interviews that they use AI for demo creation, vocal guides, and BGM sketches.

20. Workflow — Prompt → Generate → Extend → Stems → DAW

The workflow for using AI music in actual production looks like this.

20.1 The Standard Workflow (Five Steps)

1. **Prompt**: Write a text prompt specifying genre, mood, tempo, key, and instruments. For example: `lofi hip hop, 70 BPM, A minor, piano + jazz drums, melancholic`.

2. **Generate**: Generate songs in Suno/Udio/Stable Audio. You typically get 2-4 variations.

3. **Extend**: Extend the preferred variation (up to 8 minutes in Suno, 15 minutes in Udio) and add intro/verse/chorus/outro sections.

4. **Stems**: Separate the finished song into stems, using either Suno/Udio's built-in feature or Demucs.

5. **DAW**: Import the stems into a DAW for post-processing. Re-record vocals, swap beats, master.

20.2 Tips for Prompt Writing

- **Specify genre**: phrases like `style of jazz` or `genre: synthwave`.

- **Specify instruments**: `featuring acoustic guitar and harmonica`.

- **Specify mood**: `melancholic`, `uplifting`, `tense`.

- **References**: `style of Miles Davis` (gray area — safe only on licensed models).

- **Specify techniques**: `lo-fi production`, `analog tape saturation`, `vinyl crackle`.
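When generating many variations, it helps to keep the prompt elements from the workflow and tips above in one structure and render them consistently. The dataclass below is a hypothetical helper, not part of any tool's API; its field order simply mirrors the example prompt in step 1.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class MusicPrompt:
    """Structured prompt fields, rendered as a comma-separated text prompt."""
    genre: str
    bpm: Optional[int] = None
    key: Optional[str] = None
    instruments: List[str] = field(default_factory=list)
    mood: Optional[str] = None
    techniques: List[str] = field(default_factory=list)  # e.g. "vinyl crackle"

    def render(self) -> str:
        parts = [self.genre]
        if self.bpm is not None:
            parts.append(f"{self.bpm} BPM")
        if self.key:
            parts.append(self.key)
        if self.instruments:
            parts.append(" + ".join(self.instruments))
        if self.mood:
            parts.append(self.mood)
        parts.extend(self.techniques)
        return ", ".join(parts)


if __name__ == "__main__":
    base = MusicPrompt("lofi hip hop", 70, "A minor", ["piano", "jazz drums"], "melancholic")
    print(base.render())
    # Vary one element at a time so generations can be compared fairly.
    for mood in ["uplifting", "tense"]:
        print(replace(base, mood=mood).render())
```

Changing one field at a time makes it easier to tell which element of the prompt caused a difference in the output.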

20.3 Korean/Japanese Lyric Workflow

- Generate in English first → swap only the lyrics to Korean/Japanese once satisfied → regenerate.

- This keeps the musical structure on the strong foundation of English-trained data while pulling only the lyrics into the native language.

- Suno v4.5 handles Korean/Japanese lyrics to a degree, but coherence breaks beyond four minutes.

21. Comparison Table — Seven Tools at a Glance

| Tool | Category | Full-song length | Price | Strengths | Weaknesses | License |
|---|---|---|---|---|---|---|
| Suno v4.5 | Full song (vocals + instr.) | 4 min (Extend 8 min) | 10 USD/month+ | UI/UX, mainstream genres | Korean/Japanese lyrics | Commercial from Pro |
| Udio v2 | Full song (vocals + instr.) | 1:30 (Extend 15 min) | 10 USD/month+ | Hip-hop/R&B/Latin | Slightly rough vocals | Commercial from Pro |
| Stable Audio 2.0 | Instrumental | 3 min | 12 USD/month+ | Sound design, audio-to-audio | No vocals | ARC license |
| MusicGen 3.3B | Open, instrumental | 30 s per generation (extendable) | Free (self-hosted) | Open, fine-tunable | CC-BY-NC, no vocals | Non-commercial |
| AIVA | Orchestral/cinematic | 5 min+ | 15 USD/month+ | MIDI editing, films/games | No vocals | Full ownership at Pro |
| Mubert | API/streaming | Infinite stream | 14 USD/month+ | API, game integration | Weak per-song control | Commercial from Pro |
| Soundraw | Structured BGM | User-specified | 17 USD/month+ | Video-edit precision cuts | No vocals | Perpetual royalty-free |

22. Frequently Asked Questions — FAQ

22.1 Can I post AI music to Spotify?

It depends on the AI tool's license. Suno Pro, Udio Pro, AIVA Pro, and Soundraw Creator and above explicitly allow commercial use, and Spotify and Apple Music accept such tracks. That said, Spotify has previously bulk-removed tracks suspected of artificial streaming (as in the Boomy incident).

22.2 Who owns the copyright?

It depends on the tool's terms. AIVA Pro and Soundraw grant full copyright to the user. Suno and Udio grant the user usage rights while the tool company retains certain rights. The US Copyright Office's position is that AI-generated material can be registered for copyright only to the extent it reflects human creative contribution.

22.3 Where should I start?

- **Hobby/experimentation**: Suno free plan.

- **YouTube BGM**: Mubert, Soundraw, AIVA.

- **Indie game soundtracks**: Mubert API, Soundraw, MusicGen (self-hosted).

- **Commercial music release**: Udio Pro, Suno Premier (be aware of legal risk).

- **Films/advertising**: AIVA Pro, Stable Audio 2.0.

22.4 Is AI music real music?

This question has no single answer. One thing is clear, however: AI music in 2026 is not a tool that "replaces people" but one through which "people who could not previously make music start making music." Accepting that boundary makes it clear which tools to use and how.

23. References

- Suno official — https://suno.com/

- Udio official — https://www.udio.com/

- Stable Audio (Stability AI) — https://stability.ai/stable-audio

- Stable Audio 2.0 announcement — https://stability.ai/news/stable-audio-2-0

- Meta MusicGen GitHub — https://github.com/facebookresearch/audiocraft

- AudioCraft official — https://audiocraft.metademolab.com/

- MusicGen paper — https://arxiv.org/abs/2306.05284

- AIVA — https://www.aiva.ai/

- Mubert official — https://mubert.com/

- Soundraw official — https://soundraw.io/

- Boomy official — https://boomy.com/

- Riffusion (Beat-N) — https://www.riffusion.com/

- Google MusicLM — https://google-research.github.io/seanet/musiclm/examples/

- MusicFX DJ (Google Labs) — https://labs.google/fx/tools/music-fx-dj

- Google Lyria announcement — https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/

- Adobe Project Music GenAI Control — https://research.adobe.com/news/project-music-genai-control/

- AudioLM paper — https://google-research.github.io/seanet/audiolm/examples/

- Demucs (Meta) GitHub — https://github.com/facebookresearch/demucs

- Spleeter (Deezer) GitHub — https://github.com/deezer/spleeter

- Ultimate Vocal Remover GitHub — https://github.com/Anjok07/ultimatevocalremovergui

- LALAL.AI — https://www.lalal.ai/

- Moises AI — https://moises.ai/

- Magenta (Google) — https://magenta.tensorflow.org/

- Anticipatory Music Transformer — https://crfm.stanford.edu/2023/06/16/anticipatory-music-transformer.html

- RIAA's Suno / Udio coverage — https://www.riaa.com/news/

- Yamaha Vocaloid official — https://www.vocaloid.com/en/

- Dreamtonics Synthesizer V — https://dreamtonics.com/synthesizerv/

- NaturalSpeech 3 (Microsoft) — https://speechresearch.github.io/naturalspeech3/

- F5-TTS GitHub — https://github.com/SWivid/F5-TTS
