Chaos and Order

https://www.youngju.dev/blog
Slowly, and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
Language: ko
Author: fjvbn2003@gmail.com (Youngju Kim)
Date: Tue, 30 Jun 2026 00:00:00 GMT

Analyzing SOTA Multimodal LLMs — One Model to See, Hear, and Speak (English)
https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.en
Tue, 30 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: multimodal-llm, any-to-any, vision-language, audio, architecture
How did a language model trained purely on text come to understand and generate images, audio, and video? This post walks through modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal versus adapter grafting, and the training strategies, benchmarks, and limitations along the way.

SOTA Multimodal LLM Analysis: Seeing, Hearing, and Speaking with One Model (Japanese)
https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.ja
Tue, 30 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: multimodal-llm, any-to-any, vision-language, audio, architecture
This post looks at how an LLM trained only on text came to understand and generate images, audio, and video. It covers per-modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal versus adapter grafting, and training strategies, benchmarks, and limitations.

SOTA Multimodal LLM Analysis: Seeing, Hearing, and Speaking with One Model (Korean)
https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any
Tue, 30 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: multimodal-llm, any-to-any, vision-language, audio, architecture
This post examines how an LLM trained on text alone came to understand and generate images, audio, and video. It covers per-modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal versus adapter grafting, and training strategies, benchmarks, and limitations.