
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly, but correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Tue, 30 Jun 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/multimodal-llm/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.en</guid>
    <title>Analyzing SOTA Multimodal LLMs — One Model to See, Hear, and Speak</title>
    <link>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.en</link>
    <description>How did a language model trained purely on text come to understand and generate images, audio, and video? This post walks through modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal versus adapter grafting, and the training strategies, benchmarks, and limitations along the way.</description>
    <pubDate>Tue, 30 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>multimodal-llm</category><category>any-to-any</category><category>vision-language</category><category>audio</category><category>architecture</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.ja</guid>
    <title>Analyzing SOTA Multimodal LLMs — One Model to See, Hear, and Speak</title>
    <link>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any.ja</link>
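    <description>How did an LLM trained only on text come to understand and generate images, audio, and video? This post covers per-modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal models versus adapter grafting, and training strategies, benchmarks, and limitations.</description>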
    <pubDate>Tue, 30 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>multimodal-llm</category><category>any-to-any</category><category>vision-language</category><category>audio</category><category>architecture</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any</guid>
    <title>Analyzing SOTA Multimodal LLMs — One Model to See, Hear, and Speak</title>
    <link>https://www.youngju.dev/blog/2026-06-30-sota-multimodal-llm-any-to-any</link>
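    <description>How did an LLM trained on text alone come to understand and generate images, audio, and video? This post covers per-modality encoders and projectors, the unified token space, the any-to-any flow, native multimodal models versus adapter grafting, and training strategies, benchmarks, and limitations.</description>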
    <pubDate>Tue, 30 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>multimodal-llm</category><category>any-to-any</category><category>vision-language</category><category>audio</category><category>architecture</category>
  </item>

    </channel>
  </rss>
