
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Fri, 26 Jun 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/vision-language-model/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.en</guid>
    <title>Beyond OCR — OCR-free Document Understanding and Unified Models</title>
    <link>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.en</link>
    <description>A traditional OCR pipeline splits the work into detection, recognition, and layout stages, so errors accumulate from one stage to the next. We survey the shift in document AI: Donut-style and VLM-based OCR-free document understanding, high-resolution and table handling, and the move toward unified models.</description>
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-papers</category><category>ocr-free</category><category>document-understanding</category><category>multimodal</category><category>donut</category><category>vision-language-model</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.ja</guid>
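    <title>Beyond OCR: OCR-free Document Understanding and Unified Models</title>
    <link>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.ja</link>
    <description>Traditional OCR pipelines split the work into detection, recognition, and layout stages, but errors accumulate across them. This post surveys the shift in document AI: Donut-style and VLM-based OCR-free document understanding, high-resolution and table handling, and the trend toward unified models.</description>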
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-papers</category><category>ocr-free</category><category>document-understanding</category><category>multimodal</category><category>donut</category><category>vision-language-model</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding</guid>
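    <title>Beyond OCR: OCR-free Document Understanding and Unified Models</title>
    <link>https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding</link>
    <description>Traditional OCR pipelines split the work into detection, recognition, and layout stages, but errors accumulate across them. This post surveys the shift in document AI: Donut-style and VLM-based OCR-free document understanding, high-resolution and table handling, and the trend toward unified models.</description>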
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-papers</category><category>ocr-free</category><category>document-understanding</category><category>multimodal</category><category>donut</category><category>vision-language-model</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.en</guid>
    <title>Vision LLM Architecture — How an Image Becomes Language</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.en</link>
    <description>A vision-language model processes an image with a vision encoder, then passes it through a projector to produce tokens an LLM can read. From patch embedding to arbitrary-resolution handling, we trace the full path by which an image turns into language tokens.</description>
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>vision-language-model</category><category>multimodal</category><category>vit</category><category>qwen2-vl</category><category>architecture</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.ja</guid>
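    <title>Vision LLM Architecture: How an Image Becomes Language</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.ja</link>
    <description>A vision-language model processes an image with a vision encoder, then converts it through a projector into tokens an LLM can read. From patch embedding to arbitrary-resolution handling, this post walks through the structure of the whole process by which an image becomes language tokens.</description>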
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>vision-language-model</category><category>multimodal</category><category>vit</category><category>qwen2-vl</category><category>architecture</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture</guid>
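    <title>Vision LLM Architecture: How an Image Becomes Language</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture</link>
    <description>A vision-language model processes an image with a vision encoder, then passes it through a projector to turn it into tokens an LLM can understand. From patch embedding to arbitrary-resolution handling, this post examines the structure of the whole process by which an image becomes language tokens.</description>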
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>vision-language-model</category><category>multimodal</category><category>vit</category><category>qwen2-vl</category><category>architecture</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.en</guid>
    <title>Training Vision LLMs — How to Teach Input and Output</title>
    <link>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.en</link>
    <description>A vision-language model is trained in stages, from alignment pretraining to instruction fine-tuning. From the perspective of the training pipeline, we lay out what gets taught and how: vision encoder freezing strategy, data composition, input format and output shapes, and loss computation.</description>
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>mlops</category><category>vision-language-model</category><category>multimodal</category><category>training</category><category>instruction-tuning</category><category>fine-tuning</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.ja</guid>
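    <title>Training Vision LLMs: How to Teach Input and Output</title>
    <link>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.ja</link>
    <description>Vision-language models are trained in stages, from alignment pretraining to instruction fine-tuning. From the perspective of the training pipeline, this post lays out what is taught and how: the vision encoder freezing strategy, data composition, input format and output form, and loss computation.</description>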
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>mlops</category><category>vision-language-model</category><category>multimodal</category><category>training</category><category>instruction-tuning</category><category>fine-tuning</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models</guid>
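    <title>Training Vision LLMs: How to Teach Input and Output</title>
    <link>https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models</link>
    <description>Vision-language models are trained in stages, from alignment pretraining to instruction fine-tuning. From the perspective of the training pipeline, this post lays out what is taught and how: the vision encoder freezing strategy, data composition, input format and output form, and loss computation.</description>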
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>mlops</category><category>vision-language-model</category><category>multimodal</category><category>training</category><category>instruction-tuning</category><category>fine-tuning</category>
  </item>

    </channel>
  </rss>
