Chaos and Order

https://www.youngju.dev/blog
Slowly and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
Language: ko
Contact: fjvbn2003@gmail.com (Youngju Kim)
Date: Fri, 26 Jun 2026 00:00:00 GMT

Beyond OCR: OCR-free Document Understanding and Unified Models
https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.en
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: ai-papers, ocr-free, document-understanding, multimodal, donut, vision-language-model
A traditional OCR pipeline is split into detection, recognition, and layout stages, but errors accumulate from one stage to the next. We organize the shift in document AI: Donut-style and VLM-based OCR-free document understanding, high-resolution and table handling, and the move toward unified models.

Beyond OCR: OCR-free Document Understanding and Unified Models (Japanese edition)
https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding.ja
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: ai-papers, ocr-free, document-understanding, multimodal, donut, vision-language-model
A traditional OCR pipeline divides detection, recognition, and layout into stages, but errors accumulate. We summarize the shift in document AI, from Donut-style and VLM-based OCR-free document understanding through high-resolution and table handling to the trend toward unified models.

Beyond OCR: OCR-free Document Understanding and Unified Models (Korean edition)
https://www.youngju.dev/blog/ai-papers/2026-06-26-ocr-free-document-understanding
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: ai-papers, ocr-free, document-understanding, multimodal, donut, vision-language-model
A traditional OCR pipeline divides detection, recognition, and layout into stages, but errors accumulate. We summarize the shift in document AI, from Donut-style and VLM-based OCR-free document understanding through high-resolution and table handling to the trend toward unified models.

Vision LLM Architecture: How an Image Becomes Language
https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.en
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: llm, vision-language-model, multimodal, vit, qwen2-vl, architecture
A vision-language model processes an image with a vision encoder, then passes the result through a projector to produce tokens an LLM can read. From patch embedding to arbitrary-resolution handling, we trace the full path by which an image turns into language tokens.

Vision LLM Architecture: How an Image Becomes Language (Japanese edition)
https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture.ja
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: llm, vision-language-model, multimodal, vit, qwen2-vl, architecture
A vision-language model processes an image with a vision encoder and converts it, through a projector, into tokens an LLM can read. From patch embedding to arbitrary-resolution handling, we examine the structure of the whole process by which an image becomes language tokens.

Vision LLM Architecture: How an Image Becomes Language (Korean edition)
https://www.youngju.dev/blog/llm/2026-06-26-vision-language-model-architecture
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: llm, vision-language-model, multimodal, vit, qwen2-vl, architecture
A vision-language model processes an image with a vision encoder and then, through a projector, turns it into tokens an LLM can understand. From patch embedding to arbitrary-resolution handling, we examine the structure of the whole process by which an image becomes language tokens.

Training Vision LLMs: How to Teach Input and Output
https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.en
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: mlops, vision-language-model, multimodal, training, instruction-tuning, fine-tuning
A vision-language model is trained in stages, from alignment pretraining to instruction fine-tuning. We organize what gets taught and how, from the angle of the training pipeline: vision encoder freezing strategy, data composition, input format and output shapes, and loss computation.

Training Vision LLMs: How to Teach Input and Output (Japanese edition)
https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models.ja
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: mlops, vision-language-model, multimodal, training, instruction-tuning, fine-tuning
Vision-language models are trained in stages, from alignment pretraining to instruction fine-tuning. We organize what is taught and how from the perspective of the training pipeline: vision encoder freezing strategy, data composition, input format and output form, and loss computation.

Training Vision LLMs: How to Teach Input and Output (Korean edition)
https://www.youngju.dev/blog/mlops/2026-06-26-training-vision-language-models
Fri, 26 Jun 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: mlops, vision-language-model, multimodal, training, instruction-tuning, fine-tuning
Vision-language models are trained in stages, from alignment pretraining to instruction fine-tuning. We organize what is taught and how from the perspective of the training pipeline: vision encoder freezing strategy, data composition, input format and output form, and loss computation.