
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Tue, 17 Mar 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/gpu-memory/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-03-17-gpu-memory-inference-optimization-guide</guid>
    <title>GPU Memory Management &amp; LLM Inference Optimization: From vLLM and PagedAttention to GPTQ and TensorRT-LLM</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-03-17-gpu-memory-inference-optimization-guide</link>
    <description>A complete guide to LLM inference optimization, covering the HBM memory hierarchy, KV cache size calculation, PagedAttention, GPTQ/AWQ quantization, continuous batching, and a vLLM vs. TensorRT-LLM comparison.</description>
    <pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>gpu-memory</category><category>llm-inference</category><category>vllm</category><category>paged-attention</category><category>gptq</category><category>tensorrt-llm</category><category>2026-03</category>
  </item>

    </channel>
  </rss>
