
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly, but correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Tue, 16 Jun 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/sparsity/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow.en</guid>
    <title>Making Inference Fast — Quantization, Sparsity, and Dataflow from a Hardware Lens</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow.en</link>
    <description>We break down the cost structure of inference through the lens of the memory wall, then tie quantization (INT8/FP8/FP4), structured sparsity (2:4), dataflow architectures, operator fusion, batching, and KV caching into a single picture of hardware-software co-design. The post reflects the 2026 reality of Blackwell FP4 and the Vera Rubin trajectory, and closes with practical guidance for shipping fast inference.</description>
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>inference</category><category>quantization</category><category>sparsity</category><category>dataflow</category><category>gpu</category><category>hardware</category><category>optimization</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow.ja</guid>
    <title>推論を速く — 量子化、スパース性、Dataflow のハードウェア視点</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow.ja</link>
    <description>推論コストの構造をメモリウォールの観点から解きほぐし、量子化(INT8/FP8/FP4)、構造化スパース性(2:4)、dataflow アーキテクチャ、演算子融合、バッチングと KV キャッシュまでをハードウェアとソフトウェアの協調設計として一枚の絵にまとめます。2026 年の Blackwell FP4 と Vera Rubin の流れを反映し、実務での適用ポイントを示します。</description>
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>inference</category><category>quantization</category><category>sparsity</category><category>dataflow</category><category>gpu</category><category>hardware</category><category>optimization</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow</guid>
    <title>추론을 빠르게 — 양자화, 희소성, Dataflow의 하드웨어 관점</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-inference-hardware-quantization-sparsity-dataflow</link>
    <description>추론 비용의 구조를 메모리 월 관점에서 풀어내고, 양자화(INT8/FP8/FP4)와 구조적 희소성(2:4), dataflow 아키텍처, 연산자 융합, 배칭과 KV 캐시까지 하드웨어-소프트웨어 공동설계의 큰 그림을 정리합니다. 2026년 Blackwell FP4와 Vera Rubin 흐름을 반영해 실무 적용 포인트를 짚습니다.</description>
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>inference</category><category>quantization</category><category>sparsity</category><category>dataflow</category><category>gpu</category><category>hardware</category><category>optimization</category>
  </item>

    </channel>
  </rss>
