
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Tue, 16 Jun 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/matrix-multiply/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture.en</guid>
    <title>Systolic Arrays and Dataflow Architecture — The Heart of the TPU</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture.en</link>
    <description>A deep dive, with ASCII diagrams, into the systolic array, the structure that lets AI accelerators run matrix multiplication efficiently. We walk through dataflow strategies such as weight-stationary and output-stationary, data reuse and energy, a comparison with tensor cores, and compiler mapping: the core principles that power the TPU.</description>
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>gpu-cuda</category><category>systolic-array</category><category>dataflow</category><category>tpu</category><category>ai-hardware</category><category>matrix-multiply</category><category>accelerator</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture.ja</guid>
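    <title>Systolic Arrays and Dataflow Architecture: The Principle at the Heart of the TPU</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture.ja</link>
    <description>With ASCII diagrams, we dig deep into how the systolic array works to process matrix multiplication, the core operation of AI accelerators, efficiently. We cover dataflow strategies such as weight-stationary and output-stationary, data reuse and energy, a comparison with tensor cores, and compiler mapping, laying out the principles at the heart of the TPU.</description>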
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>gpu-cuda</category><category>systolic-array</category><category>dataflow</category><category>tpu</category><category>ai-hardware</category><category>matrix-multiply</category><category>accelerator</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture</guid>
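    <title>Systolic Array and Dataflow Architecture: The Principle at the Heart of the TPU</title>
    <link>https://www.youngju.dev/blog/gpu-cuda/2026-06-16-systolic-array-dataflow-architecture</link>
    <description>With ASCII diagrams, we dig deep into how the systolic array works to process matrix multiplication, the core operation of AI accelerators, efficiently. We cover dataflow strategies such as weight-stationary and output-stationary, data reuse and energy, a comparison with tensor cores, and compiler mapping, summarizing the principles at the heart of the TPU.</description>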
    <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>gpu-cuda</category><category>systolic-array</category><category>dataflow</category><category>tpu</category><category>ai-hardware</category><category>matrix-multiply</category><category>accelerator</category>
  </item>

    </channel>
  </rss>
