
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly, but correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Fri, 26 Jun 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/mqa/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention.en</guid>
    <title>The Evolution of Attention — MQA, GQA, FlashAttention, and Long Context</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention.en</link>
    <description>We analyze the memory and compute cost of standard attention, then explain how MQA and GQA shrink the KV cache and how FlashAttention optimizes IO. We compare sliding-window and long-context techniques and trace how all of these choices affect serving memory.</description>
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>attention</category><category>flashattention</category><category>gqa</category><category>mqa</category><category>kv-cache</category><category>long-context</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention.ja</guid>
    <title>The Evolution of Attention: MQA, GQA, FlashAttention, and Long Context</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention.ja</link>
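    <description>We analyze the memory and compute cost of standard attention, then explain how MQA and GQA reduce the KV cache and how FlashAttention optimizes IO. We compare sliding-window and long-context techniques and follow these choices through to their effect on serving memory.</description>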
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>attention</category><category>flashattention</category><category>gqa</category><category>mqa</category><category>kv-cache</category><category>long-context</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention</guid>
    <title>The Evolution of Attention: MQA, GQA, FlashAttention, and Long Context</title>
    <link>https://www.youngju.dev/blog/llm/2026-06-26-attention-variants-gqa-mqa-flashattention</link>
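    <description>We analyze the memory and compute cost of standard attention, then explain how MQA and GQA shrink the KV cache and how FlashAttention optimizes IO. We also compare sliding-window and long-context techniques, along with the effect all of these choices have on serving memory.</description>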
    <pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm</category><category>attention</category><category>flashattention</category><category>gqa</category><category>mqa</category><category>kv-cache</category><category>long-context</category>
  </item>

    </channel>
  </rss>
