
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Sat, 16 May 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/sepp-hochreiter/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.en</guid>
    <title>Foundation Model Architectures 2026 — Beyond the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 Deep Dive</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.en</link>
    <description>In 2026 the foundation-model world is no longer Transformer-only. Vaswani et al.&#39;s 2017 paper &quot;Attention is All You Need&quot; remains the standard, but next to it stand state-space models (Mamba, Mamba 2), the linear-RNN renaissance (RWKV, RetNet, Griffin), hybrids (AI21 Jamba, Falcon Mamba), Sepp Hochreiter&#39;s xLSTM, Test-Time Training, the DiT family behind Sora, MoE giants (Mixtral, DeepSeek-V3 671B, Google Million Experts), Flash Attention 3 and Ring Attention, plus the 1M+ token context era of Gemini 2M and Magic LTM-2-mini 100M. We map which architecture solves which problem, who should pick which, and what the Korean and Japanese ecosystems are building.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>foundation-models</category><category>transformer</category><category>attention-is-all-you-need</category><category>vaswani</category><category>mamba</category><category>state-space-model</category><category>ssm</category><category>albert-gu</category><category>tri-dao</category><category>mamba-2</category><category>hyena</category><category>stanford-h2o</category><category>linear-attention</category><category>schmidhuber</category><category>rwkv</category><category>bo-peng</category><category>retnet</category><category>microsoft-retentive</category><category>griffin</category><category>deepmind-griffin</category><category>s5</category><category>jamba</category><category>ai21</category><category>falcon-mamba</category><category>xlstm</category><category>sepp-hochreiter</category><category>test-time-training</category><category>ttt</category><category>sun-et-al</category><category>dit</category><category>diffusion-transformer</category><category>sora-dit</category><category>mixture-of-experts</category><category>moe</category><category>mixtral</category><category>deepseek-v3-moe</category><category>million-experts</category><category>google-mome</category><category>flash-attention-3</category><category>ring-attention</category><category>gemini-2m</category><category>magic-ltm-2-mini</category><category>sakana-ai-evolutionary</category><category>2026</category><category>deep-dive</category><category>english</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.ja</guid>
    <title>Foundation Model Architectures 2026: What Comes After the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 Complete Guide</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.ja</link>
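    <description>In 2026 the foundation-model scene is no longer Transformer-only. Vaswani et al.&#39;s 2017 paper &quot;Attention is All You Need&quot; is still the standard, but alongside it now stand state-space models such as Mamba and Mamba 2, the linear-RNN revival of RWKV, RetNet, and Griffin, the hybrids AI21 Jamba and Falcon Mamba, Sepp Hochreiter&#39;s xLSTM, Test-Time Training, Sora&#39;s DiT, MoE models such as Mixtral, DeepSeek-V3 671B, and Google Million Experts, Flash Attention 3 and Ring Attention, and the ultra-long contexts of Gemini 2M and Magic LTM-2-mini 100M. This guide lays out in one pass which architecture suits which problem and what the Korean and Japanese players are building.</description>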
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>foundation-models</category><category>transformer</category><category>attention-is-all-you-need</category><category>vaswani</category><category>mamba</category><category>state-space-model</category><category>ssm</category><category>albert-gu</category><category>tri-dao</category><category>mamba-2</category><category>hyena</category><category>stanford-h2o</category><category>linear-attention</category><category>schmidhuber</category><category>rwkv</category><category>bo-peng</category><category>retnet</category><category>microsoft-retentive</category><category>griffin</category><category>deepmind-griffin</category><category>s5</category><category>jamba</category><category>ai21</category><category>falcon-mamba</category><category>xlstm</category><category>sepp-hochreiter</category><category>test-time-training</category><category>ttt</category><category>sun-et-al</category><category>dit</category><category>diffusion-transformer</category><category>sora-dit</category><category>mixture-of-experts</category><category>moe</category><category>mixtral</category><category>deepseek-v3-moe</category><category>million-experts</category><category>google-mome</category><category>flash-attention-3</category><category>ring-attention</category><category>gemini-2m</category><category>magic-ltm-2-mini</category><category>sakana-ai-evolutionary</category><category>2026</category><category>deep-dive</category><category>日本語</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive</guid>
    <title>Foundation Model Architectures 2026: After the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 In-Depth Guide</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive</link>
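    <description>In 2026 the foundation-model world is no longer dominated by the Transformer alone. Vaswani et al.&#39;s 2017 paper &quot;Attention is All You Need&quot; is still the standard, but next to it stand state-space models (SSMs) such as Mamba and Mamba 2, the linear-RNN rediscovery camp of RWKV, RetNet, and Griffin, hybrids such as AI21 Jamba and Falcon Mamba, Sepp Hochreiter&#39;s xLSTM, Test-Time Training, Sora&#39;s DiT, MoE models such as Mixtral, DeepSeek-V3 671B, and Google Million Experts, Flash Attention 3 and Ring Attention, and the ultra-long contexts of Gemini 2M and Magic LTM-2-mini 100M. This guide summarizes in one place which architecture is strong on which problem and what the Korean and Japanese camps are building.</description>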
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>foundation-models</category><category>transformer</category><category>attention-is-all-you-need</category><category>vaswani</category><category>mamba</category><category>state-space-model</category><category>ssm</category><category>albert-gu</category><category>tri-dao</category><category>mamba-2</category><category>hyena</category><category>stanford-h2o</category><category>linear-attention</category><category>schmidhuber</category><category>rwkv</category><category>bo-peng</category><category>retnet</category><category>microsoft-retentive</category><category>griffin</category><category>deepmind-griffin</category><category>s5</category><category>jamba</category><category>ai21</category><category>falcon-mamba</category><category>xlstm</category><category>sepp-hochreiter</category><category>test-time-training</category><category>ttt</category><category>sun-et-al</category><category>dit</category><category>diffusion-transformer</category><category>sora-dit</category><category>mixture-of-experts</category><category>moe</category><category>mixtral</category><category>deepseek-v3-moe</category><category>million-experts</category><category>google-mome</category><category>flash-attention-3</category><category>ring-attention</category><category>gemini-2m</category><category>magic-ltm-2-mini</category><category>sakana-ai-evolutionary</category><category>2026</category><category>deep-dive</category>
  </item>

    </channel>
  </rss>
