Chaos and Order

https://www.youngju.dev/blog
"Slowly and correctly" (천천히 올바르게). AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
Language: ko
Contact: fjvbn2003@gmail.com (Youngju Kim)
Sat, 16 May 2026 00:00:00 GMT

Foundation Model Architectures 2026: Beyond the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 Deep Dive
https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.en
In 2026 the foundation-model world is no longer Transformer-only. Vaswani 2017 "Attention is All You Need" remains the standard, but next to it stand state-space models (Mamba, Mamba 2), the linear-RNN renaissance (RWKV, RetNet, Griffin), hybrids (AI21 Jamba, Falcon Mamba), Sepp Hochreiter's xLSTM, Test-Time Training, the DiT family behind Sora, MoE giants (Mixtral, DeepSeek-V3 671B, Google Million Experts), Flash Attention 3 and Ring Attention, plus the 1M+ token context era of Gemini 2M and Magic LTM-2-mini 100M. We map which architecture solves which problem, who should pick which, and what the Korean and Japanese ecosystems are building.
Sat, 16 May 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: foundation-models, transformer, attention-is-all-you-need, vaswani, mamba, state-space-models, ssm, albert-gu, tri-dao, mamba-2, hyena, stanford-h2o, linear-attention, schmidhuber, rwkv, bo-peng, retnet, microsoft-retentive, griffin, deepmind-griffin, s5, jamba, ai21, falcon-mamba, xlstm, sepp-hochreiter, test-time-training, ttt, sun-et-al, dit, diffusion-transformer, sora-dit, mixture-of-experts, moe, mixtral, deepseek-v3-moe, million-experts, google-mome, flash-attention-3, ring-attention, gemini-2m, magic-ltm-2-mini, sakana-ai-evolutionary, 2026, deep-dive, english

Foundation Model Architectures 2026: Beyond the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 Complete Guide (Japanese edition)
https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive.ja
In 2026 the foundation-model scene is no longer all Transformer. Vaswani 2017 "Attention is All You Need" is still the standard, but alongside it now stand state-space models such as Mamba/Mamba 2, the linear-RNN revival of RWKV/RetNet/Griffin, the AI21 Jamba and Falcon Mamba hybrids, Sepp Hochreiter's xLSTM, Test-Time Training, Sora's DiT, MoE models such as Mixtral/DeepSeek-V3 671B/Google Million Experts, Flash Attention 3 and Ring Attention, and the ultra-long contexts of Gemini 2M / Magic LTM-2-mini 100M. A single-pass overview of which architecture suits which problem and what Korean and Japanese teams are building.
Sat, 16 May 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: same as the English entry, with 日本語 (Japanese) in place of english

Foundation Model Architectures 2026: After the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3 In-Depth Guide (Korean edition)
https://www.youngju.dev/blog/culture/2026-05-16-foundation-model-architectures-beyond-transformer-2026-mamba-hyena-rwkv-retnet-griffin-jamba-xlstm-ttt-dit-moe-flash-attention-3-deep-dive
In 2026 the foundation-model world is no longer Transformer-only. Vaswani's 2017 "Attention is All You Need" is still the standard, but beside it stand state-space models (SSMs) such as Mamba/Mamba 2, the linear-RNN rediscovery camp of RWKV/RetNet/Griffin, hybrids such as AI21 Jamba and Falcon Mamba, Sepp Hochreiter's xLSTM, Test-Time Training, Sora's DiT, MoE models such as Mixtral/DeepSeek-V3 671B/Google Million Experts, Flash Attention 3 and Ring Attention, and the ultra-long contexts of Gemini 2M/Magic LTM-2-mini 100M. The post covers in one place which architecture is strong on which problem and what the Korean and Japanese camps are building.
Sat, 16 May 2026 00:00:00 GMT, fjvbn2003@gmail.com (Youngju Kim)
Tags: same as the English entry, without a language tag