
Published on May 16, 2026
Foundation Model Architectures 2026: Beyond the Transformer / Mamba 2 / Hyena / RWKV / RetNet / Griffin / Jamba / xLSTM / TTT / DiT / MoE / Flash Attention 3, an In-Depth Guide
foundation-models transformer attention-is-all-you-need vaswani mamba state-space-model ssm albert-gu tri-dao mamba-2 hyena stanford-h2o linear-attention schmidhuber rwkv bo-peng retnet microsoft-retentive griffin deepmind-griffin s5 jamba ai21 falcon-mamba xlstm sepp-hochreiter test-time-training ttt sun-et-al dit diffusion-transformer sora-dit moe mixtral deepseek-v3-moe million-experts google-mome flash-attention-3 ring-attention gemini-2m magic-ltm-2-mini sakana-ai-evolutionary deep-dive
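In 2026 the foundation model landscape is no longer Transformer-only. Vaswani et al.'s 2017 paper "Attention is All You Need" is still the standard, but it now sits alongside several other families: state space models (SSMs) such as Mamba and Mamba 2; the linear-RNN revival represented by RWKV, RetNet, and Griffin; attention/SSM hybrids such as AI21's Jamba and attention-free SSM language models such as Falcon Mamba; Sepp Hochreiter's xLSTM; Test-Time Training (TTT); the Diffusion Transformer (DiT) behind Sora; Mixture-of-Experts (MoE) models such as Mixtral, the 671B-parameter DeepSeek-V3, and Google's Million Experts work; kernel- and system-level advances such as Flash Attention 3 and Ring Attention; and ultra-long context windows such as Gemini's 2M tokens and Magic's LTM-2-mini at 100M tokens. This guide covers, in one place, which architecture is strong at which problem and what teams in Korea and Japan are building.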
