Deepseek-r1

Published on
May 16, 2026
The Complete 2026 Guide to Landmark LLM Papers - Transformer · Scaling Laws · Flash Attention · Mamba · DeepSeek-R1 · Titans In-Depth Analysis
llm-papers transformer scaling-laws flash-attention mamba deepseek-r1 titans research ai-papers deep-dive
From Attention Is All You Need in 2017 to Titans and DeepSeek-R1 in 2026, this guide organizes by theme the 50-plus landmark papers that shaped the LLM era. It covers Transformer · BERT · GPT series · Scaling Laws · Chinchilla · InstructGPT · PaLM · Flash Attention 1/2/3 · LLaMA 1/2/3/4 · GPT-4 · Mistral · Mixtral · DPO · KTO · ORPO · RWKV · Mamba 1/2 · DeepSeek-V3 · DeepSeek-R1 · o1 · Titans · TTT · Era of Experience · Tülu 3 · Sleeper Agents · Scaling Monosemanticity · Mixture of a Million Experts · RoPE · YARN · Ring Attention · GPTQ · AWQ · BitNet b1.58 · DDPM · DiT · MMR1, summarizing each paper's contribution and impact in a single paragraph alongside its actual arXiv URL. It also pulls together material from Korean and Japanese reading groups, including PR12, Tunib 잎차이, Japan's Connpass 論文読み会 (paper-reading meetups), and the PFN blog, to build the most condensed LLM paper roadmap as of May 2026.
Published on
May 14, 2026
Reasoning Models 2026 Guide: An In-Depth Comparison of o3 · o4 · DeepSeek R1 · Claude Thinking · Gemini Deep Think · QwQ
reasoning-models o3 o4 deepseek-r1 claude-thinking gemini-deep-think qwq rlvr test-time-compute llm
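A year and a half has passed since o1 opened up test-time compute as a new axis in September 2024. As of 2026, a "reasoning model" is no longer a separate model family but a mode of every frontier model. OpenAI o3 · o3-pro · o4, DeepSeek R1 · R1-0528 · V3.1 reasoner, Anthropic Claude Sonnet 4.5 · Opus 4.5 extended thinking, Google Gemini 2.5 Pro · Deep Think, Alibaba Qwen QwQ · QwQ-Plus, and xAI Grok 3 · 4 Heavy thinking: this guide compares the reasoning modes of these six families at a glance across thinking budget, AIME, SWE-bench, tool use, and price. It also covers the RLVR (verifiable rewards) recipe, when a reasoning model is truly needed, and when a fast non-reasoning model is the better choice.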
