Superhuman Game AI in 2026 — Stockfish 17 / Leela Chess Zero / KataGo / AlphaZero / MuZero / Cicero / Pluribus / AlphaStar / Shogi dlshogi Deep Dive
- Author: Youngju Kim (@fjvbn20031)
- Prologue — The Age Where Humans No Longer Win
- Chapter 1 · The 2026 Game AI Map — Four Categories
- Chapter 2 · Stockfish 17 — The Strongest Chess Engine
- Chapter 3 · Leela Chess Zero (Lc0) — Neural-Net Chess Engine
- Chapter 4 · Komodo Dragon 3 — The Last Major Commercial Chess Engine
- Chapter 5 · AlphaZero to MuZero — The DeepMind Line
- Chapter 6 · Maia — Human-Like Chess (MS Research + Toronto)
- Chapter 7 · KataGo — The Apex of Distributed Go Training
- Chapter 8 · AlphaGo — The 2016 Series
- Chapter 9 · Pluribus — Conquering 6-Player Poker (Meta 2019)
- Chapter 10 · Cicero — Diplomacy (Meta, 2022)
- Chapter 11 · AlphaStar — StarCraft 2 (DeepMind 2019)
- Chapter 12 · OpenAI Five — Dota 2
- Chapter 13 · Suphx — Mahjong (Microsoft 2019)
- Chapter 14 · AlphaProof + AlphaGeometry — IMO Silver (2024)
- Chapter 15 · Chess UIs — lichess / chess.com / ChessBase / Arena / Banksia / NIBBLER
- Chapter 16 · UCI and XBoard Protocols
- Chapter 17 · Korea — NHN's HanDol, and Lee Sedol
- Chapter 18 · Japan — Shogi AI History, dlshogi, Yaneura-ou
- Chapter 19 · Who Should Learn Game AI?
- Chapter 20 · Closing — What "Superhuman" Now Means
- References
Prologue — The Age Where Humans No Longer Win
In March 2016, Lee Sedol lost to AlphaGo 1-4. Many people said "Go is over now," and they were right. In 2017, AlphaGo Zero surpassed AlphaGo by playing only against itself with zero human games. That same year, AlphaZero conquered chess, shogi and Go with a single algorithm. In 2019, MuZero did the same thing without even knowing the rules of the game.
Chess is similar. Stockfish 17 beats the human world champion essentially 100% of the time at any time control. Stockfish vs Leela Chess Zero (Lc0) at the TCEC finals is a tournament where humans are spectators. Even Stockfish running on a phone beats human grandmasters.
But game AI is not only chess and Go. Pluribus (Meta, 2019) beat human pros in 6-player no-limit Texas Hold'em. Cicero (Meta, 2022) negotiated alliances and betrayals in natural language in Diplomacy and finished in the top 10%. AlphaStar reached Grandmaster level in StarCraft 2, OpenAI Five beat the reigning champions in Dota 2, Suphx reached the top rank in Mahjong, and in 2024 AlphaProof + AlphaGeometry scored at the silver-medal level at the International Math Olympiad.
This article, as of 2026, lays out one map: what game AIs exist, how far they have come, and what algorithms they use. It is not just a chronology — we group by family (MCTS / NNUE / self-play / CFR / model-based RL).
Chapter 1 · The 2026 Game AI Map — Four Categories
One clean way to slice game AI is by information completeness and player count.
| Category | Information | Players | Examples | Representative AIs |
|---|---|---|---|---|
| Perfect info, 2 player | public | 2 | chess, Go, shogi | Stockfish, Lc0, KataGo, AlphaZero, dlshogi |
| Perfect info, 1 player puzzle | public | 1 | math proofs | AlphaProof, AlphaGeometry |
| Imperfect info, 2 player | private | 2 | heads-up poker | Libratus, DeepStack |
| Imperfect info, multi player | private | 3+ | 6-player poker, mahjong | Pluribus, Suphx |
| Imperfect info + language | private + NL | 7 | Diplomacy | Cicero |
| Real-time, partial observation | partial | 2-10 | StarCraft 2, Dota 2 | AlphaStar, OpenAI Five |
This axis matters because the algorithm changes.
- Perfect-info 2-player zero-sum is minimax-friendly. Either alpha-beta (Stockfish) or MCTS + neural network (Lc0, KataGo, AlphaZero).
- Imperfect info breaks minimax. The standard family is CFR (counterfactual regret minimization), of which Libratus and Pluribus are heirs.
- Multi-player + language + cooperation breaks both. You need something like Cicero — RL fused with an LLM.
- Real-time makes time itself a move. Policy networks plus distributed self-play (AlphaStar, OpenAI Five).
Keep this map in mind; from the next chapter, we look at each species in turn.
Chapter 2 · Stockfish 17 — The Strongest Chess Engine
Stockfish is an open-source chess engine started in 2008. Written in C++, GPL v3 licensed, developed at github.com/official-stockfish/Stockfish. As of 2026 the latest stable release is Stockfish 17, and it sits at the top of both CCRL and TCEC.
What Changed — Alpha-Beta + NNUE
Classical Stockfish used alpha-beta pruning plus a long list of heuristics (null-move pruning, late move reductions, futility pruning, etc.). Its evaluation function was hand-crafted chess knowledge — pawn structure, king safety, mobility, and so on.
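As a rough illustration of the alpha-beta idea described above, here is a minimal Python sketch on a toy tree. It is not Stockfish's search (which is C++ and layered with the heuristics listed above); the `children` and `evaluate` callbacks and the tiny tree are assumptions made up for the example.

```python
from math import inf

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Return the minimax value of `node`, skipping branches that cannot change it."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = -inf
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizer already has a better option elsewhere
                break
        return best
    best = inf
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizer already has a better option elsewhere
            break
    return best

# Toy tree (hypothetical): inner nodes are lists, leaves are static evaluations.
tree = [[3, 5], [2, 9], [0, 7]]
visited = []  # records which leaves were actually evaluated

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    visited.append(node)
    return node

root_value = alphabeta(tree, 2, -inf, inf, True, children, evaluate)
```

On this tree the leaves 9 and 7 are never evaluated: once the first branch guarantees a value of 3, a reply worth 2 or 0 is enough to discard the other two branches.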
Starting from Stockfish 12 (2020), NNUE (Efficiently Updatable Neural Network) was introduced. Its design came from the Japanese shogi community (the Yaneura-ou group, especially Yu Nasu). The trick: a small neural network that evaluates very quickly on CPU, with no GPU needed, and updates only what changes from one move to the next — "efficiently updatable".
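The "efficiently updatable" part can be sketched in a few lines of Python. This is a simplified illustration, not Stockfish's actual network: the feature encoding (one input per square and piece type), the layer width and the random integer weights are all assumptions chosen to keep the example small.

```python
import random

NUM_FEATURES = 64 * 12  # hypothetical encoding: one input per (square, piece type)
HIDDEN = 8              # real networks use far wider first layers

rng = random.Random(0)
# Integer weights stand in for the quantized weights a real network would use.
W = [[rng.randint(-64, 64) for _ in range(HIDDEN)] for _ in range(NUM_FEATURES)]
BIAS = [rng.randint(-64, 64) for _ in range(HIDDEN)]

def feature(square, piece):
    """Index of the input that is 1 when `piece` stands on `square` (a1 = 0)."""
    return piece * 64 + square

def refresh(active):
    """Full recomputation: bias plus the weight row of every active feature."""
    acc = list(BIAS)
    for f in active:
        for i in range(HIDDEN):
            acc[i] += W[f][i]
    return acc

def update(acc, removed, added):
    """Incremental update: touch only the rows of the features that changed."""
    new = list(acc)
    for f in removed:
        for i in range(HIDDEN):
            new[i] -= W[f][i]
    for f in added:
        for i in range(HIDDEN):
            new[i] += W[f][i]
    return new

WHITE_PAWN, WHITE_KING, BLACK_KING = 0, 5, 11  # hypothetical piece ids
before = {feature(12, WHITE_PAWN), feature(4, WHITE_KING), feature(60, BLACK_KING)}
acc = refresh(before)

# Pawn e2 -> e4: one feature switches off, one switches on.
acc_after = update(acc, removed=[feature(12, WHITE_PAWN)], added=[feature(28, WHITE_PAWN)])
after = (before - {feature(12, WHITE_PAWN)}) | {feature(28, WHITE_PAWN)}
matches_full_refresh = acc_after == refresh(after)
```

A quiet move changes only a couple of inputs, so the first layer costs two row operations instead of a full pass over every piece on the board. That is what makes the network cheap enough to call at every node of a CPU search.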
Key features of Stockfish 17:
- NNUE is the only evaluation; the hand-crafted classical evaluation was removed during the Stockfish 16.x series.
- Search is still alpha-beta based — the opposite of Lc0's MCTS.
- Multithreading is excellent and scales well to high core counts (128 cores and beyond), though the gain per extra thread shrinks as threads are added.
- It runs on phones — an iPhone 16 Pro does hundreds of thousands of nodes per second.
How to Run It
```bash
# Linux / macOS: install via package manager
brew install stockfish          # macOS
sudo apt install stockfish      # Debian / Ubuntu
# Or download from: https://stockfishchess.org/download/

# Run in UCI mode
stockfish
```

UCI session example:

```text
uci
id name Stockfish 17
id author the Stockfish developers
...
uciok
position startpos moves e2e4 e7e5
go depth 20
info depth 20 seldepth 28 multipv 1 score cp 31 nodes 1234567 ...
bestmove g1f3 ponder b8c6
```
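If you want to consume this output from a script, the `info` and `bestmove` lines are plain space-separated tokens. The following is a minimal Python sketch of a parser for the fields shown in the session above; the function names are made up for this example, and it deliberately ignores fields it does not know.

```python
INT_FIELDS = ("depth", "seldepth", "multipv", "nodes", "nps", "time")

def parse_info(line):
    """Pull the common numeric fields, the score and the PV out of one UCI `info` line."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    out = {}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key in INT_FIELDS and i + 1 < len(tokens):
            out[key] = int(tokens[i + 1])
            i += 2
        elif key == "score" and i + 2 < len(tokens):
            # ("cp", 31) means +0.31 pawns; ("mate", 3) means mate in 3
            out["score"] = (tokens[i + 1], int(tokens[i + 2]))
            i += 3
        elif key == "pv":
            out["pv"] = tokens[i + 1:]  # the rest of the line is the move list
            break
        elif key == "string":
            break  # free-form text follows; nothing more to parse
        else:
            i += 1  # unknown token: skip it
    return out

def parse_bestmove(line):
    """Return (best move, ponder move or None) from a `bestmove` line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "bestmove":
        return None
    ponder = tokens[3] if len(tokens) >= 4 and tokens[2] == "ponder" else None
    return tokens[1], ponder

info = parse_info("info depth 20 seldepth 28 multipv 1 score cp 31 nodes 1234567 pv g1f3 b8c6")
best = parse_bestmove("bestmove g1f3 ponder b8c6")
```

Scores are reported in centipawns from the side to move's point of view, so `cp 31` here means the engine slightly prefers the side to move.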
Has Stockfish Solved Chess?
In the game-theoretic sense, no: chess has roughly 10^120 possible games in its game tree (Shannon's estimate), so a full solution is out of reach. In the practical sense, essentially yes: no human beats Stockfish under any time control, world champions included (Ding Liren through 2024, Gukesh Dommaraju since December 2024).
Chapter 3 · Leela Chess Zero (Lc0) — Neural-Net Chess Engine
Leela Chess Zero (Lc0) is an open-source project started in 2018 by people who read the AlphaZero paper (2017) and said, "Let us try to do that too." See lczero.org and github.com/LeelaChessZero/lc0.
How It Differs From Stockfish
| Item | Stockfish 17 | Leela Chess Zero (Lc0) |
|---|---|---|
| Search | alpha-beta + heuristics | MCTS (PUCT) |
| Evaluation | NNUE (small NN, CPU) | large NN (CNN / Transformer, GPU) |
| Hardware | CPU heavy, multi-core | GPU heavy, NVIDIA RTX 5090 popular |
| Training | none (only evaluator is trained) | trained from scratch via self-play |
| Nodes / sec | millions to tens of millions | tens to hundreds of thousands |
| Style | tactical, calculating | positional, intuitive |
Lc0 has overwhelmingly higher node efficiency (how much it understands per node). Stockfish may visit 10 million nodes per second; Lc0 may visit 100,000 — and they end up roughly comparable in strength. The reason is that the neural network knows in advance which moves are promising (policy net plus value net).
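The way the policy prior steers the search can be sketched with the PUCT selection rule from the AlphaZero paper. This is a minimal Python illustration, not Lc0's code: the exploration constant `c_puct=1.5` and the visit statistics are invented for the example, and real engines add refinements on top of this rule.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, where U favors high-prior, rarely visited moves.

    Each child is a dict with prior P, visit count N and summed value W.
    """
    total_visits = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] else 0.0  # mean value seen so far
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return q + u

    return max(children, key=score)

# Hypothetical statistics at the root after 12 simulations.
children = [
    {"move": "e2e4", "P": 0.50, "N": 10, "W": 5.5},
    {"move": "d2d4", "P": 0.30, "N": 2, "W": 1.0},
    {"move": "a2a3", "P": 0.01, "N": 0, "W": 0.0},
]
pick = puct_select(children)
```

Here the search goes to `d2d4` next: its value is close to that of `e2e4` but it has been visited far less, while `a2a3` is effectively ignored because the policy prior is tiny. That is the sense in which few nodes go a long way.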
Training — Distributed Self-Play
Lc0 is a distributed self-play project where tens of thousands of volunteers donate GPU time. Each client plays a game and uploads the result; the result becomes training data. On an RTX 5090, you can play tens of games per hour, and the cumulative training game count is in the billions.
```bash
# Build Lc0 + grab a network weight file
# (the repository uses git submodules, so clone them as well)
git clone --recurse-submodules https://github.com/LeelaChessZero/lc0
cd lc0
./build.sh
# Weights are at https://lczero.org/play/networks/bestnets/
# BT5 or BT4 series are typically strong
```
Who Uses Lc0
- TCEC (Top Chess Engine Championship) — Stockfish's eternal rival.
- Carlsen, Karjakin, Caruana and other top human players for opening prep.
- ChessBase as a data source.
Chapter 4 · Komodo Dragon 3 — The Last Major Commercial Chess Engine
Komodo was created by Don Dailey, with Larry Kaufman joining to shape its evaluation; Mark Lefler took over programming after Dailey's death in 2013. In 2018, chess.com acquired it. As of 2026, the latest version is Komodo Dragon 3. It is a commercial engine (annual subscription) but it is the default engine of chess.com's analysis tools, so in practice it gets called hundreds of millions of times per day.
Features
- Adopted NNUE early (Dragon 1, late 2020).
- Positional style: Kaufman is a grandmaster, so its moves often look human.
- Multi-PV analysis tends to produce variations humans can understand — useful for coaching.
- Slightly weaker than Stockfish but well inside the top 3.
Why Pay When Stockfish Is Free and Open?
- Commercial services like chess.com need stable licensing and support.
- For "teaching humans" analysis, Komodo's intuitive evaluations help.
- Book analysis on chess.com (Insights) uses Komodo as the standard.
Chapter 5 · AlphaZero to MuZero — The DeepMind Line
AlphaZero (2017) — One Algorithm, Three Games
Silver et al., 2017, "Mastering Chess and Shogi by Self-Play...".
- MCTS + deep neural network (policy + value).
- Trained purely by self-play — zero human games.
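- Mastered chess, shogi and Go with the same algorithm.
- After training, it beat Stockfish 8 with 28 wins, 72 draws and 0 losses over 100 games (as of 2017; Stockfish later caught up with NNUE).
- Self-play games were generated on 5,000 first-generation TPUs and the networks were trained on 64 second-generation TPUs; training took hours to days depending on the game (about 9 hours for chess).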
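To put the 100-game result against Stockfish 8 (28 wins, 72 draws, no losses) in rating terms, the standard logistic Elo model can be inverted. This is a back-of-the-envelope Python sketch; the function name is made up for the example, and the model says nothing about match conditions.

```python
import math

def elo_difference(wins, draws, losses):
    """Rating gap implied by a match score under the logistic Elo model.

    Undefined for a 100% or 0% score (the implied gap would be infinite).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400 * math.log10(score / (1 - score))

diff = elo_difference(28, 72, 0)  # score = 64/100
```

A 64% score corresponds to a gap of roughly 100 Elo points. That is large between top engines, but far from the picture that "28-0" alone suggests, since nearly three quarters of the games were drawn.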
What AlphaZero Changed
Before AlphaZero, chess engines encoded human chess knowledge by hand — pawn structure, king safety, doubled rooks, bishop pair, all written by ex-GM developers. AlphaZero threw all of that away and matched their level using only self-play. That was the shock.
MuZero (2019) — When You Do Not Know the Rules
Schrittwieser et al., 2019, "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model".
- AlphaZero knew the rules (it could compute the next board from a move).
- MuZero does not know the rules — the next board is predicted by the network itself.
- This means it generalizes to pixel-based games like Atari, so the same algorithm conquered both board games and Atari.
- Later extended to EfficientZero (2021) and Stochastic MuZero (2022).
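The "plans with a learned model" idea can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the three functions are hand-written stand-ins for MuZero's learned representation, dynamics and prediction networks, the world is a number line with a reward at position 3, and brute-force lookahead replaces MuZero's MCTS to keep the code short.

```python
from itertools import product

ACTIONS = (-1, +1)  # step left or right on a number line

def represent(observation):
    """h: observation -> latent state. A network in MuZero; a plain float here."""
    return float(observation)

def dynamics(state, action):
    """g: (latent state, action) -> (next latent state, predicted reward)."""
    nxt = state + action
    reward = 1.0 if nxt == 3.0 else 0.0
    return nxt, reward

def predict_value(state):
    """f: latent state -> value estimate (the policy head is omitted)."""
    return -0.1 * abs(3.0 - state)

def plan(observation, depth=3, discount=0.9):
    """Choose the first action of the best imagined action sequence."""
    root = represent(observation)
    best_action, best_return = None, float("-inf")
    for seq in product(ACTIONS, repeat=depth):
        state, total, scale = root, 0.0, 1.0
        for action in seq:
            state, reward = dynamics(state, action)  # imagined step, no real environment
            total += scale * reward
            scale *= discount
        total += scale * predict_value(state)  # bootstrap from the value head
        if total > best_return:
            best_action, best_return = seq[0], total
    return best_action

first_from_0 = plan(0)
first_from_5 = plan(5)
```

The point of the sketch is that `plan` never calls a game simulator or rule engine: every lookahead step goes through `dynamics`, which in MuZero is itself learned from experience.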
Code
DeepMind has not released official AlphaZero code, but well-known reimplementations exist:
- OpenSpiel (github.com/deepmind/open_spiel) — DeepMind's official RL-game framework. Contains an AlphaZero base.
- muzero-general (github.com/werner-duvaud/muzero-general) — popular PyTorch implementation.
Chapter 6 · Maia — Human-Like Chess (MS Research + Toronto)
Most engines play the strongest move. Maia does the opposite: it plays what a human would play.
- Built by researchers at the University of Toronto, Microsoft Research and Cornell (Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, Ashton Anderson).
- Paper: KDD 2020, "Aligning Superhuman AI with Human Behavior".
- GitHub: github.com/CSSLab/maia-chess.
How It Is Built
- Trained on anonymous human game data from lichess, with one model per Elo bucket.
- Variants: maia-1100, maia-1500, maia-1900, etc. The number is the targeted Elo rating.
- AlphaZero-style CNN, but with no tree search: the move comes straight from the policy network output (equivalent to an MCTS of a single node).
- Result: maia-1500 plays the moves that 1500-rated humans most commonly play.
Why It Matters
- Chess coaching: matched to a student's Elo rating, it can say "at your level, players usually play this move here".
- A small but real case in AI safety research: when you do not want "the strongest AI" but "an AI aligned with human behavior".
- Playing humans on lichess — Maia bots are among the most natural-feeling opponents you can play.
Chapter 7 · KataGo — The Apex of Distributed Go Training
If chess has Lc0, Go has KataGo.
- Developer: David Wu (started solo, grew via distributed training).
- GitHub: github.com/lightvector/KataGo.
- AlphaZero-style architecture, but with many training-efficiency improvements, so it is stronger for a given compute budget.
Stronger Than AlphaGo Zero?
- AlphaGo Master (2017) was much stronger than the Lee Sedol version of AlphaGo.
- AlphaGo Zero (2017, internal-only) was even stronger than Master.
- KataGo reproduced AlphaGo Zero's level in open source via distributed training, then surpassed it.
Improvements
- Score-based reward modeling — the network learns "by how many points" rather than just win/loss → human-friendly endgame moves.
- Multiple board sizes in one network (9x9, 13x13, 19x19).
- Handicap games and varied rule sets (Chinese / Japanese counting).
Who Uses KataGo
- Essentially every pro Go player uses it for analysis.
- Every major Go research institute in Korea, China and Japan runs KataGo.
- Popular GUIs: Lizzie, KaTrain, Sabaki.
And Leela Zero (Go)
Leela Zero was the distributed Go project before KataGo — the Go equivalent of Lc0. From 2017 to 2019, it reproduced AlphaGo Zero in open source. Volunteers later migrated to KataGo because it was more efficient, so Leela Zero is effectively retired. But it was the first public reproduction of AlphaGo Zero, which is a historic milestone.
Chapter 8 · AlphaGo — The 2016 Series
In 2026, AlphaGo is history — but a pivotal one.
AlphaGo Lineage
| Version | Year | Notes | Result |
|---|---|---|---|
| AlphaGo Fan | 2015 | CNN + MCTS, pre-trained on human games | 5-0 vs Fan Hui (European champion) |
| AlphaGo Lee | 2016 | Larger policy net, distributed inference | 4-1 vs Lee Sedol |
| AlphaGo Master | 2017.1 | Single network, partial self-play training | 60-game online streak, 3-0 vs Ke Jie |
| AlphaGo Zero | 2017.10 | Zero human games, self-play only | 89-11 vs Master |
| AlphaZero | 2017.12 | Same algorithm generalized to chess, shogi, Go | Beat Stockfish 8, Elmo, AlphaGo Zero |
Lee Sedol's Game 4, Move 78
March 13, 2016, game 4. Lee Sedol played move 78, the "divine move" (a wedge between two AlphaGo stones). AlphaGo's policy network had given that move a near-zero probability, so the search had barely explored it; AlphaGo then misjudged the position, and Lee Sedol won. It remains the only game a human won against AlphaGo in an official match.
Lee Sedol retired in 2019, saying essentially that he saw no point in continuing a game he could not win. In Korea, AlphaGo is not just an AI event — it is remembered as "Lee Sedol's Game 4".
Chapter 9 · Pluribus — Conquering 6-Player Poker (Meta 2019)
Chess and Go are perfect-information; minimax works. Poker is different — you do not see opponent cards, there is luck, and bluffing is part of the game.
- Paper: Brown & Sandholm, 2019, "Superhuman AI for multiplayer poker" (Science).
- Built by Facebook AI Research (now Meta) + Carnegie Mellon University.
Core Algorithm — Monte Carlo CFR + Depth-Limited Search
- CFR (Counterfactual Regret Minimization): the standard learning algorithm for imperfect-info games. "How much would I regret not having taken this action?" is accumulated and used to update strategy.
- Blueprint strategy: a huge offline self-play training to learn a base strategy. About 8 days on a 64-core server, roughly 12,400 CPU core-hours in total.
- Real-time depth-limited search: during play, re-solve only a few moves deep. About 20 seconds per hand.
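The regret-accumulation idea behind CFR can be shown on a much smaller problem. The Python sketch below runs regret matching, the per-decision update that CFR applies at every information set, in self-play on rock-paper-scissors. It is not Pluribus's Monte Carlo CFR: there is no game tree, no sampling and no abstraction, and the starting bias toward rock is an arbitrary choice to make the dynamics visible.

```python
# Payoff for player 0; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def train(iterations=20000):
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # player 0 starts biased toward rock
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strat = [strategy_from(regrets[0]), strategy_from(regrets[1])]
        # Expected payoff of each pure action against the opponent's current mix.
        util0 = [sum(PAYOFF[a][b] * strat[1][b] for b in range(3)) for a in range(3)]
        util1 = [sum(-PAYOFF[a][b] * strat[0][a] for a in range(3)) for b in range(3)]
        for player, util in ((0, util0), (1, util1)):
            expected = sum(s * u for s, u in zip(strat[player], util))
            for a in range(3):
                # "How much better would action a have done than what I played?"
                regrets[player][a] += util[a] - expected
                strategy_sum[player][a] += strat[player][a]
    # The average strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in strategy_sum[p]] for p in range(2)]

average = train()
```

The current strategies keep cycling (rock invites paper, paper invites scissors), but the averages drift toward the uniform mix of one third each, the equilibrium of this game. In two-player zero-sum games that convergence is guaranteed; the list above notes why the six-player case is harder.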
Why This Was a Shock
- Heads-up (2-player) poker had already been beaten by Libratus in 2017.
- 6-player is a different kind of problem — multi-agent, possible coalitions, side-betting. CFR has weak convergence theory there.
- Pluribus beat human top pros statistically significantly (13 pros, 10,000 hands) even without convergence guarantees.
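- The blueprint cost less than about $150 of cloud compute to train, and live play ran on a 28-core server with under 128 GB of memory and no GPUs. No supercomputer was required, unlike AlphaZero.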
Surprising Behaviors
- Randomized bet sizes — even for the same hand, bet amount varies → opponent cannot read it.
- Donk bets — a play human pros rarely used, used frequently by Pluribus.
- Game-theoretically optimal bluff frequency — not too often, not too rarely.
Chapter 10 · Cicero — Diplomacy (Meta, 2022)
Pluribus solved a "mathematically hard" game; Cicero solved a game that is hard because of language and human negotiation.
- Paper: Bakhtin et al., 2022, "Human-level play in the game of Diplomacy by combining language models with strategic reasoning" (Science).
- Built by Meta AI.
Why Diplomacy Is Hard
- Seven players move around a map of Europe, allying and betraying.
- Free-form chat negotiation each turn. What you reveal in chat and whom you ally with is the game.
- Lying is legal — promising an alliance and breaking it does not violate any rule.
- No dice, but you get asymmetric info + multi-player cooperation + language.
Cicero's Architecture
- Language model (LLM) — a 2.7B parameter BART fine-tuned on Diplomacy chat data.
- Strategy model — a policy network trained by self-play, RL-based.
- Intent inference → message generation → action decision — models its own intent and opponents' intent simultaneously.
Results
- In anonymous tournaments on webDiplomacy, top 10%, scoring twice the average human in 40 games.
- Never once flagged as "an AI" by human opponents — chatted naturally and won.
- Did not learn to lie on purpose — inconsistency breaks alliances. As a result, an "honest cooperator" was the stronger strategy.
This is more than a game AI win — it shows AI can handle natural language + strategy + multi-party negotiation, which is core to human society.
Chapter 11 · AlphaStar — StarCraft 2 (DeepMind 2019)
- Paper: Vinyals et al., 2019, "Grandmaster level in StarCraft II using multi-agent reinforcement learning" (Nature).
Why StarCraft 2 Is Hard
- Real time — no turns, tens of thousands of clicks per game.
- Partial observation — fog of war hides the opponent.
- Huge action space — millions of valid action combinations per frame.
- Long-horizon rewards — win or loss only at the end (tens of minutes).
- Three asymmetric races — Terran, Zerg, Protoss.
Algorithm
- Self-play RL plus a League system.
- Many "style" agents play each other; a new agent is trained to beat what the current champion cannot beat.
- This automatically discovers diverse metas.
- 14 days of training on 16 TPUs per agent (the version shown in January 2019).
Results
- Reached Grandmaster (top 0.2%) on the Battle.net ladder.
- Beat human pros TLO and MaNa 5-0 each in matches played in December 2018 and revealed in January 2019; MaNa won the single live exhibition game that followed.
- Showed non-human micro-control (200+ APM on multiple groups) and non-human strategy (constant parallel macro), both still legal.
Chapter 12 · OpenAI Five — Dota 2
- Blog: openai.com/research/openai-five.
- Built by OpenAI (2017-2019).
What Makes Dota 2 Harder
- 5 vs 5 team game — cooperation is essential.
- Longer time scale — average match is 45 minutes.
- 100+ heroes, so an even bigger action space (OpenAI Five itself played with a restricted pool of 17).
- Long-term strategy (item builds, lane assignments, late-game team fights) plus short-term micro.
Results
- Lost two demo games against pro teams at The International 2018, then beat OG (The International champions in 2018 and again in 2019) 2-0 in April 2019.
- Trained on roughly 256 GPUs + 128,000 CPU cores for 10 months.
- Cumulative training game time: about 45,000 years.
This was essentially a demonstration of industrial-scale distributed RL. The paradigm of "self-play + massive compute" that OpenAI Five established is what made OpenAI into OpenAI (later GPTs).
Chapter 13 · Suphx — Mahjong (Microsoft 2019)
Li et al., 2020, "Suphx: Mastering Mahjong with Deep Reinforcement Learning".
Why Mahjong Is Hard
- 4-player game (not 2).
- Hidden hand tiles plus random draws.
- Score system accumulates over a "hanchan" of 8 hands, not per hand → long-horizon decisions.
- Japanese riichi mahjong has complex rules (tenpai, yaku, dora, etc.).
Suphx's Approach
- Model: deep residual CNNs, pre-trained with supervised learning on human games and then improved with policy-gradient self-play RL.
- Training tricks: global reward prediction (estimating how much an action in this hand contributes to the entire match).
- Run-time policy adaptation — fine-tune policy as the match progresses.
Result
- Reached "10-dan" on the Japanese online mahjong platform Tenhou — top 0.01%, on par with the strongest human players.
Chapter 14 · AlphaProof + AlphaGeometry — IMO Silver (2024)
Math proofs are not games, but they are essentially huge search problems. DeepMind solves them with game AI techniques.
AlphaGeometry (2024.1, Nature)
Trinh et al., 2024, "Solving olympiad geometry without human demonstrations".
- Specialized for plane geometry.
- A neural language model proposes auxiliary constructions, a symbolic engine verifies them.
- Solved 25 of 30 IMO 2000-2022 geometry problems (the average IMO gold medalist solves about 25.9).
AlphaProof (2024.7)
DeepMind blog: deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level.
- Writes proofs in Lean 4 (a formal proof language).
- Pipeline: natural language problem → formal statement → proof.
- Uses AlphaZero-style self-play RL to search proofs.
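To make "formal statement, then proof" concrete, here is a deliberately trivial Lean 4 example. It is only an illustration of the format, not AlphaProof output: the theorem name is made up, and the proof simply cites a lemma from Lean's core library.

```lean
-- Informal problem: "Show that a + b = b + a for all natural numbers a and b."
-- Formal statement (everything before `:=`) and proof (everything after it).
theorem sum_is_commutative (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The Lean checker accepts or rejects the proof mechanically, which gives a search procedure an unambiguous reward signal: a candidate proof either type-checks or it does not.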
2024 IMO Result
- Out of 6 problems: 4 perfect scores (28/42 points).
- Silver-medal level: the 2024 gold cutoff was 29 points, so the system finished one point short of gold.
- Problems 1, 2 (AlphaProof) / 4 (AlphaGeometry 2) / 6 (AlphaProof). Problems 3 and 5, both combinatorics, were not solved.
Chapter 15 · Chess UIs — lichess / chess.com / ChessBase / Arena / Banksia / NIBBLER
No matter how strong the engine is, humans need a UI. As of 2026:
lichess.org — The FOSS Top
- Free, no ads, open source (AGPL v3).
- lichess.org, github.com/lichess-org/lila.
- Analysis engine: Stockfish by default — runs locally in the browser via WebAssembly.
- Heavy analysis is handled by fishnet, a volunteer distributed network.
- Plays well over 100 million games per month.
- In Korea, free plus fast servers have driven rapid adoption.
chess.com — The Commercial #1
- 50M+ monthly users.
- Analysis engines: Stockfish + Komodo Dragon (Komodo is chess.com property).
- Lesson library (GM courses), bot play, tournaments.
- Magnus Carlsen and other top GMs play chess.com's Speed Chess Championship.
ChessBase
- German company ChessBase's desktop database + engine tool.
- The de-facto standard for tournament preparation — Mega Database has 10M+ games.
- Engines (Fritz, Komodo, Stockfish) all attach via UCI.
- Expensive (€100/year and up) but mandatory in pro GM workflow.
Arena, Banksia, NIBBLER — for Engine Testing
- Arena (playwitharena.de) — classic free Windows chess GUI. UCI / XBoard engine connection standard.
- Banksia GUI (banksiagui.com) — a newer GUI. CCRL's informal standard.
- NIBBLER (github.com/rooklift/nibbler) — a GUI specialized for Lc0. Visualizes policy network outputs.
Chapter 16 · UCI and XBoard Protocols
There are two standard ways for engines and GUIs to talk.
UCI (Universal Chess Interface)
Created by Stefan Meyer-Kahlen and Rudolf Huber and released in 2000. Almost every modern engine speaks UCI.
```text
# GUI -> engine
uci                               # tell engine to enter "UCI mode"
setoption name Threads value 8
isready
position startpos moves e2e4 e7e5
go wtime 60000 btime 60000

# engine -> GUI
id name Stockfish 17
uciok
readyok
info depth 20 score cp 31 ...
bestmove g1f3 ponder b8c6
```
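Seen from the engine side, the exchange above is a small command loop. The Python sketch below handles just the commands shown; the engine name, the `state` dictionary and the fixed reply to `go` are placeholders invented for the example, and a real engine would run its search at that point.

```python
def handle(line, state):
    """Return the reply lines a toy engine sends for one GUI command."""
    parts = line.split()
    if not parts:
        return []
    cmd = parts[0]
    if cmd == "uci":
        return ["id name ToyEngine", "id author (hypothetical)", "uciok"]
    if cmd == "isready":
        return ["readyok"]
    if cmd == "setoption":
        # Format: setoption name <id> [value <x>]
        if "value" in parts:
            v = parts.index("value")
            name, value = " ".join(parts[2:v]), " ".join(parts[v + 1:])
        else:
            name, value = " ".join(parts[2:]), ""
        state["options"][name] = value
        return []
    if cmd == "position":
        # Only the move list is stored; FEN handling is left out of this sketch.
        state["moves"] = parts[parts.index("moves") + 1:] if "moves" in parts else []
        return []
    if cmd == "go":
        # Placeholder: "0000" is UCI notation for a null move.
        return ["bestmove 0000"]
    return []  # unknown commands are ignored

state = {"options": {}, "moves": []}
replies = []
for command in ["uci", "setoption name Threads value 8", "isready",
                "position startpos moves e2e4 e7e5", "go wtime 60000 btime 60000"]:
    replies.extend(handle(command, state))
```

In a real engine this function would sit behind a loop reading standard input and printing each reply line, since UCI is a plain text protocol over stdin and stdout.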
XBoard / CECP
Much older (early 1990s). Some classic engines (Crafty, GNU Chess) still use it. On lichess, XBoard engines can run as bots through the lichess-bot bridge.
Differences
| Item | UCI | XBoard / CECP |
|---|---|---|
| Origin | 2000 | early 1990s |
| Time control | GUI sends times | engine tracks its own clock |
| Options | uniform setoption | engine-specific |
| Share | dominant | legacy |
A new engine today is almost always built UCI-first.
Chapter 17 · Korea — NHN's HanDol, and Lee Sedol
HanDol (NHN)
Built by NHN (then NHN Entertainment). First shown in 2017. In December 2019, Lee Sedol played HanDol as his retirement series: he won game 1, lost games 2 and 3, finishing 1-2.
- Lee Sedol won game 1, played with a two-stone handicap, after his move 78 led HanDol to mis-evaluate the position.
- This is the last recorded win by Lee Sedol against an AI in an official series (as of 2025).
HanDol has mostly been used inside NHN's own Go service; it was never widely released as an open analysis tool in the way KataGo was. Separately, NCsoft's AI Center has focused its game AI work on NPC behavior (Lineage), RL-driven content generation, and so on.
LG, Kakao — Korean Go AIs
- LG also built an in-house Go AI in the late 2010s, less visible than HanDol.
- Kakao Brain experimented with their own Go AI (code-named things like Kataja) for a while, then shifted toward KataGo open-source contributions.
What Go AI Meant in Korea
Lee Sedol vs AlphaGo is the event that made "AI" a household word in Korea. The frequency of the word "AI" in Korean media before and after March 2016 is qualitatively different. Korea's national AI policy (the 2019 AI National Strategy) was a direct consequence.
Chapter 18 · Japan — Shogi AI History, dlshogi, Yaneura-ou
Shogi is Japanese chess, with the extra twist that captured pieces can be reused. This makes the game tree much larger than chess. The Japanese computer shogi community has been very active since the 1990s.
Major Engines (Chronological)
| Engine | Year | Notable Fact |
|---|---|---|
| Gekisashi | 1990s | First strong Japanese shogi engine |
| Bonanza | 2005 | Origin of ML-based evaluation function — Kunihito Hoki |
| GPS Shogi | 2009 | University of Tokyo GPS group |
| Ponanza | 2013-17 | First to beat an active pro in an official game (2013); beat the reigning Meijin in 2017 |
| Apery | 2014 | Open source |
| Yaneura-ou | 2015- | Current Japanese standard engine — birthplace of NNUE |
| dlshogi | 2018- | AlphaZero-style NN, trained on RTX 5090 |
The Bonanza Shock — The Bonanza Method
Hoki's 2006 paper: obtain evaluation function weights by optimizing them against professional game records. This predates NNUE in chess (2020) by about 14 years and is the origin of ML-based evaluation in shogi; the NNUE design that Stockfish later adopted grew out of this shogi tradition.
Yaneura-ou — The Birthplace of NNUE
Built by Motohiro Isozaki ("Yaneura"), open-source shogi engine. The first to make NNUE practical. Stockfish later imported it to chess. Most winners of the World Computer Shogi Championship today are Yaneura-ou variants.
dlshogi — AlphaZero for Shogi
GitHub: github.com/TadaoYamaoka/DeepLearningShogi.
- AlphaZero-style — CNN + MCTS, self-play.
- Yaneura-ou (NNUE) and dlshogi (deep learning) are two different paths to similar strength.
- Won the 2022 World Computer Shogi Championship (entered as "dlshogi with HEROZ").
- A pair of RTX 4090 or RTX 5090 GPUs is the de-facto reference setup in Japanese training discussions.
Humans vs Shogi AI — Meijin-sen and NHK Cup
- Meijin-sen: the most prestigious shogi title.
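- 2013: Ponanza beat Shinichi Satō in the second Denō-sen, the first official win by a program over an active professional. In 2017 it beat the reigning Meijin, Amahiko Satō, 2-0, and consensus shifted to "official human-vs-AI series no longer make sense".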
- NHK Cup: a fast-play exhibition event sometimes features AI participation (an event, not an official title match).
Chapter 19 · Who Should Learn Game AI?
1) RL Researchers
- AlphaZero, MuZero, AlphaStar, Cicero are the RL textbook.
- OpenSpiel, RLlib, JAX/Acme let you experiment hands-on.
- Games are clean environments, so they are almost the standard RL benchmark.
2) Board-Game Engine Builders
- Stockfish, Lc0, KataGo show how far you can optimize a single game.
- A good entry point if you want serious C++ / CUDA depth.
3) Multi-Agent / Negotiation AI
- Cicero and Pluribus are the academic standard for multi-player + cooperation + natural language.
- Required reading if you build LLM agents that negotiate.
4) Game Companies
- In-game bots, matchmaking and content generation are increasingly RL-driven.
- Examples: NHN HanDol, OpenAI Five.
5) Education / Coaching
- Maia, plus coaching bots in chess.com / lichess.
- The job of making a "human-like" opponent matched to a student's Elo rating.
Chapter 20 · Closing — What "Superhuman" Now Means
Game AI in 2026 outperforms humans in essentially every standard game. Chess, Go, shogi, heads-up and multiway poker, StarCraft 2, Dota 2, mahjong, Diplomacy — even the International Math Olympiad at silver-medal level.
But this is not the end. New games — MMO PvE dungeon clearing, discovering new hero metas in MOBAs, meta exploration right after a new TCG set drops — are still active research areas.
A more interesting direction is "AI that is like a human" — Maia, Cicero. Not just stronger AI, but AI that plays with humans, that humans can understand, and that can teach humans.
Game AI is not finished. We have simply entered an era where "winning" is no longer the goal.
References
- Silver, D., et al. (2016). "Mastering the game of Go with deep neural networks and tree search." Nature.
- Silver, D., et al. (2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." arXiv:1712.01815.
- Schrittwieser, J., et al. (2019). "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)." arXiv:1911.08265.
- Brown, N., & Sandholm, T. (2019). "Superhuman AI for multiplayer poker (Pluribus)." Science 365 (6456): 885-890.
- Bakhtin, A., et al. (2022). "Human-level play in the game of Diplomacy by combining language models with strategic reasoning (Cicero)." Science.
- Vinyals, O., et al. (2019). "Grandmaster level in StarCraft II using multi-agent reinforcement learning (AlphaStar)." Nature.
- OpenAI Five blog (2018-2019).
- Li, J., et al. (2020). "Suphx: Mastering Mahjong with Deep Reinforcement Learning." arXiv:2003.13590.
- Trinh, T., et al. (2024). "Solving olympiad geometry without human demonstrations (AlphaGeometry)." Nature.
- DeepMind blog. "AI achieves silver-medal standard solving IMO problems (AlphaProof + AlphaGeometry, 2024)."
- McIlroy-Young, R., et al. (2020). "Aligning Superhuman AI with Human Behavior (Maia)." KDD.
- Stockfish — GitHub repository.
- Leela Chess Zero — Project site.
- KataGo — David Wu's repository.
- dlshogi — Yamaoka Tadao's repository.
- Yaneura-ou — Yaneura's shogi engine.
- lichess.org source code (lila).
- chess.com.
- ChessBase Mega Database.
- Arena Chess GUI.
- Banksia GUI.
- NIBBLER GUI for Lc0.
- DeepMind OpenSpiel.
- Wikipedia: AlphaGo versus Lee Sedol.