Chinese AI Labs in 2026 — A Deep Dive into DeepSeek, Qwen, Kimi, GLM, Yi, Doubao, Hunyuan (The New Center of Gravity for Open Weights)
Author: Youngju Kim (@fjvbn20031)
Prologue — That weekend in January 2025
Some people remember the last weekend of January 2025. About a week after DeepSeek-R1's release on January 20, NVIDIA's market cap shed roughly $600 billion in a single trading day. American media called it a "Sputnik moment" and Silicon Valley meetings ran late into the night. Meanwhile, a small lab in Hangzhou owned by the hedge fund High-Flyer just quietly pushed weights and a paper to Hugging Face, the way it always did.
That was 16 months ago. It is May 2026 now, and the landscape of Chinese AI labs has shifted in ways that are still being absorbed. The center of gravity for open-weight SOTA has clearly moved east. While Meta's Llama line stalled and Mistral pivoted closed, DeepSeek-V3/R1, Qwen 3, Kimi K2, and GLM-4.5 used weights themselves as the weapon and rattled the global standard.
This post draws the full map of Chinese AI labs as of May 2026 — the Six Tigers (六小虎), the BAT (Baidu, Alibaba, Tencent), plus DeepSeek, ByteDance, and Huawei. Underneath that we cover export controls, domestic chips, inference stacks, and evaluation benchmarks. Everything a Korean or Japanese engineer needs when their company asks, "Can we try a Chinese model?"
1. The 2026 Map — Six Tigers + BAT + Huawei
One-line summary first. Of the six "tigers" (六小虎) that defined the 2024 startup narrative, only about four are meaningfully alive in 2026. Those four are competing directly for global SOTA, while BAT, DeepSeek, ByteDance, and Huawei run their own models alongside.
The coordinates as of May 2026.
| Bucket | Company | Flagship (May 2026) | License | Trait |
|---|---|---|---|---|
| Six Tigers | Zhipu AI (智谱) | GLM-4.5 / GLM-4.5V | Partial open | Agentic, multimodal |
| Six Tigers | Moonshot (月之暗面) | Kimi K2 (1T MoE) | Open weights | 1M long context |
| Six Tigers | MiniMax (稀宇科技) | MiniMax-01 / -M1 | Open weights | 4M context |
| Six Tigers | 01.AI (零一万物) | Yi-Large / Lightning | Partial open | Restructured 2025 |
| Six Tigers | Baichuan (百川) | Baichuan-M1 | Partial open | Pivoted to healthcare |
| Six Tigers | StepFun (阶跃星辰) | Step-2 / Step-R | Partial open | Multimodal |
| BAT | Alibaba (阿里) | Qwen 3 / Qwen3-Coder | Open weights | Best open standard |
| BAT | Baidu (百度) | ERNIE 4.5 / X1 | Closed | Search integration |
| BAT | Tencent (腾讯) | Hunyuan T1 / Turbo | Partial open | Reasoning entry |
| Indie | DeepSeek (深度求索) | V3 / R1 / V4 / R2 | Open weights | Global shock |
| Indie | ByteDance (字节) | Doubao 1.5 / Seedream | Partial open | #1 deployment |
| Ant | Ant Group (蚂蚁) | Ling / Bailing / Ming | Partial open | Finance focus |
| Huawei | Huawei (华为) | Pangu / Pangu-Sigma | Partial open | Chip + model vertical |
You do not need to memorize the table. Two patterns are enough.
First, the 2024 "six unicorns" framing is almost useless in 2026. Baichuan effectively dropped out of the general LLM race and went healthcare-specific. 01.AI restructured in late-2024 to early-2025 and cut the pretraining business. Meanwhile DeepSeek, which was never one of the tigers, became global tier-1, and Qwen moved the fastest inside BAT.
Second, "open weights" means different things at different companies. DeepSeek and Qwen ship under MIT or Apache 2.0-like terms. Kimi K2 ships under a modified MIT that permits both research and commercial use with some constraints. GLM has different licenses per model size. Yi splits academic and commercial. Pangu calls itself open but requires application and approval. If you are evaluating for production, read the actual license text before anything else.
2. DeepSeek-V3 / R1 — Epicenter of the 2024-2025 Shock
DeepSeek first. Company name 深度求索, based in Hangzhou, spun out of the hedge fund High-Flyer Capital (幻方量化) in July 2023. Founder Liang Wenfeng (梁文锋) had quietly stockpiled roughly 10,000 NVIDIA A100s for quant trading and decided to point them at LLMs.
Timeline.
- 2023.11: DeepSeek LLM 7B/67B — first models, unremarkable
- 2024.05: DeepSeek-V2 (236B MoE) — introduced MLA (Multi-Head Latent Attention), cut KV cache size by roughly 93 percent
- 2024.06: DeepSeek-Coder-V2 — coding specialty
- 2024.12: DeepSeek-V3 (671B MoE, 37B active) — GPT-4o-class quality at roughly $5.6M training cost
- 2025.01: DeepSeek-R1 (reasoning) — OpenAI o1-class, open weights
- 2025.05: DeepSeek-V3.1 / R1-0528 — context extension, tool use
- 2025.12: DeepSeek-V4 (expected, vaporware status varies by week)
- 2026.03: DeepSeek-R2 — multimodal plus agentic reasoning
DeepSeek-V3 shocked the market for two reasons. First, fine-grained MoE with 671B total but only 37B active. You pay the inference cost of a 37B model but reach into the knowledge of a 671B one. Second, training finished in roughly two months on 2,048 H800 GPUs. Engineering details — MoE design, FP8 mixed precision, DualPipe pipeline parallelism, multi-token prediction — were all published in the paper.
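To make the "pay for 37B, reach into 671B" idea concrete, here is a toy top-k router in plain Python. It is a minimal sketch under stated assumptions: the sizes (`D_MODEL`, `N_EXPERTS`, `TOP_K`) and the helper names are hypothetical, and it leaves out the shared experts and load balancing that the real DeepSeek-V3 design includes.

```python
import math
import random

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def moe_forward(x, gate, experts, top_k):
    """Send one token through only the top_k experts picked by the gate."""
    scores = matvec(gate, x)  # one routing score per expert
    chosen = sorted(range(len(experts)), key=lambda i: scores[i])[-top_k:]
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]  # softmax over the chosen experts
    weights = [e / sum(exps) for e in exps]
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = matvec(experts[i], x)  # only these experts do any compute
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
D_MODEL, N_EXPERTS, TOP_K = 8, 16, 2  # toy sizes, not DeepSeek's

gate = rand_matrix(N_EXPERTS, D_MODEL)
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
x = [random.gauss(0, 1) for _ in range(D_MODEL)]

out, chosen = moe_forward(x, gate, experts, TOP_K)
total_params = N_EXPERTS * D_MODEL * D_MODEL
active_params = TOP_K * D_MODEL * D_MODEL
print("experts used:", sorted(chosen))
print("active / total expert params:", active_params, "/", total_params)
```

In this toy setup one token touches 2 of 16 experts, so one-eighth of the expert weights. The published V3 ratio is 37B of 671B, about 5.5 percent.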
R1 added reasoning on top using GRPO (Group Relative Policy Optimization). GRPO is PPO without the critic network: the learned value baseline is replaced by a baseline computed from a group of answers sampled for the same prompt, which dropped memory by more than half. The reported reasoning training cost ended up roughly one-tenth of what OpenAI was rumored to spend.
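The group baseline can be sketched in a few lines. This shows only the advantage step (each reward normalized against the mean and standard deviation of its own group); the clipped policy-gradient loss and KL penalty are omitted, and the function name and the example rewards are illustrative assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer against its own group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for 8 answers sampled from the same prompt
# (1.0 = final answer verified correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the baseline comes from the samples themselves, no separate value network has to be trained or held in memory, which is where the saving comes from.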
```python
# Bring up DeepSeek-V3 with vLLM (minimal example)
# pip install vllm
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,  # 8x H200; 8x H100 (640 GB) cannot hold ~671 GB of FP8 weights
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=65536,
)

prompts = [
    "Explain the time complexity of the following:\n\n"
    "for i in range(n):\n    for j in range(n):\n        a[i][j] = i*j"
]
params = SamplingParams(temperature=0.6, max_tokens=2048)

outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```
Field notes. DeepSeek-V3 runs best on vLLM 0.7+ or SGLang 0.4+. TensorRT-LLM support for V3 MoE was still beta close to V4 release. When using R1 for production reasoning, set max_tokens generously (8K to 16K). R1 is expected to produce a long thinking trace, and clipping breaks the answer.
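A clipped trace is easy to detect after the fact. The helper below is a minimal sketch that assumes R1-style output wraps its reasoning in `<think>` and `</think>` tags; the function name and return shape are hypothetical, and some chat templates emit the opening tag as part of the prompt, which the code tolerates.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer, complete) for one model output."""
    if open_tag not in text and close_tag not in text:
        return "", text.strip(), True  # plain answer, no trace at all
    # Drop everything up to the opening tag when it is present in the output.
    after_open = text.split(open_tag, 1)[1] if open_tag in text else text
    if close_tag not in after_open:
        # The trace never closed: generation was cut off while still thinking.
        return after_open.strip(), "", False
    reasoning, answer = after_open.split(close_tag, 1)
    return reasoning.strip(), answer.strip(), True

reasoning, answer, complete = split_reasoning(
    "<think>Two nested loops over n, so n*n steps.</think>The complexity is O(n^2)."
)
print(complete, "|", answer)

_, _, complete_clipped = split_reasoning("<think>Two nested loops over n, so")
print(complete_clipped)
```

When `complete` comes back false, the usual remedy is to retry with a larger `max_tokens` rather than show the user a half-finished trace.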
As of May 2026, DeepSeek cut API prices again. Input with cache hit at roughly 1.10. That is one-tenth of GPT-4.1 mini and roughly one-fifth of Claude Haiku. For Korean and Japanese companies that can clear the security review, DeepSeek API has become one of the most cost-effective options on the market.
3. Qwen 3 (Alibaba) — The New Open-Weight Standard
Next, Alibaba Qwen. Formally called Tongyi Qianwen (通义千问), built by Alibaba Cloud's Qwen team. Started with Qwen-7B in August 2023 and has shipped roughly one new series per quarter ever since. Effectively a model factory.
Timeline.
- 2023.08: Qwen-7B / 14B
- 2024.02: Qwen 1.5 — full size sweep from 0.5B to 72B
- 2024.06: Qwen 2 — Apache 2.0 (7B / 57B-A14B / 72B)
- 2024.09: Qwen 2.5 — coding and math improvements
- 2025.04: Qwen 3 — dual thinking and non-thinking modes
- 2025.06: Qwen3-Coder, including the 480B-A35B MoE flagship — coding SOTA
- 2025.09: Qwen3-VL — multimodal
- 2026.02: Qwen 3.5 (informal; merging with Qwen-Max)
The biggest design decision in Qwen 3 was bundling thinking and non-thinking modes inside a single model. Pass enable_thinking=True and it runs a long chain like R1; pass False and it answers directly. That sounds minor but for operations it halves the cost of "running a reasoning model and an instruct model side by side."
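Operationally, the switch can travel with each request. The sketch below builds a request body for an OpenAI-compatible server and assumes a vLLM-style deployment that forwards `chat_template_kwargs` to the Qwen 3 chat template; the function name and the token budgets are illustrative choices, not recommended values.

```python
import json

def build_chat_request(prompt, thinking, model="Qwen/Qwen3-8B"):
    """Build one /v1/chat/completions body with thinking toggled per request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Thinking mode needs room for the trace; budgets here are arbitrary.
        "max_tokens": 8192 if thinking else 1024,
        # Assumption: the server passes these kwargs to the chat template.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

fast = build_chat_request("Translate 'hello' into Korean.", thinking=False)
slow = build_chat_request("Prove that sqrt(2) is irrational.", thinking=True)
print(json.dumps(fast, indent=2, ensure_ascii=False))
```

One deployment then serves both kinds of traffic, and the routing decision (think or answer directly) lives in application code instead of in a second model fleet.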
The size lineup is also clean. The dense models come in 0.6B, 1.7B, 4B, 8B, 14B, and 32B, and the MoE models are 235B-A22B plus 480B-A35B for Qwen3-Coder. The 0.6B / 1.7B sizes run on a laptop with ollama. The 32B fits on a single H100, and the 235B needs roughly 8x H100.
```bash
# Run Qwen 3 8B locally with ollama
ollama pull qwen3:8b
ollama run qwen3:8b "Implement an LRU cache in Python"

# Pull Qwen3-Coder 30B-A3B from ModelScope
# pip install modelscope
modelscope download --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --local_dir ./qwen3-coder-30b
```
License. Qwen 3 ships under Apache 2.0. You can fine-tune, ship closed-source, and charge money. That is friendlier than DeepSeek-V3's modified MIT, and several Korean and Japanese SaaS companies are already fine-tuning Qwen 3 as the base and selling it as their own model. (Whether that is honest marketing is another question; legality is not in doubt.)
Performance. As of May 2026, Qwen3-235B-A22B trades positions with GPT-4.1 and Claude 3.7 Sonnet in LMSys Chatbot Arena. The tokenizer through Qwen 2 was worse than Llama for Korean, but Qwen 3 retrained the BPE and improved Korean encoding efficiency by roughly 30 percent. Japanese is still slightly behind GPT-4o and Claude 3.5 Sonnet.
Alibaba Cloud's model hub, ModelScope (魔搭), is effectively the Chinese mirror of Hugging Face. Hugging Face downloads are blocked from inside mainland China, so Chinese-company models tend to land on both. Outside China, Hugging Face is usually faster, but some weights (especially right after RLHF release) are temporarily ModelScope-only.
4. Kimi K2 (Moonshot) — Long-Context 1M Champion
Moonshot AI (月之暗面) was founded in 2023 by Yang Zhilin (杨植麟), a Tsinghua graduate who earned his PhD at Carnegie Mellon. Series B funding came from Alibaba and Tencent. The whole product positioning hinged on long context from day one.
Timeline.
- 2023.10: Kimi Chat — initial 200k Chinese-character window
- 2024.03: Kimi 1.5 — pushed to roughly 2 million characters, about 200K tokens
- 2024.10: Kimi K0 reasoning beta
- 2025.07: Kimi K2 — 1T parameter MoE, 32B active, modified MIT license
- 2025.11: Kimi K2-Coder
- 2026.02: Kimi K2.5 — 1.5M context, agentic
- 2026.05: Kimi K3 (expected)
The Kimi K2 design echoes DeepSeek-V3 — fine-grained MoE — but with even smaller active parameters (32B). The 1T headline number is great marketing; what matters in practice is the active 32B inference cost. Real serving needs roughly 8x H200 (about 1.1TB HBM) or 4x B200, because 8x H100 (640GB) does not fit the full weights.
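The hardware claim is simple arithmetic on weights alone. The sketch below assumes 1 byte per parameter (FP8 weights) for K2 and 2 bytes (BF16) for a 32B dense model, and it ignores KV cache and activations, so real deployments need headroom beyond these numbers. The function names are hypothetical.

```python
def weight_memory_gb(params_billions, bytes_per_param):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

def fits(params_billions, bytes_per_param, gpus, hbm_gb_per_gpu):
    """True if the weights alone fit in the aggregate HBM of the node."""
    return weight_memory_gb(params_billions, bytes_per_param) <= gpus * hbm_gb_per_gpu

K2_PARAMS_B = 1000  # 1T total parameters
FP8_BYTES = 1       # assumption: weights stored in FP8

nodes = {"8x H100 (80 GB each)": (8, 80), "8x H200 (141 GB each)": (8, 141)}
for name, (gpus, hbm) in nodes.items():
    verdict = "fits" if fits(K2_PARAMS_B, FP8_BYTES, gpus, hbm) else "does not fit"
    print(f"Kimi K2 weights on {name}: {verdict} ({gpus * hbm} GB available)")

# Same check for a 32B dense model in BF16 on one H100.
print("32B BF16 on 1x H100:", fits(32, 2, 1, 80))
```

The active 32B figure governs compute per token, but every expert still has to be resident in memory, which is why the 1T total sets the hardware floor.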
K2's real strength is agentic tool use. K2 mixed tool-calling data into base pretraining, and the resulting function-call accuracy is GPT-4.1 level. Combined with the long-context window, "read a 200-page PDF and make 50 tool calls" workflows put K2 a step above other open models.
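On the application side, each of those tool calls has to be executed and fed back to the model. The loop body below is a minimal sketch that assumes the OpenAI-style `tool_calls` message format; the `get_page_count` tool and its return value are hypothetical stand-ins for real tools.

```python
import json

def get_page_count(path):
    """Hypothetical tool: a real one would open the PDF and count pages."""
    return {"path": path, "pages": 200}

TOOLS = {"get_page_count": get_page_count}

def run_tool_calls(tool_calls):
    """Execute each requested call and build tool-role messages for the next turn."""
    messages = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return messages

# Shape of what an assistant message's tool_calls field looks like (made-up values).
example_calls = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_page_count", "arguments": '{"path": "long-paper.pdf"}'},
}]

tool_messages = run_tool_calls(example_calls)
print(tool_messages[0]["content"])
```

A 50-call workflow repeats this step 50 times, so a small per-call error rate in the model's argument JSON compounds quickly. That is why function-call accuracy matters more here than chat quality.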
```python
# Long PDF processing with Kimi K2 (Moonshot's API is OpenAI-compatible, so the openai SDK is used)
# pip install openai
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.moonshot.cn/v1")

# Upload the PDF and fetch the extracted text
with open("long-paper.pdf", "rb") as f:
    file = client.files.create(file=f, purpose="file-extract")
content = client.files.content(file_id=file.id).text

# Ask in one shot; the extracted text must fit in the chosen model's context window
response = client.chat.completions.create(
    model="moonshot-v1-128k",  # or kimi-k2
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "system", "content": content},
        {"role": "user", "content": "Summarize the three key contributions of this paper in English."},
    ],
    temperature=0.3,
)
print(response.choices[0].message.content)
```
Trade-offs. K2's Korean and Japanese are a step behind Qwen 3. The training data leans heavily Chinese, so the Hanzi vocabulary is strong but Korean honorific consistency and Japanese keigo lag behind GPT-4o, Claude, and Qwen 3. On the other hand, long-context retrieval accuracy (NIAH "needle in a haystack") in the 1M regime sits just above GPT-4.1 and near Gemini 2.5 Pro.
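For readers who have not run NIAH themselves, the test is easy to reproduce. The sketch below builds a haystack with one planted fact at a chosen depth and scores an answer by substring match; the filler sentence, the needle, and the function names are made up for illustration, and real harnesses sweep many depths and context lengths.

```python
def build_niah_prompt(filler, needle, depth, total_sentences):
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of the haystack."""
    position = round(depth * total_sentences)
    sentences = [filler] * total_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)

def is_retrieved(model_answer, expected):
    """Lenient scoring: the expected string appears anywhere in the answer."""
    return expected.lower() in model_answer.lower()

FILLER = "The sky was grey that morning."
NEEDLE = "The secret code is 7421."
QUESTION = "What is the secret code?"

haystack = build_niah_prompt(FILLER, NEEDLE, depth=0.5, total_sentences=10)
prompt = f"{haystack}\n\n{QUESTION}"
print(prompt)

# `model_answer` would come from the model under test; hard-coded here.
print(is_retrieved("The code is 7421.", "7421"))
```

Scaling `total_sentences` until the prompt approaches the advertised window, and repeating across depths, produces the familiar accuracy heatmap.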
Business side. Kimi Chat was briefly the #1 consumer chat app in China in 2024. Then ByteDance Doubao spent billions of yuan on advertising and overtook it on MAU. Moonshot pivoted toward B2B and model licensing starting in 2025.
5. GLM-4.5 (Zhipu) — Agentic and Multimodal
Zhipu AI (智谱AI) was spun out of Tsinghua's KEG lab. The GLM (General Language Model) series has shipped open weights since 2021, making Zhipu the most academic of the Six Tigers.
Timeline.
- 2022.10: GLM-130B — first 100B-class open weights, bilingual Chinese/English
- 2023.03: ChatGLM-6B — the open Chinese model most widely known to general developers
- 2024.01: GLM-4 (API only)
- 2024.06: GLM-4-9B (open)
- 2025.04: GLM-4.5 — agentic specialization
- 2025.06: GLM-4.5V — vision
- 2025.10: GLM-4.5-Air — small open variant
- 2026.03: GLM-5 (expected)
GLM-4.5's positioning is "agentic." Not plain chat, but multi-step tool use, web browsing, and code execution all mixed into base pretraining. As a result, on agent benchmarks like GAIA and SWE-bench it sits a step above other Chinese open models. The realistic comparisons are Claude Sonnet 4 and GPT-4.1.
Licensing is the messiest piece. GLM-4-9B (2024) allowed both academic and commercial use, and GLM-4.5-Air follows similar terms, but GLM-4.5 itself is API-only. You cannot describe "GLM is open" with a single sentence. For corporate use, send the actual license text to legal, no exceptions.
```bash
# Run GLM-4-9B-Chat via transformers
pip install transformers torch
python -c "
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

mid = 'THUDM/glm-4-9b-chat'
tok = AutoTokenizer.from_pretrained(mid, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(
    mid, torch_dtype=torch.bfloat16, device_map='auto', trust_remote_code=True
)
inputs = tok.apply_chat_template(
    [{'role': 'user', 'content': 'Explain reinforcement learning in five sentences'}],
    add_generation_prompt=True,
    return_tensors='pt',
).to(mdl.device)
out = mdl.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
"
```
Zhipu's other major asset is the CogVLM, CogVideoX, and CogView lines. CogVideoX was the first major open-weight video generation model when released in August 2024. As of May 2026 the line includes CogVideoX-5B, 5B-I2V, and Pro. It does not match Sora or Veo 3, but among open-weight video models it is the de facto standard.
6. Yi-Large / 01.AI (Kai-Fu Lee) — After the 2025 Restructure
01.AI (零一万物) was founded by Kai-Fu Lee (李开复) in 2023. Given Lee's profile — Microsoft Research, Google China, Sinovation Ventures — the company drew attention from the start.
Timeline.
- 2023.11: Yi-34B — first model, claimed top non-English performance
- 2024.01: Yi-VL — multimodal
- 2024.05: Yi-1.5 — 6B / 9B / 34B open
- 2024.10: Yi-Lightning — API model, entered Chatbot Arena
- 2025.01: Pretraining business reorganized, partial sale to Alibaba
- 2025.06: Yi-Large 2 (API only; effectively scaled down)
- 2026.05: Focused on industrial applications — digital humans, call centers, enterprise search
The early-2025 restructuring of 01.AI closed a chapter in the Chinese AI industry. Kai-Fu Lee publicly stated that only one or two of the Six Tigers would survive the 10-billion-yuan pretraining race and admitted his company would not be among them. The pretraining team and a chunk of the GPU pool moved to Alibaba.
So is Yi dead? No. It pivoted to the application layer. Yi-Lightning is still served via API, but the strategy now is specialization in digital humans (万知), call-center automation, and enterprise search, with B2B revenue as the growth lever. As of 2026, Yi's SaaS revenue exceeds its model-licensing revenue.
For open-weight shoppers, does Yi still matter? Yes for 2024-era releases, no for anything later. Yi-1.5-34B was a popular fine-tuning base for Korean and Japanese teams in 2024. In 2026, Qwen 3 32B, DeepSeek-V3, and GLM-4.5-Air are better starting points.
7. Doubao (ByteDance) — Winning on Deployment Scale
ByteDance (字节跳动) Doubao (豆包) is not a Six Tigers member. It is, however, the #1 Chinese model by users and daily call volume. ByteDance owns TikTok, Toutiao, and CapCut and embeds Doubao across all of them. It also spent billions of yuan on advertising in 2024 and 2025.
Timeline.
- 2023.08: Doubao 1.0 — first model
- 2024.05: Doubao Pro — kicked off the price-cut campaign
- 2025.01: Doubao 1.5 Pro — multimodal
- 2025.05: Seedream — image generation
- 2025.09: Doubao 1.5 Pro 32k / 256k
- 2025.12: Doubao 1.5 Thinking — reasoning
- 2026.02: Doubao 2.0 (informal; multimodal unification)
Doubao's design philosophy is simple. "Serve an average-good-enough model as cheaply as possible at the largest possible scale." No attempt to beat GPT-4o, Claude, or DeepSeek on raw quality. Instead, the cheapest API on the market via ByteDance's Volcano Engine (火山引擎). The "1 yuan for 1 million tokens" announcement in May 2024 kicked off the price war among Chinese LLMs.
On open weights. Doubao itself is closed, but the ByteDance Seed team separately ships Seed-OSS, BAGEL (multimodal), and Seedream-2 as open-weight models. So even without Doubao itself, the Seed line is downloadable.
Will Korean or Japanese developers ever use Doubao? Almost never. The Doubao API is optimized for mainland Chinese IPs, and the data-handling policies make it a hard sell to foreign companies. That said, CapCut and TikTok embed Doubao in their consumer AI features, which means Korean and Japanese end users are already touching it indirectly.
8. Hunyuan / T1 (Tencent)
Tencent (腾讯) Hunyuan (混元) was the slowest of the BAT to enter the LLM race. Official announcement in September 2023, closed for some time, partial open-weight releases starting in 2024.
Timeline.
- 2023.09: Hunyuan 1.0 (API)
- 2024.05: Hunyuan-Large 389B MoE — first open-weight release
- 2024.11: Hunyuan-Vision
- 2025.03: Hunyuan T1 — reasoning model with hybrid Mamba-Transformer
- 2025.07: Hunyuan-Turbo
- 2025.10: Hunyuan-Vision-2
- 2026.01: Hunyuan T2 (expected)
The most interesting thing about Hunyuan T1 is its hybrid Mamba-Transformer architecture. Replacing a subset of layers with Mamba/SSM makes long-context decoding 2 to 3 times faster than pure Transformer. That directly lowers the cost of "producing a long thinking trace" in reasoning workloads. The trade-off is slightly worse retrieval accuracy on benchmarks like NIAH compared to pure Transformer baselines.
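One way to see where the saving could come from is KV cache size, since attention layers store keys and values for every past token while SSM layers keep a fixed-size state. The arithmetic below is a sketch with hypothetical dimensions (64 layers, 8 KV heads, head size 128, 16-bit values); these are not Hunyuan T1's published numbers, and the small SSM state is ignored.

```python
def kv_cache_gb(attn_layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Keys plus values for one sequence: 2 * layers * heads * dim * tokens * bytes."""
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value / 1e9

# Hypothetical model shape, chosen only to make the arithmetic concrete.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128
SEQ_LEN = 128_000

pure_transformer = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
# Assumption: the hybrid keeps attention in 1 of every 4 layers.
hybrid = kv_cache_gb(LAYERS // 4, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"pure Transformer KV cache: {pure_transformer:.1f} GB per sequence")
print(f"hybrid (16 attention layers): {hybrid:.1f} GB per sequence")
```

Less cache to read per generated token is consistent with faster long-context decoding, and it also hints at the retrieval trade-off: fewer layers can look back at exact past tokens.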
Tencent's real asset is integration with WeChat (微信). Hunyuan is embedded across WeChat search, mini-programs, and customer support. Many analysts argue the "1 billion user channel" is worth more than absolute model quality.
On open weights, Hunyuan-Large 389B ships under the "Tencent Hunyuan Community License," which permits commercial use if you have under 100 million monthly active users — practically free for almost every company. (Same pattern as Meta Llama.)
9. Ling / Ming (Ant Group, Alipay)
Ant Group (蚂蚁集团) is the parent company behind Alipay. As an Alibaba spinout in financial services, its LLM strategy is heavily tied to the finance domain. The naming is confusing — here is the May 2026 cheat sheet.
- Bailing (百灵): Ant's primary LLM series (Bailing-7B, Bailing-Pro)
- Ling (铃): Lightweight and on-device line (Ling-Tiny, Ling-Plus, Ling-Lite)
- Ming (鸣): Multimodal
- AntFin / AntGLM: Finance-specific (loan underwriting, call centers, KYC)
Ling-Plus made headlines in March 2025 as one of the first major open-weight models pretrained entirely on Chinese domestic GPUs — Huawei Ascend and Cambricon — without NVIDIA. Absolute quality is below Qwen 3, but the political and strategic message of "we can do it on domestic chips" landed hard.
Ant Group models are rarely something Korean or Japanese developers touch directly. But Korean and Japanese e-commerce backends that accept Alipay may already run KYC and fraud detection through Ant models without explicit awareness.
10. Step / StepFun and MiniMax
The remaining two tigers.
StepFun (阶跃星辰) was founded by Jiang Daxin (姜大昕), a former Microsoft global VP. The differentiator is multimodal. Step-2 (an estimated 1T-class model) shipped in January 2025; Step-R is reasoning, Step-1V is vision, Step-1X-Edit is image editing. StepFun is the smallest of the Six Tigers and has fielded fundraising rumors throughout 2026.
MiniMax (稀宇科技) was founded in 2021 and was the fastest tiger to push into consumer products. Talkie is its character-chat app aimed at the US market; Hailuo is its video generator. The flagship models:
- MiniMax-Text-01: 456B MoE, 4M context (announced January 2025)
- MiniMax-VL-01: vision
- MiniMax-M1: hybrid attention reasoning (June 2025)
- MiniMax abab series: smaller line
The 4M-token context window in MiniMax-Text-01 is still the largest among open-weight models as of May 2026. It uses lightning attention (a linear-attention variant) to keep memory tractable. Honest disclosure: NIAH testing shows retrieval accuracy beyond 1M starts to degrade.
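The memory argument rests on a property of linear attention in general: the whole past can be folded into a fixed-size running state. The sketch below is generic kernelized linear attention in plain Python, not MiniMax's lightning attention kernel; the feature map `phi` (ELU plus one) and all sizes are assumptions chosen for illustration. It checks that the recurrent form matches the quadratic form.

```python
import math
import random

def phi(vec):
    """Positive feature map (ELU + 1), an illustrative choice."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention_recurrent(qs, ks, vs):
    """Causal linear attention with O(d_k * d_v) state, independent of sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        denom = dot(fq, norm)
        outputs.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                        for j in range(d_v)])
    return outputs

def linear_attention_quadratic(qs, ks, vs):
    """Same result computed the slow way, revisiting every past token."""
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                        for j in range(len(vs[0]))])
    return outputs

random.seed(0)
T, D = 12, 4
qs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
ks = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
vs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

fast = linear_attention_recurrent(qs, ks, vs)
slow = linear_attention_quadratic(qs, ks, vs)
max_diff = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
print("max difference between the two forms:", max_diff)
```

The fixed-size state is also a plausible reason retrieval degrades at extreme lengths: millions of tokens are compressed into the same small matrix.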
For Korean and Japanese audiences, the relevant MiniMax product is Talkie. It is one of the more popular Character.AI alternatives among English-speaking teens, and its ML backend is MiniMax abab.
11. Export Controls and Chips — H100/B200 to Huawei Ascend and Cambricon
Now into infrastructure. The real variable that shapes the fate of Chinese AI labs is chips, not models.
Timeline of US export controls.
- 2022.10: Direct ban on H100/A100 exports (BIS Entity List plus ECCN)
- 2023.10: H800/A800 China variants also banned
- 2024.10: Additional restrictions on H20 (further-downgraded variant)
- 2025.04: B200/B300 effectively banned
- 2025.10: GB200 NVL72 system exports banned
- 2026.02: Attempts to classify AI model weights themselves under ECCN
The result is that the NVIDIA GPUs Chinese companies can legally access in May 2026 are previously purchased H100/H800/A100/A800 inventory plus some H20. Gray-market channels via Singapore or Malaysia are persistently rumored, but the scale appears limited.
To compensate, China has invested heavily in domestic silicon.
Huawei Ascend 910 series.
- 910B: mass production 2023, roughly 320 TFLOPS FP16, A100 class
- 910C: mass production late 2024, roughly 800 TFLOPS FP16, claimed H100 class
- 910D: late 2025 to early 2026, claimed B200 class
- CloudMatrix 384: 384-card Ascend system with optical interconnect, GB200 NVL72 alternative
Huawei's real strength is the CloudMatrix, MindSpore, and CANN full stack more than the chip itself. One vendor delivers model, runtime, drivers, and hardware. Setup costs more hands-on work than NVIDIA, but once running, workloads are nearly independent from foreign suppliers.
Cambricon (寒武纪) MLU series.
- MLU370: 2022, inference focused
- MLU590: 2024, training and inference
- MLU690: 2025, claimed roughly H100 parity on inference
Cambricon is not as vertically integrated as Huawei. Major inference frameworks (vLLM, SGLang) only began first-class support in 2025, so the adoption barrier is higher.
Catalog numbers do not tell the whole story. Anonymous reports place Ascend 910C at roughly 50 to 70 percent of H100 throughput on real training workloads. Pricing is 30 to 50 percent of gray-market H100 prices, so the TCO comes out comparable or cheaper once you account for power, rack, and software overhead. Training stability (compared to NCCL) and driver maturity still favor NVIDIA decisively.
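The ranges above translate into a rough hardware-cost-per-throughput figure. This is a back-of-the-envelope sketch using only the ratios quoted in the paragraph; it covers purchase price and nothing else, so power, rack, and software overhead still have to be added on top. The function name is hypothetical.

```python
def cost_per_throughput(price_ratio, throughput_ratio):
    """Hardware price per unit of training throughput, relative to an H100 (= 1.0)."""
    return price_ratio / throughput_ratio

# Ratios quoted in the text: 30-50% of gray-market H100 price, 50-70% of H100 throughput.
best_case = cost_per_throughput(price_ratio=0.30, throughput_ratio=0.70)
worst_case = cost_per_throughput(price_ratio=0.50, throughput_ratio=0.50)

print(f"best case:  {best_case:.2f}x the H100 cost per unit of throughput")
print(f"worst case: {worst_case:.2f}x the H100 cost per unit of throughput")
```

The spread between the two cases is why "comparable or cheaper" is the honest summary: at the unfavorable end of both ranges the hardware advantage disappears before overhead is even counted.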
```bash
# Serve a model on Ascend (Huawei MindIE-LLM offers a vLLM-like OpenAI-compatible API)
# Illustrative invocation: confirm the package name and flags against the MindIE documentation
pip install mindie  # Ascend environments only
mindie serve --model qwen3-32b --device-list 0,1,2,3 \
    --max-input-token-len 32768 --max-batch-size 32 \
    --port 8000
```
12. Inference Stack — vLLM, LMDeploy, FastGen, ModelScope
The Chinese inference stack overlaps with the US one by about 70 percent and diverges on the other 30 percent. The overlap first.
Shared: vLLM, SGLang, TensorRT-LLM, Hugging Face Transformers, DeepSpeed-MII. All five are used directly in China. vLLM in particular gets PRs from the DeepSeek and Qwen teams, so Chinese model support lands fast.
China-specific inference stacks.
- LMDeploy (Shanghai AI Lab): inference server from the InternLM team. Similar to vLLM, with a TurboMind backend optimized for INT4 quantization. Ascend support landed before vLLM.
- FastGen (Microsoft Research Asia): a Chinese-origin fork of DeepSpeed-FastGen. Token-level dynamic batching.
- Xinference (Xorbits): a wrapper over vLLM and LMDeploy that exposes an OpenAI-compatible API. Standard for smaller Chinese companies.
- MindIE-LLM / MindIE-Service (Huawei): Ascend-only, OpenAI-compatible.
ModelScope (魔搭) vs Hugging Face. ModelScope is Alibaba's hub; Chinese-company models often release on both simultaneously, sometimes ModelScope first. Outside China, Hugging Face is faster, but some weights (especially right after RLHF release or under Chinese-only licenses) appear only on ModelScope.
```python
# Pull a model from ModelScope
from modelscope import snapshot_download

# Qwen3-Coder-30B
model_dir = snapshot_download(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    cache_dir="./models",
)
print(f"downloaded to {model_dir}")

# DeepSeek-V3 is also on ModelScope
ds_dir = snapshot_download("deepseek-ai/DeepSeek-V3", cache_dir="./models")
```
13. Evaluation — SuperCLUE, OpenCompass, C-Eval
Chinese model evaluation benchmarks. Separate from English-language MMLU, GPQA, and SWE-bench, there is a Chinese ecosystem.
- C-Eval (Tsinghua University): 13,948 multiple-choice questions across 52 subjects. The most standard Chinese LLM benchmark.
- CMMLU: 11,528 questions, MMLU's Chinese counterpart. Similar to C-Eval with a different subject distribution.
- OpenCompass (Shanghai AI Lab): a meta-benchmarking platform aggregating 100+ datasets into a leaderboard.
- SuperCLUE (independent organization): the leaderboard most quoted by Chinese press, updated monthly.
- GAOKAO-Bench: based on China's college entrance exam, measures reasoning.
- AGIEval: based on academic exams across Chinese and English jurisdictions.
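Most of these benchmarks reduce to multiple-choice accuracy, and the averaging method changes the headline number. The sketch below scores made-up records per subject and reports both a macro average (mean of subject accuracies) and a micro average (all questions pooled); the record format and function name are assumptions, and each leaderboard documents its own averaging rule.

```python
from collections import defaultdict

def accuracy_by_subject(records):
    """records: iterable of (subject, predicted_choice, gold_choice)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for subject, pred, gold in records:
        totals[subject] += 1
        hits[subject] += int(pred.strip().upper() == gold.strip().upper())
    per_subject = {s: hits[s] / totals[s] for s in totals}
    macro = sum(per_subject.values()) / len(per_subject)  # every subject weighs the same
    micro = sum(hits.values()) / sum(totals.values())     # every question weighs the same
    return per_subject, macro, micro

# Made-up predictions for illustration only.
records = [
    ("law", "A", "A"), ("law", "B", "C"),
    ("physics", "d", "D"), ("physics", "A", "A"),
    ("physics", "C", "C"), ("physics", "B", "B"),
]
per_subject, macro, micro = accuracy_by_subject(records)
print(per_subject)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

When two leaderboards disagree about the same model, differing subject weights like these are one of the first things to check.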
Should Korean and Japanese teams trust these benchmarks? Use them for reference, not for direct selection. The top model on C-Eval or SuperCLUE rarely tops Korean or Japanese leaderboards. Korean teams should consult KoBEST, KMMLU, and HAERAE; Japanese teams should check JCommonsenseQA, JGLUE, and the Nejumi leaderboard. That said, if a model ranks high on GAOKAO or MATH, the reasoning capability often transfers to Korean and Japanese reasoning tasks.
Approximate SuperCLUE top order as of May 2026.
- GPT-4.5 / Claude Opus 4 (global closed, reference)
- DeepSeek-R2
- Qwen3-Max
- GLM-4.5
- Kimi K2.5
- Hunyuan T1
- Doubao 1.5 Pro Thinking
If you restrict to open weights, DeepSeek-R2, Qwen3-235B, and Kimi K2 are essentially tied.
14. Using Chinese Open Models in Korea and Japan
The most practical question. "Can our company use Chinese open models, and if so, how?"
Security and legal perspective.
- Model weights are just matrices of numbers. Download the weights, run them on your own server, and no data goes to China. That is the opposite of the OpenAI or Anthropic API model.
- Using a Chinese-hosted API sends data to Chinese servers. DeepSeek API, Qwen API, and Moonshot API are operated from mainland China. For Korean and Japanese companies, that runs into PIPA in Korea, APPI in Japan, GDPR for EU customer data, and financial regulations. Legal review is mandatory before deployment.
- Alibaba Cloud's Singapore region offers Qwen API hosted in Singapore, explicitly not routed through mainland China. Global enterprises prefer that path.
Korean and Japanese performance, May 2026 subjective scores.
| Model | Korean (everyday) | Korean (coding) | Japanese (everyday) | Japanese (keigo) |
|---|---|---|---|---|
| Qwen3-235B | 4.0/5 | 4.5/5 | 3.5/5 | 3.0/5 |
| DeepSeek-V3 | 3.5/5 | 4.5/5 | 3.5/5 | 3.0/5 |
| Kimi K2 | 3.5/5 | 4.0/5 | 3.0/5 | 2.5/5 |
| GLM-4.5 | 3.5/5 | 4.0/5 | 3.0/5 | 2.5/5 |
| (ref) GPT-4.1 | 4.5/5 | 4.5/5 | 4.5/5 | 4.5/5 |
| (ref) Claude Sonnet 4 | 4.5/5 | 5.0/5 | 4.5/5 | 4.5/5 |
This table is subjective and varies by use case, but the pattern is clear.
- On coding, Chinese open models are nearly on par with global closed models. Qwen3-Coder and DeepSeek-Coder handle Korean and Japanese comments well.
- Everyday Korean and Japanese is a step behind. Japanese keigo in particular is a shared weakness.
- Fine-tuning closes much of the gap. A LoRA over Qwen 3 32B with Korean or Japanese instruction data pushes everyday quality near GPT-4o-mini.
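Part of why the LoRA route is practical is how few weights it trains. The arithmetic below is a sketch for a single square projection matrix; the hidden size of 5120 and rank of 16 are hypothetical values for illustration, not Qwen 3 32B's published configuration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA learns two thin matrices, A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Parameter count of the frozen weight matrix being adapted."""
    return d_in * d_out

D_MODEL, RANK = 5120, 16  # hypothetical sizes

adapter = lora_trainable_params(D_MODEL, D_MODEL, RANK)
frozen = full_params(D_MODEL, D_MODEL)
print(f"trainable per adapted matrix: {adapter:,} of {frozen:,} ({adapter / frozen:.3%})")
```

With well under one percent of each adapted matrix being trained, optimizer state and gradients shrink accordingly, which is what brings a 32B fine-tune within reach of a small team.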
Practical recommendations as of May 2026.
- Internal coding assistant: self-host Qwen3-Coder 30B-A3B. Clean Apache 2.0 license.
- Internal RAG chatbot: Qwen3 32B or GLM-4.5-Air. Fine-tune optional.
- Long PDF analysis: Kimi K2 (API) or self-host MiniMax-Text-01.
- Workflows that genuinely need reasoning: self-host DeepSeek-R1/R2 or Qwen3-235B thinking mode.
- Video generation: CogVideoX-Pro (Zhipu) self-host.
15. Outlook — Where Is Chinese AI Heading?
Finally, a 6 to 18 month scenario sketch.
Likely.
- Open-weight SOTA stays Chinese-led. While Meta delays Llama 4 and Mistral closes, DeepSeek, Qwen, and Kimi fill the gap. No reversal expected through the end of 2026.
- Export controls keep tightening. Although the 2024 US election is over, AI chip controls are bipartisan policy. Loosening is unlikely; the scope is expanding into weights and software.
- Domestic chips will exceed 50 percent share in inference. Huawei Ascend and Cambricon still lag NVIDIA on training, but the inference cost story pulls share fast.
- The price floor keeps falling. DeepSeek and Doubao established the $0.10 per 1M token range, and it is likely to dip further by late 2026.
Uncertain.
- How many of the Six Tigers survive. By end of 2026, plausibly only Moonshot, Zhipu, and MiniMax remain meaningful. Baichuan already retreated to healthcare. 01.AI pivoted to applications. StepFun has funding rumors.
- Does DeepSeek go consumer. Its hedge-fund parent removes the need for consumer ad spend, but if Doubao overtakes it on raw quality, it may have to enter the consumer fight.
- Global license disputes. As more companies fine-tune Apache 2.0 Qwen and rebrand it as their own model, at some point the base model owner may narrow the license.
One-line takeaway for Korean and Japanese engineers. "If you don't use a Chinese open model, the team next door will." Security concerns are real and require review, but weight-based self-hosting is data-safer than using OpenAI's API. As of May 2026, Chinese open models are essentially the only practical way to buy 80 to 90 percent of GPT-4o or Claude Sonnet 4 quality at one-fifth to one-tenth of the cost — for coding, RAG, and long-context workloads.
References
- DeepSeek official site: https://www.deepseek.com/
- DeepSeek GitHub: https://github.com/deepseek-ai
- DeepSeek-V3 Technical Report (arXiv): https://arxiv.org/abs/2412.19437
- DeepSeek-R1 Paper (arXiv): https://arxiv.org/abs/2501.12948
- DeepSeek HuggingFace: https://huggingface.co/deepseek-ai
- Qwen official site: https://qwen.ai/
- Qwen GitHub: https://github.com/QwenLM
- Qwen3 Technical Report (arXiv): https://arxiv.org/abs/2505.09388
- Qwen HuggingFace: https://huggingface.co/Qwen
- Moonshot AI: https://www.moonshot.cn/
- Kimi K2 Paper (arXiv): https://arxiv.org/abs/2507.20534
- Kimi HuggingFace: https://huggingface.co/moonshotai
- Zhipu AI: https://www.zhipuai.cn/
- GLM GitHub: https://github.com/THUDM
- ChatGLM HuggingFace: https://huggingface.co/THUDM
- CogVideoX: https://github.com/THUDM/CogVideo
- 01.AI official site: https://www.lingyiwanwu.com/
- Yi GitHub: https://github.com/01-ai
- Yi HuggingFace: https://huggingface.co/01-ai
- ByteDance Seed: https://team.doubao.com/en/research
- Doubao (Volcano Engine): https://www.volcengine.com/product/doubao
- Tencent Hunyuan: https://hunyuan.tencent.com/
- Hunyuan GitHub: https://github.com/Tencent/Hunyuan-Large
- Ant Group AI: https://www.antgroup.com/
- Ling-Plus announcement: https://www.antgroup.com/en/news-media/press-releases
- MiniMax: https://www.minimax.io/
- MiniMax-01 Paper (arXiv): https://arxiv.org/abs/2501.08313
- StepFun: https://www.stepfun.com/
- Baichuan: https://www.baichuan-ai.com/
- Huawei Ascend: https://www.hiascend.com/
- Cambricon: https://www.cambricon.com/
- ModelScope: https://www.modelscope.cn/
- HuggingFace: https://huggingface.co/
- vLLM: https://github.com/vllm-project/vllm
- LMDeploy: https://github.com/InternLM/lmdeploy
- SGLang: https://github.com/sgl-project/sglang
- Xinference: https://github.com/xorbitsai/inference
- C-Eval: https://cevalbenchmark.com/
- OpenCompass: https://opencompass.org.cn/
- SuperCLUE: https://www.superclueai.com/
- BIS Export Controls (US Commerce): https://www.bis.doc.gov/
- LMSys Chatbot Arena: https://chat.lmsys.org/