Chaos and Order

1. The 2026 AI Hardware Map — Four Camps: Hyperscaler / Challenger / In-House / Edge

In May 2026 the AI chip market looks nothing like it did five years ago. The single-vendor era of NVIDIA (V100 in 2017, A100 in 2020, H100 in 2022, H200 in 2023) opened a new chapter when Blackwell debuted at GTC 2024. In 2026, **there are more chips than ever, and the choice has only gotten harder.**

Roughly four camps now exist.

- **Hyperscaler GPU** — NVIDIA Blackwell (B100 / B200 / GB200 / B300, Rubin coming September 2026), AMD Instinct (MI300X to MI355X to MI400 Helios), Intel Gaudi 3 (with Falcon Shores rumors)

- **Challenger / Specialty** — Cerebras WSE-3 (wafer scale), Groq LPU (sequential inference), SambaNova SN40L (Reconfigurable Dataflow), Tenstorrent (Jim Keller, RISC-V open), Etched Sohu (transformer-only ASIC), MatX, Tachyum Prodigy

- **In-house Cloud** — Google TPU v5e / v5p / Trillium (v6), AWS Trainium 2 + Inferentia 3, Meta MTIA, Microsoft Maia, Apple AC1 (rumored)

- **Edge / Phone NPU** — Apple A18 Pro Neural Engine, Snapdragon 8 Gen 4 Hexagon NPU, MediaTek Dimensity 9400 APU, Google Tensor G5 mobile TPU

On pricing: an H100 was $30K-40K in 2024; B200 lands at $30K-40K as well, and a GB200 NVL72 rack runs about $3M. Cloud rentals settled at $2-4 per hour for H100 and now $4-8 per hour for partial B200 instances.
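The purchase and rental figures imply a rough per-GPU rack price and a buy-versus-rent break-even point. A minimal sketch follows; it takes the midpoints of the quoted ranges as assumptions and ignores power, hosting, and networking costs.

```python
# Back-of-the-envelope numbers from the pricing paragraph.
# Midpoints of the quoted ranges are assumptions, not vendor quotes.

def per_gpu_rack_price(rack_price_usd: float, gpus_per_rack: int) -> float:
    """Rack price divided evenly across its GPUs."""
    return rack_price_usd / gpus_per_rack


def break_even_hours(card_price_usd: float, rental_usd_per_hour: float) -> float:
    """Rental hours after which buying the card would have been cheaper."""
    return card_price_usd / rental_usd_per_hour


nvl72_per_gpu = per_gpu_rack_price(3_000_000, 72)   # $3M rack, 72 GPUs
h100_hours = break_even_hours(35_000, 3.0)          # $35K card, $3/hour rental

print(f"NVL72 per GPU: ${nvl72_per_gpu:,.0f}")
print(f"H100 break-even: {h100_hours:,.0f} h (~{h100_hours / 24 / 365:.1f} years)")
```

Under these assumptions a rack works out to roughly $42K per GPU, and a purchased H100 breaks even against rental after a bit more than a year of continuous use.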

This piece walks through each camp — specs, architecture, memory and interconnect, and finally the Korean and Japanese players.

> All numbers are sourced from public material as of May 2026, plus reporting from SemiAnalysis, The Information and Reuters. Private cluster prices are estimates.

2. NVIDIA Blackwell — B100 / B200 / GB200 NVL72 / B300 / Rubin

Family structure

Blackwell is NVIDIA's fifth generation data-center GPU architecture, unveiled by Jensen Huang at GTC March 2024. It is the successor to Hopper (H100/H200), fabbed on TSMC N4P, and **for the first time uses a chiplet structure — two GPU dies connected by NV-HBI (NVIDIA High-Bandwidth Interconnect) at 10 TB/s**.

- **B100** — 700W TDP, air-coolable, 192GB HBM3E, 14 PFLOPS FP8

- **B200** — 1000W, liquid cooling recommended, 192GB HBM3E, 18 PFLOPS FP8 / 36 PFLOPS FP4

- **GB200** — superchip bundling 1 Grace CPU + 2 B200 GPUs over NVLink-C2C at 900 GB/s

- **GB200 NVL72** — single rack with 36 GB200 modules linked by NVLink 5 in a 72-GPU all-to-all

- **B300 (Blackwell Ultra)** — late 2025, 288GB HBM3E, stronger FP4 inference

Why NVL72 matters

Seventy-two B200 GPUs share one NVLink domain, so a model can treat the rack as one large GPU. **MoE token routing (the all-to-all step) stays inside NVLink instead of crossing InfiniBand, as long as the expert-parallel group fits in one rack.** That removes a major bottleneck in GPT-4 / Claude 3.5 class training.
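To see why keeping the all-to-all inside NVLink matters, here is a rough transfer-time comparison. It is a sketch under stated assumptions: 900 GB/s per direction for NVLink 5 (half of the 1.8 TB/s bidirectional figure), one 400 Gb/s InfiniBand NIC per GPU (about 50 GB/s), and a hypothetical 1 GB payload per GPU per step. Latency and congestion are ignored.

```python
def all_to_all_seconds(bytes_sent_per_gpu: float, link_gb_per_s: float) -> float:
    """Time for one GPU to push its all-to-all payload at a one-way link rate."""
    return bytes_sent_per_gpu / (link_gb_per_s * 1e9)


PAYLOAD_BYTES = 1e9       # hypothetical: 1 GB of routed tokens per GPU per step
NVLINK5_GB_S = 900.0      # assumption: one direction of the 1.8 TB/s figure
INFINIBAND_GB_S = 50.0    # assumption: one 400 Gb/s NIC per GPU

t_nvlink = all_to_all_seconds(PAYLOAD_BYTES, NVLINK5_GB_S)
t_ib = all_to_all_seconds(PAYLOAD_BYTES, INFINIBAND_GB_S)

print(f"NVLink: {t_nvlink * 1e3:.2f} ms, InfiniBand: {t_ib * 1e3:.2f} ms, "
      f"ratio: {t_ib / t_nvlink:.0f}x")
```

With these inputs the same payload takes about 18 times longer over the NIC, which is why the size of the NVLink domain sets how far expert parallelism can be spread cheaply.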

Rubin — September 2026

Rubin is NVIDIA's sixth generation. It was pre-announced at Computex in June 2024, and the formal launch is expected at GTC in September 2026.

- **R100** — TSMC N3, HBM4 (288GB or more)

- **Vera Rubin** — Vera CPU (Grace successor) bundled with Rubin GPU

- **NVL144** — the all-to-all domain doubles to 144 GPUs

NVIDIA's annual cadence holds: 2024 Blackwell, 2025 Blackwell Ultra, 2026 Rubin, 2027 Rubin Ultra.

Pricing and supply

An H100 was $30K-40K per card in 2024. B200 trades at $30K-40K per card and $4-8 per hour in cloud partials. A GB200 NVL72 rack runs about $3M. In 1H 2025 NVIDIA shipped more than two million Blackwell GPUs per quarter (Reuters).

3. AMD Instinct — MI300X to MI325X to MI355X to MI400 Helios

MI300X (December 2023)

CDNA 3 architecture, 192GB HBM3, 5.2 PFLOPS FP8 with structured sparsity (about 2.6 PFLOPS dense). With 2.4x the memory of an 80GB H100, it landed at Meta and Microsoft for Llama inference fleets. About $15K-20K per card.

MI325X (Q4 2024)

Memory bumped to 256GB HBM3E, clocks raised slightly. The H200 counter.

MI355X (late 2025)

CDNA 4 architecture. 288GB HBM3E, native FP4 datatype. The direct response to Blackwell B200/B300. ROCm 6.x has matured to where PyTorch / vLLM / SGLang feel almost as smooth as on NVIDIA.

MI400 Helios (2026)

The next-generation platform AMD unveiled at Advancing AI 2025.

- **MI400 Instinct GPU** — CDNA Next, HBM4 memory

- **Helios rack-scale system** — 72 GPUs in a single ScaleUP domain (NVL72 counterpart)

- **Pensando DPU** + ROCm 7 + UALink interconnect

UALink is the open alternative to NVLink. AMD, Broadcom, Cisco, Google, HPE, Intel, Meta and Microsoft formed the consortium, and the 1.0 spec was published in April 2025.

Market share

In 2025 data-center GPU revenue, NVIDIA held 90%+, AMD 5-7%, with Intel and in-house chips splitting the remainder. AMD locked in Microsoft Azure ND-MI355X-v6 and a Meta cluster on MI355X, and Oracle Cloud announced the first big MI400 Helios deployment.

4. Intel Gaudi 3 + Falcon Shores Rumors

Gaudi 3 — the last standalone line

Intel acquired Habana Labs for about $2B in 2019 and has shipped Gaudi 1 / 2 / 3 since. Gaudi 3 launched April 2024 on TSMC N5, with 128GB HBM2E and 24x 200 Gbps Ethernet ports (RoCE instead of InfiniBand) as its interconnect.

- BF16 1835 TFLOPS

- FP8 1835 TFLOPS

- $7K-15K per card (less than half of NVIDIA)

- Weakness — the SynapseAI software stack. PyTorch works but the ecosystem still trails CUDA and ROCm

Stability AI, Naver and Intel's own Tiber Cloud are notable adopters.

Falcon Shores rumors

Falcon Shores was originally planned as a Gaudi-plus-Ponte-Vecchio fusion shipping in 2024. **In January 2025 Intel officially canceled the external launch**, keeping it as an internal test vehicle.

As of May 2026, the rumor is that Intel is preparing either a Gaudi 4 or a brand-new GPU line for 1H 2027. The seed of that story is a comment from Lip-Bu Tan (the new CEO since 2025) at an IFS Cup event that Intel will "reorganize the AI chip line."

5. Apple M5 + M5 Pro + Neural Engine + AC1 Server Chip

M5 / M5 Pro / M5 Max — October 2025

Apple Silicon fifth generation, on TSMC N3E. CPU core counts are unchanged. **The GPU gains ray-tracing accelerators and a new matrix engine aimed at AI inference**.

- **M5** — 10-core CPU, 10-core GPU, 16-core Neural Engine, 38 TOPS

- **M5 Pro** — 14-core CPU, 20-core GPU, 16-core NE

- **M5 Max** — 16-core CPU, 40-core GPU, 16-core NE

The Neural Engine stays at 16 cores. The change is matrix-multiply throughput and INT4 quantization acceleration.

AC1 server chip — spring 2026 rumor

According to The Information (November 2025) and Bloomberg's Mark Gurman, Apple is building a server-class SoC for data-center AI inference.

- **Apple Compute 1 (AC1)** — codename, Mac Pro server form factor

- Slated for part of the Apple Intelligence backend in spring 2026

- The successor to M2 Ultra Mac Pros (possibly based on M5 Ultra)

Apple already runs Apple Intelligence Private Cloud Compute (PCC) on M2 Ultra Macs. AC1 is the next-generation PCC silicon.

6. Google TPU v5p + Trillium (v6)

TPU lineage

- **TPU v1** (2015) — inference-only, INT8

- **TPU v2** (2017) — training plus inference, BF16

- **TPU v3** (2018) — first liquid cooling

- **TPU v4** (2021) — Optical Circuit Switching

- **TPU v5e** (2023) — inference cost-optimized

- **TPU v5p** (2023) — training flagship, used for Gemini

- **TPU v6 Trillium** (2024) — 4.7x performance over v5e

What Trillium does

Announced at Google I/O May 2024. **The workhorse for Gemini 2.0 training**.

- 2x HBM capacity (16GB to 32GB)

- 2x interconnect

- 67% better energy efficiency

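Trillium scales by pod: 256 chips per pod over ICI (Inter-Chip Interconnect), with multiple pods joined over the data-center network for larger jobs. The 8,960-chip pod figure often quoted alongside it belongs to TPU v5p.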

TPU gen 7 (Ironwood)

Google announced its seventh-generation TPU, Ironwood, at Cloud Next in April 2025 and positioned it for inference. With Anthropic relying heavily on TPUs, the stakes are large.

7. Cerebras WSE-3 — 4 Trillion Transistors, Wafer Scale

The wafer-scale idea

A standard chip is cut from a 12-inch wafer into reticle-sized dies (about 858 mm²). Cerebras **uses the entire wafer as one chip**.

WSE-3 (announced March 2024):

- 46,225 mm² area

- **4 trillion transistors** (about 19x Blackwell's 208 billion)

- **900,000 cores** (custom RISC-V style)

- **44GB on-chip SRAM** (no HBM, only on-chip SRAM)

- **125 PFLOPS FP8**

- TSMC 5nm

Why wafer scale

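The goal is to eliminate chip-to-chip communication. Memory (SRAM) sits directly next to compute, giving bandwidth orders of magnitude beyond HBM. **Model weights live in on-wafer SRAM**, though 44GB holds a 70B model only at roughly 4-bit precision; at 16-bit the 140GB of weights must be split across several wafers.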
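Whether a 70B model fits in 44GB of SRAM depends on the weight precision. A quick check follows; it counts weights only and ignores the KV cache and activations, and the helper names are illustrative.

```python
import math


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone (no KV cache, no activations)."""
    return params * bits_per_weight / 8


def wafers_needed(params: float, bits_per_weight: int, sram_bytes: float = 44e9) -> int:
    """Minimum number of wafers whose combined SRAM holds all the weights."""
    return math.ceil(weight_bytes(params, bits_per_weight) / sram_bytes)


PARAMS_70B = 70e9

for bits in (16, 8, 4):
    gb = weight_bytes(PARAMS_70B, bits) / 1e9
    print(f"{bits}-bit: {gb:.0f} GB of weights -> {wafers_needed(PARAMS_70B, bits)} wafer(s)")
```

By this count, 16-bit weights need four wafers, 8-bit weights need two, and only 4-bit weights fit on a single wafer.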

Tradeoffs

- **Strength** — inference latency is unmatched. On Llama 3.1 70B, per-token latency rivals or beats Groq

- **Weakness** — training cost-per-flop trails NVIDIA. Yield and packaging are expensive

- **Customers** — G42 (UAE), Mayo Clinic, Argonne National Lab and other specialized domains

A CS-3 system is estimated at $2-3M.

8. Groq LPU — Sequential Inference Speed

The LPU idea

Groq was founded in 2016 by Jonathan Ross, a former Google TPU engineer, and its chip is the LPU (Language Processing Unit). The core idea is **deterministic execution**: every instruction runs on a cycle precisely scheduled by the compiler.

- 14nm GlobalFoundries

- 230MB on-chip SRAM (no HBM)

- 750 TOPS INT8

- Tensor Streaming Processor (TSP) architecture

Why it's fast

GPUs use dynamic scheduling to spread work across SMs. The LPU resolves all dispatch at compile time, with no runtime branching. The result: **Llama 70B inference at 200-300 tokens per second**, four to ten times faster than the 30-50 tokens per second on an H100.
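Compile-time scheduling can be illustrated with a toy example. This is not Groq's compiler; it is a minimal sketch that assumes every operation has a fixed, known latency, and the `Op` type and the three-op program are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    latency: int                 # fixed cycle count, known at compile time
    deps: tuple[str, ...] = ()   # names of ops that must finish first


def static_schedule(ops: list[Op]) -> dict[str, int]:
    """Assign each op a start cycle ahead of time.

    Ops must be listed in dependency order. Because every latency is fixed,
    the resulting start cycles are identical on every run.
    """
    start: dict[str, int] = {}
    finish: dict[str, int] = {}
    for op in ops:
        start[op.name] = max((finish[d] for d in op.deps), default=0)
        finish[op.name] = start[op.name] + op.latency
    return start


# Hypothetical three-op slice of a transformer layer.
program = [
    Op("load_weights", latency=4),
    Op("matmul", latency=10, deps=("load_weights",)),
    Op("softmax", latency=3, deps=("matmul",)),
]
schedule = static_schedule(program)
print(schedule)  # {'load_weights': 0, 'matmul': 4, 'softmax': 14}
```

Nothing is decided at run time, so the latency of the whole program is known before it executes. That predictability is the property the LPU trades flexibility for.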

Tradeoffs

- No training — inference only

- Many LPUs needed per model (small SRAM means weights must be sharded)

- Per-data-center cost can exceed NVIDIA's

That said, the LPU shines at latency-first workloads: code completion, chatbots, voice assistants. Groq Cloud serves Llama 70B starting at $0.59 per million input tokens.

9. SambaNova SN40L — Reconfigurable Dataflow

SambaNova's bet

Founded in 2017 by Stanford professor Kunle Olukotun and Rodrigo Liang. The **Reconfigurable Dataflow Architecture (RDA)** reshapes the on-chip data flow per workload.

SN40L (2023):

- TSMC 5nm

- 1.5TB DDR5 + 64GB HBM3

- 638 BF16 TFLOPS

- 3-tier memory hierarchy (on-chip SRAM / HBM / DDR)

Why RDA

GPUs are SIMT machines tuned for dense tensor math. Transformers add irregular patterns — dynamic-shape attention, sparse MoE dispatch. RDA **compiles a custom data path per layer**, making it strong on sparse workloads.

Customers

U.S. DOE labs (Lawrence Livermore, Argonne), Saudi Aramco, parts of the SoftBank R&D cluster.

10. Tenstorrent — Jim Keller, RISC-V Open Architecture

Jim Keller's company

Jim Keller previously led AMD's Zen architecture, Apple's A4/A5 chips and Tesla's Autopilot chip, and served as an SVP at Intel. He joined Tenstorrent as CTO in 2021 and became CEO in 2023.

The core differentiators:

- **RISC-V control plane** — every chip's control logic is RISC-V

- **Open architecture** — RTL and parts of the compiler are public

- **Tensix cores** — matrix-multiply, vector, and data movement fused

- **Scalable mesh interconnect** — over plain Ethernet

Lineup

- **Grayskull** (2020) — first generation, eval-only

- **Wormhole** (2023) — data center plus 12x 100G Ethernet

- **Blackhole** (2024) — first packaged production chip, 16 CPUs and 32GB GDDR6

- **Hub / Galaxy** — 32 Wormhole chips in a 4U box at $50K

Hyundai / Samsung / LG AI Research investment

A 2024 Korean consortium (Hyundai Motor, Samsung NEXT, LG) invested in Tenstorrent. Korea sees automotive AI and data-center AI as the targets.

11. Etched Sohu — Transformer-Only ASIC (June 2024)

One thing, done well

Etched was founded in 2022 by two Harvard undergrads. In June 2024 the Sohu unveil made waves.

- **Transformer-only architecture** — no CNN, no RNN, no MLP-only fallback

- 144GB HBM3E

- TSMC 4nm

- Advertised — **20x H100 tokens-per-second on Llama 70B**

Why transformer-only

Less than 30% of a GPU's area actually services transformer inference. Since attention and FFN patterns are so well known, **strip the other 70% of silicon and pack more attention units in its place**.

Risk and reward

The risk is obvious. If Mamba / RWKV / SSM / diffusion architectures rise, Sohu becomes useless overnight. As of May 2026, transformers still account for 80%+ of LLMs — Etched is betting hard on that.

Series A in 2024 was $120M, with Peter Thiel and Stanley Druckenmiller among the investors.

12. AWS Trainium 2 + Inferentia 3

The AWS in-house silicon strategy

AWS shipped Inferentia 1 in 2018, Trainium 1 in 2020, Inferentia 2 in 2023, Trainium 2 in 2024, and Inferentia 3 in 2025.

- **Trainium 1** (2020) — first training chip

- **Inferentia 2** (2023) — Stable Diffusion / Llama inference

- **Trainium 2** (2024) — the main silicon behind Anthropic's Project Rainier

- **Inferentia 3** (2025) — Llama 405B inference is the carrier workload

A Trn2.48xlarge instance has 16 chips and 1.5TB HBM at roughly $5-6 per hour.

Anthropic's Project Rainier

Anthropic announced Project Rainier in 2024 — a massive Trainium 2 cluster. Reports put it at **400,000 Trainium 2 chips**, used to train Claude 4.x (official statement).

AWS will launch Trainium 3 in late 2026. The Neuron SDK now feels native in PyTorch and JAX.

13. MatX / Tachyum Prodigy — Newer Entrants

MatX

Founded 2022 by former Google TPU and OpenAI engineers. The mission: **a chip purpose-built for LLM training**. Series B raised $80M in 2025; the first silicon targets late 2026.

Tachyum Prodigy

Founded by Slovak-origin Radoslav Danilak. The ambition: **AI plus HPC plus general compute on a single chip**.

- 192-core CPU plus AI tensor units

- 96GB HBM3 plus DDR5

- TSMC 5nm

- Tape-out completed Q1 2026, samples shipping

The skeptics are loud but EuroHPC (the EU public HPC program) is positioned as the first big customer.

14. Phone NPUs — A18 Pro / Snapdragon 8 Gen 4 / Dimensity 9400 / Tensor G5

Apple A18 Pro (September 2024, iPhone 16 Pro)

- 6-core CPU + 6-core GPU + 16-core Neural Engine

- 35 TOPS Neural Engine

- Drives on-device Apple Intelligence inference

Snapdragon 8 Gen 4, marketed as Snapdragon 8 Elite (October 2024, Samsung S25 and others)

- Qualcomm's own Oryon CPU plus Adreno GPU and Hexagon NPU

- 45 TOPS (Hexagon)

- TSMC 3nm

MediaTek Dimensity 9400 (October 2024)

- TSMC 3nm with Arm Cortex-X925

- 50 TOPS on APU 890

- Generative AI workloads (SD / Llama) heavily emphasized

Google Tensor G5 (August 2025, Pixel 10)

- Moves to TSMC 3nm, leaving Samsung Foundry (a major shift)

- Fifth-generation mobile TPU (Edge TPU successor)

- Custom ML acceleration plus on-device Gemini Nano

The point of phone NPUs is simple: **on-device inference has effectively zero marginal cost**. There is no cloud call, and LLM responses are generated locally.

15. Interconnect — NVLink 5/6 / PCIe Gen 6/7 / CXL

NVLink 5

- NVLink 5 starts with Blackwell

- 1.8 TB/s per GPU, bidirectional (900 GB/s in each direction)

- NVL72 — 72-GPU all-to-all

NVLink 6 lands with Rubin (2026), at an estimated 3.6 TB/s per chip.

PCIe Gen 6 / Gen 7

- **PCIe 6.0** — spec ratified 2022, 64 GT/s, first volume in late-2024 server boards

- **PCIe 7.0** — spec ratified 2025, 128 GT/s, volume in 2027 to 2028

The Gen 6 shift introduces PAM4 signaling, which works around SerDes frequency limits by encoding two bits per symbol with four voltage levels instead of two.
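The per-lane rates translate into slot bandwidth with simple arithmetic. The sketch below computes raw one-direction bandwidth for an x16 link and the PAM4 symbol rate. It ignores FLIT, FEC and protocol overhead, and the 32 GT/s figure for PCIe 5.0 is added here for comparison.

```python
def pcie_raw_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s: one bit per transfer per lane."""
    return gt_per_s * lanes / 8


def pam4_symbol_rate_gbaud(gt_per_s: float) -> float:
    """PAM4 carries two bits per symbol, so the symbol rate is half the bit rate."""
    return gt_per_s / 2


for name, rate in (("PCIe 5.0", 32), ("PCIe 6.0", 64), ("PCIe 7.0", 128)):
    print(f"{name}: {rate} GT/s -> x16 = {pcie_raw_gb_per_s(rate):.0f} GB/s per direction")

print(f"PCIe 6.0 symbol rate with PAM4: {pam4_symbol_rate_gbaud(64):.0f} GBaud")
```

PCIe 6.0 doubles the bit rate of 5.0 while staying at 32 GBaud on the wire, the same symbol rate that 5.0 reaches with two-level signaling.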

CXL

Compute Express Link. The Intel-led standard for memory sharing across CPU, GPU, DPU and memory pools over PCIe.

- **CXL 1.x** — memory attach

- **CXL 2.x** — memory pooling

- **CXL 3.x** — memory sharing (cache coherent)

As of May 2026, CXL 2.0 memory expanders (Samsung CMM-D, Micron CZ120) are deployed in volume. NVMe plus CXL memory expansion is the new paradigm for Tier 1 / Tier 2 / Tier 3 memory hierarchies.

UALink (Ultra Accelerator Link)

The open alternative to NVLink, backed by the AMD / Broadcom / Cisco / Google / HPE / Intel / Meta / Microsoft consortium. The 1.0 spec was published in April 2025.

16. Memory — HBM3E / HBM4 / Samsung + SK Hynix

HBM evolution

- **HBM1** (2015) — 4-Hi, 1 Gbps per pin

- **HBM2** (2016) — 8-Hi, 2 Gbps

- **HBM2E** (2018) — 3.6 Gbps

- **HBM3** (2022) — 6.4 Gbps, 24GB per stack

- **HBM3E** (2024) — 9.6 Gbps, 36GB per stack (B200, MI355X)

- **HBM4** (2026) — 8 Gbps and up on a doubled 2048-bit interface, 48GB per stack expected

HBM sits next to the GPU die in a 2.5D or 3D stack. With HBM3E and an 8-stack design, bandwidth crosses 8 TB/s.
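The bandwidth figure follows from the per-pin rate, given in gigabits per second. The sketch below assumes a 1024-bit interface per stack, the width used through HBM3E, and computes peak rather than sustained bandwidth.

```python
def hbm_stack_gb_per_s(pin_gbps: float, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth of one stack in GB/s: per-pin rate times pins, 8 bits per byte."""
    return pin_gbps * bus_width_bits / 8


def package_tb_per_s(pin_gbps: float, stacks: int, bus_width_bits: int = 1024) -> float:
    """Aggregate peak bandwidth of all stacks on one package, in TB/s."""
    return hbm_stack_gb_per_s(pin_gbps, bus_width_bits) * stacks / 1000


for pin_gbps in (8.0, 9.6):
    print(f"{pin_gbps} Gbps/pin x 8 stacks: {package_tb_per_s(pin_gbps, 8):.2f} TB/s")
```

Eight stacks at 8 Gbps per pin already give about 8.2 TB/s, and the full 9.6 Gbps rate pushes the peak to about 9.8 TB/s.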

Supply — SK Hynix / Samsung / Micron

- **SK Hynix** — first HBM3E volume, NVIDIA's primary supplier. HBM accounted for 30%+ of revenue in 2025

- **Samsung** — slow to HBM3 but leading the HBM4 standard. Passed NVIDIA HBM3E 12-Hi qualification in 2025

- **Micron** — third place, HBM3E volume in 2024

One Blackwell card carries eight HBM3E stacks for 192GB total. A stack is roughly $250-300, so HBM alone costs about $2-2.4K per chip.

HBM4

JEDEC ratified the standard in April 2025: 8 Gbps per pin over a 2048-bit interface (about 2 TB/s per stack), in stacks up to 16-Hi. Volume debut is in Rubin (2026). Both Korean vendors are racing for NVIDIA's qualification.

17. Korea — FuriosaAI + Rebellions (Sapeon Merger 2024)

FuriosaAI

Founded in 2017 by June Paik, a Samsung and AMD veteran. **RNGD (Renegade)** launched in 2024, targeting Llama inference workloads.

- TSMC 5nm

- 48GB HBM3

- 512 TFLOPS FP8

- 256 TFLOPS BF16

- 150W TDP

LG AI Research adopted RNGD for EXAONE inference; a collaboration with Kakao Enterprise Cloud was also announced.

Rebellions + Sapeon merger

- **Rebellions** (founded 2020) — KT is the lead investor. ATOM — inference chip

- **Sapeon** (spun out of SK Telecom) — X220 and X330 inference chips

In July 2024 Rebellions and Sapeon announced a merger. The post-merger entity is named **Rebellions**. KT, SK Telecom and Samsung all stayed on as investors. The next-gen **REBEL** chip was unveiled in 2025 and enters volume in 2026.

- 5nm Samsung Foundry

- 144GB HBM3E

- 250W TDP

- Both training and inference

Korea's K-Cloud initiative aims to deploy domestic AI accelerators in 50% of NIA data centers by 2030.

18. Japan — SoftBank Graphcore + Preferred Networks MN-3 + Rapidus 2nm 2027

SoftBank's Graphcore acquisition (July 2024)

SoftBank acquired UK-based Graphcore for about $500M. Graphcore's IPU (Intelligence Processing Unit) family — Bow IPU and second-generation Colossus — will be folded into SoftBank's Cristal Intelligence (in-house AI infrastructure).

Preferred Networks MN-3 / MN-Core 2

Preferred Networks is Japan's flagship AI company. The MN-Core line is its proprietary training accelerator.

- **MN-3** (2020) — Green500 number one in energy efficiency

- **MN-Core 2** (2024) — 7nm, 130 TFLOPS BF16

Used internally to train PFN's own LLMs and shared with select partners (notably Toyota) rather than sold externally.

Rapidus — 2nm in 2027

A new foundry funded by the Japanese government, Sony, Toyota, NTT and SoftBank. **The goal is 2nm volume in 2027**. The technology partnership is with IBM; a fab is under construction in Chitose, Hokkaido.

The U.S., Korea and Taiwan (TSMC) dominate the leading-edge foundry world — Japan is making another run. A pilot line is running as of May 2026; if the 2027 schedule holds, Rapidus becomes the biggest Japanese AI-chip variable of the decade.

19. Liquid Cooling + Data-Center Power

Why liquid cooling

H100 was 700W, B200 is 1000W, and a GB200 NVL72 rack is 120 kW. **Air cooling can't handle it**. Eight 1000W GPUs in one server is 8 kW for the GPUs alone. About 30 kW per rack is the air-cooling ceiling; above that, liquid cooling is mandatory.
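The rack arithmetic can be checked in a few lines. The sketch counts GPU power only; CPUs, NICs and fans are left out as a simplifying assumption, so real air-cooled density is lower still.

```python
def servers_per_rack(rack_budget_kw: float, server_kw: float) -> int:
    """Whole servers that fit under a rack power budget."""
    return int(rack_budget_kw // server_kw)


GPU_KW = 1.0             # B200
GPUS_PER_SERVER = 8
AIR_CEILING_KW = 30.0    # air-cooling ceiling per rack
NVL72_RACK_KW = 120.0    # GB200 NVL72 rack

server_kw = GPU_KW * GPUS_PER_SERVER            # GPUs only (assumption)
air_servers = servers_per_rack(AIR_CEILING_KW, server_kw)
air_gpus = air_servers * GPUS_PER_SERVER
over_ceiling = NVL72_RACK_KW / AIR_CEILING_KW

print(f"Air-cooled rack: {air_servers} servers, {air_gpus} GPUs")
print(f"NVL72 draws {over_ceiling:.0f}x the air-cooling ceiling")
```

An air-cooled rack tops out at three such servers, or 24 GPUs, while an NVL72 rack packs 72 GPUs into four times that power budget.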

Forms of liquid cooling

- **Direct-to-chip (D2C)** — cold plates clamped to the silicon, fluid circulates

- **Rear-door heat exchanger** — radiator on the back of the rack

- **Immersion cooling** — submerge the whole server in dielectric fluid

GB200 NVL72 standardizes on D2C. A facility-wide water loop is mandatory. PUE drops to **about 1.05** (vs 1.4 to 1.6 for air).
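PUE is total facility power divided by IT power, so the quoted values translate directly into cooling and distribution overhead. A minimal sketch for one 120 kW rack, taking 1.5 as a representative air-cooled PUE:

```python
def facility_kw(it_kw: float, pue: float) -> float:
    """Total facility power for a given IT load, from PUE = facility / IT."""
    return it_kw * pue


def overhead_kw(it_kw: float, pue: float) -> float:
    """Power spent on cooling and distribution rather than on IT equipment."""
    return facility_kw(it_kw, pue) - it_kw


RACK_KW = 120.0   # one GB200 NVL72 rack

for label, pue in (("liquid, PUE 1.05", 1.05), ("air, PUE 1.5", 1.5)):
    print(f"{label}: {facility_kw(RACK_KW, pue):.0f} kW total, "
          f"{overhead_kw(RACK_KW, pue):.0f} kW overhead")
```

For the same rack, overhead falls from about 60 kW to about 6 kW, a gap that multiplies across thousands of racks.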

Power — data centers next to power plants

The new generation of Anthropic, OpenAI and Meta data centers is **2 GW and up**. That's the consumption of about 2 million U.S. homes.

- Microsoft contracting for the restart of Three Mile Island Unit 1 (September 2024, with Constellation Energy)

- Amazon and Cumulus Data colocating next to a nuclear plant

- Google contracting with Kairos Power for SMRs (small modular reactors)

As of May 2026, AI sites are spreading across U.S. PJM, Texas ERCOT, Taiwan's Hsinchu, Korea's Anseong and Pyeongtaek, and the wider Tokyo-area grid — and the generation infrastructure has become the real bottleneck.

20. Who Should Pick What — Training / Inference / Edge / Phone

Training — big models, new models

| Situation | Recommendation |
| --- | --- |
| Cutting-edge 70B+ MoE training | NVIDIA GB200 NVL72 / Rubin (late 2026) |
| Cost-optimized training (50%+ cheaper) | AMD MI355X / MI400 Helios |
| TPU-friendly (JAX / TF) | Google TPU v5p / Trillium |
| OK with AWS lock-in | AWS Trainium 2 |

Inference — high volume

| Situation | Recommendation |
| --- | --- |
| General LLM serving | NVIDIA H200 / B200 / AMD MI300X |
| Ultra-low latency (code completion) | Groq LPU / Cerebras WSE-3 |
| Transformer-only | Etched Sohu (post-launch) |
| Korea / EXAONE / domestic models | FuriosaAI RNGD / Rebellions REBEL |

Edge — robots, vehicles, IoT

| Situation | Recommendation |
| --- | --- |
| Autonomous driving | NVIDIA Drive Thor / Tesla FSD HW5 |
| Industrial IoT | NVIDIA Jetson Orin / Hailo-10 / Tenstorrent |
| Desktop workstation | NVIDIA RTX 5090 / AMD Radeon Pro |

Phone — on-device LLM

| Situation | Recommendation |
| --- | --- |
| iOS Apple Intelligence | A18 Pro Neural Engine |
| Android Gemini Nano | Snapdragon 8 Gen 4 / Tensor G5 |
| Budget Android | Dimensity 9400 |

The decision criteria are simple — **software-stack compatibility, unit cost, and availability**. NVIDIA's CUDA ecosystem is still the strongest moat, but ROCm, XLA, Neuron and SynapseAI are closing in.

21. References

- NVIDIA — Blackwell architecture: https://www.nvidia.com/en-us/data-center/blackwell-architecture/

- NVIDIA — GTC 2024 keynote: https://www.nvidia.com/gtc/keynote/

- AMD — Instinct MI300X: https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html

- AMD — Advancing AI 2024: https://www.amd.com/en/corporate/events/advancing-ai.html

- Intel — Gaudi 3: https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi3.html

- Apple — Apple Intelligence: https://www.apple.com/apple-intelligence/

- Google Cloud — TPU v5p: https://cloud.google.com/tpu/docs/v5p

- Google Cloud — Trillium TPU: https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus

- Cerebras — WSE-3: https://www.cerebras.ai/product-chip

- Groq — LPU: https://groq.com/

- SambaNova — SN40L: https://sambanova.ai/products

- Tenstorrent — Wormhole / Blackhole: https://tenstorrent.com/cards/

- Etched — Sohu: https://www.etched.com/announcing-etched

- AWS — Trainium 2: https://aws.amazon.com/machine-learning/trainium/

- AWS — Inferentia: https://aws.amazon.com/machine-learning/inferentia/

- MatX: https://matx.com/

- Tachyum — Prodigy: https://www.tachyum.com/products/

- Qualcomm — Snapdragon 8 Gen 4: https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-elite-mobile-platform

- MediaTek — Dimensity 9400: https://www.mediatek.com/products/smartphones-2/mediatek-dimensity-9400

- SemiAnalysis — Blackwell deep dive: https://www.semianalysis.com/

- SK Hynix — HBM: https://www.skhynix.com/hbm/

- Samsung Semiconductor — HBM: https://semiconductor.samsung.com/dram/hbm/

- Micron — HBM3E: https://www.micron.com/products/memory/hbm

- JEDEC — HBM4 standard: https://www.jedec.org/

- CXL Consortium: https://www.computeexpresslink.org/

- UALink Consortium: https://ualinkconsortium.org/

- FuriosaAI: https://www.furiosa.ai/

- Rebellions: https://rebellions.ai/

- Preferred Networks — MN-Core: https://projects.preferred.jp/mn-core/en/

- Rapidus: https://www.rapidus.inc/en/

- Reuters — NVIDIA shipments: https://www.reuters.com/

- The Information — AI hardware coverage: https://www.theinformation.com/

- Anthropic — Trainium / Project Rainier: https://www.anthropic.com/news