Chaos and Order

https://www.youngju.dev/blog
Slowly, and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
Language: ko
fjvbn2003@gmail.com (Youngju Kim)
Tue, 16 Jun 2026 00:00:00 GMT

Cerebras Wafer-Scale Deep Dive — A Whole Model on a Single Chip
https://www.youngju.dev/blog/gpu-cuda/2026-06-16-cerebras-wafer-scale-deep-dive.en
A close look at the design of the Cerebras WSE-3, a single chip made from an entire wafer. We cover the on-chip, SRAM-centric architecture that sidesteps the memory wall, the fault-tolerant design, real-time inference performance, and the trade-offs versus GPU clusters.
Tue, 16 Jun 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
Tags: cerebras, wafer-scale, ai-hardware, memory-wall, inference, wse-3
Japanese edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-cerebras-wafer-scale-deep-dive.ja
Korean edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-cerebras-wafer-scale-deep-dive

In-Memory Computing Principles — Computing Inside the Memory
https://www.youngju.dev/blog/gpu-cuda/2026-06-16-in-memory-computing-principles.en
A deep look at the principles of compute-in-memory (CIM): computing directly inside memory instead of moving data to a compute unit. We cover how a crossbar array performs a matrix multiply in a single step, the difference between analog and digital approaches, the precision-versus-noise trade-off, and 2026 research trends and commercialization challenges.
Tue, 16 Jun 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
Tags: in-memory-computing, compute-in-memory, ai-hardware, crossbar, reram, memory-wall
Japanese edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-in-memory-computing-principles.ja
Korean edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-in-memory-computing-principles

The Memory Wall and HBM — The Real Bottleneck That Divides AI Performance
https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth.en
In an era where compute is cheap and data movement is expensive, the real bottleneck of AI performance is memory. We cover the memory-wall concept, HBM generations, the roofline model and arithmetic intensity, the KV cache, and how quantization saves bandwidth, all from a developer's perspective.
Tue, 16 Jun 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
Tags: memory-wall, hbm, bandwidth, roofline, inference, ai-hardware, quantization
Japanese edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth.ja
Korean edition: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth