Chaos and Order

Site: https://www.youngju.dev/blog
Description: Slowly and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
Language: ko
Author: Youngju Kim (fjvbn2003@gmail.com)
Date: Tue, 16 Jun 2026 00:00:00 GMT

The Memory Wall and HBM: The Real Bottleneck That Divides AI Performance (English edition)
Link: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth.en
Summary: In an era where compute is cheap and data movement is expensive, the real bottleneck of AI performance is memory. The post covers the memory-wall concept, HBM generations, the roofline model and arithmetic intensity, the KV cache, and how quantization saves bandwidth, all from a developer's point of view.
Date: Tue, 16 Jun 2026 00:00:00 GMT
Author: Youngju Kim (fjvbn2003@gmail.com)
Tags: memory-wall, hbm, bandwidth, roofline, inference, ai-hardware, quantization

The Memory Wall and HBM: The Real Bottleneck That Divides AI Performance (Japanese edition)
Link: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth.ja
Summary: In an era where compute has become cheap and data movement has become expensive, the real bottleneck of AI performance is memory. The post covers the memory-wall concept, HBM generations, the roofline model and arithmetic intensity, the KV cache, and bandwidth reduction through quantization, from a developer's point of view.
Date: Tue, 16 Jun 2026 00:00:00 GMT
Author: Youngju Kim (fjvbn2003@gmail.com)
Tags: memory-wall, hbm, bandwidth, roofline, inference, ai-hardware, quantization

The Memory Wall and HBM: The Real Bottleneck That Divides AI Performance (Korean edition)
Link: https://www.youngju.dev/blog/gpu-cuda/2026-06-16-memory-wall-hbm-bandwidth
Summary: In an era where compute has become cheap and data movement has become expensive, the real bottleneck of AI performance is memory. The post covers the memory-wall concept, HBM generations, the roofline model and arithmetic intensity, the KV cache, and how to reduce bandwidth use through quantization, from a developer's point of view.
Date: Tue, 16 Jun 2026 00:00:00 GMT
Author: Youngju Kim (fjvbn2003@gmail.com)
Tags: memory-wall, hbm, bandwidth, roofline, inference, ai-hardware, quantization