
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly, but correctly. Tech blog of Youngju, an AI researcher &amp; DevOps engineer, covering GPU/CUDA, LLMs, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Sat, 16 May 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/blackwell/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive.en</guid>
    <title>Distributed Training &amp; GPU Infrastructure 2026 Deep-Dive — DeepSpeed, FSDP2, Megatron-LM, Ray Train, JAX, TorchTitan, Blackwell GB200, MI325X, TPU v5p</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive.en</link>
    <description>A comparison of DeepSpeed, FSDP2, Megatron-LM, Ray Train, JAX, TorchTitan, and Composer, along with NVIDIA Blackwell GB200 NVL72, AMD MI325X, Intel Gaudi 3, AWS Trainium 2, and Google TPU v5p/v6e Trillium. It covers 3D parallelism, ZeRO/FSDP equivalence, MoE All-to-All communication, fp8/mxfp4 precision, NCCL tuning, checkpointing, and failure recovery: the state of LLM training infrastructure as of mid-2026.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>distributed-training</category><category>deepspeed</category><category>fsdp</category><category>megatron-lm</category><category>ray</category><category>jax</category><category>lightning</category><category>accelerate</category><category>torchtitan</category><category>cuda</category><category>nccl</category><category>blackwell</category><category>tpu</category><category>llm-training</category><category>gpu-infrastructure</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive.ja</guid>
    <title>分散学習 &amp; GPUインフラ 2026 ディープダイブ — DeepSpeed、FSDP2、Megatron-LM、Ray Train、JAX、TorchTitan、Blackwell GB200、MI325X、TPU v5p 総まとめ</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive.ja</link>
    <description>DeepSpeed/FSDP2/Megatron-LM/Ray Train/JAX/TorchTitan/Composerを比較し、NVIDIA Blackwell GB200 NVL72、AMD MI325X、Intel Gaudi 3、AWS Trainium 2、Google TPU v5p/v6e Trilliumまで。3D並列化、ZeRO-FSDP等価性、MoEのAll-to-All、fp8/mxfp4精度、NCCLチューニング、チェックポイント、障害復旧までLLM学習インフラ2026年現在形。</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>distributed-training</category><category>deepspeed</category><category>fsdp</category><category>megatron-lm</category><category>ray</category><category>jax</category><category>lightning</category><category>accelerate</category><category>torchtitan</category><category>cuda</category><category>nccl</category><category>blackwell</category><category>tpu</category><category>llm-training</category><category>gpu-infrastructure</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive</guid>
    <title>분산 학습 &amp; GPU 인프라 2026 딥다이브 — DeepSpeed, FSDP2, Megatron-LM, Ray Train, JAX, TorchTitan, Blackwell GB200, MI325X, TPU v5p 총정리</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-distributed-training-deepspeed-fsdp-megatron-ray-jax-fsdp2-torchtitan-blackwell-2026-deep-dive</link>
    <description>DeepSpeed/FSDP2/Megatron-LM/Ray Train/JAX/TorchTitan/Composer를 비교하고, NVIDIA Blackwell GB200 NVL72, AMD MI325X, Intel Gaudi 3, AWS Trainium 2, Google TPU v5p/v6e Trillium까지. 3D 병렬화, ZeRO-FSDP 등가성, MoE All-to-All, fp8/mxfp4 정밀도, NCCL 튜닝, 체크포인팅, 실패 복구까지 LLM 학습 인프라 2026년 현재형.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>distributed-training</category><category>deepspeed</category><category>fsdp</category><category>megatron-lm</category><category>ray</category><category>jax</category><category>lightning</category><category>accelerate</category><category>torchtitan</category><category>cuda</category><category>nccl</category><category>blackwell</category><category>tpu</category><category>llm-training</category><category>gpu-infrastructure</category>
  </item>

    </channel>
  </rss>
