
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Sat, 16 May 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/aphrodite/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive.en</guid>
    <title>AI Inference Engines 2026 - vLLM · SGLang · llama.cpp · TGI · TensorRT-LLM · MLX · mistral.rs · DeepSpeed-MII · Aphrodite Deep Dive</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive.en</link>
    <description>In 2026, LLM engineering is no longer about which model — it is about which inference engine. We dissect vLLM V1, SGLang 0.4, TensorRT-LLM, TGI 3.x, llama.cpp, MLX-LM, mistral.rs, DeepSpeed-MII, Aphrodite, CTranslate2, ExLlamaV3, OpenVINO, AWS Neuron, Triton — 10+ engines through the lens of PagedAttention, Continuous Batching, Speculative Decoding, Disaggregated Inference, KV quantization, NIM, and Groq LPU. Plus self-hosting ROI math and Korean/Japanese inference infrastructure.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm-inference</category><category>vllm</category><category>sglang</category><category>llama-cpp</category><category>tgi</category><category>tensorrt-llm</category><category>mlx</category><category>mistral-rs</category><category>deepspeed</category><category>aphrodite</category><category>inference</category><category>english</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive.ja</guid>
    <title>AI Inference Engines 2026 Complete Guide - vLLM · SGLang · llama.cpp · TGI · TensorRT-LLM · MLX · mistral.rs · DeepSpeed-MII · Aphrodite Thorough Dissection</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive.ja</link>
    <description>In 2026, LLM engineering is no longer a question of which model to pick but of which engine to pick. We compare 10+ engines side by side (vLLM V1, SGLang 0.4, TensorRT-LLM, TGI 3.x, llama.cpp, MLX-LM, mistral.rs, DeepSpeed-MII, Aphrodite, CTranslate2, ExLlamaV3, OpenVINO, AWS Neuron, Triton) from the perspectives of PagedAttention, Continuous Batching, Speculative Decoding, Disaggregated Inference, KV quantization, NIM, and Groq LPU. Also covers self-hosting ROI calculations and inference infrastructure in Japan and Korea.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm-inference</category><category>vllm</category><category>sglang</category><category>llama-cpp</category><category>tgi</category><category>tensorrt-llm</category><category>mlx</category><category>mistral-rs</category><category>deepspeed</category><category>aphrodite</category><category>inference</category><category>japanese</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive</guid>
    <title>AI Inference Engines 2026 Complete Guide - vLLM · SGLang · llama.cpp · TGI · TensorRT-LLM · MLX · mistral.rs · DeepSpeed-MII · Aphrodite In-Depth Analysis</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-inference-engines-2026-vllm-sglang-llama-cpp-tgi-tensorrt-llm-mlx-mistralrs-deepspeed-aphrodite-deep-dive</link>
    <description>In 2026, LLM inference is no longer a question of which model to choose but of which engine to choose. We compare 10+ engines at a glance (vLLM V1, SGLang 0.4, TensorRT-LLM, TGI 3.x, llama.cpp, MLX-LM, mistral.rs, DeepSpeed-MII, Aphrodite, CTranslate2, ExLlamaV3, OpenVINO, AWS Neuron, Triton) from the perspectives of PagedAttention, Continuous Batching, Speculative Decoding, Disaggregated Inference, KV quantization, NIM, and Groq LPU. Also covers self-hosting ROI calculations and inference infrastructure in Korea and Japan.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>llm-inference</category><category>vllm</category><category>sglang</category><category>llama-cpp</category><category>tgi</category><category>tensorrt-llm</category><category>mlx</category><category>mistral-rs</category><category>deepspeed</category><category>aphrodite</category><category>inference</category>
  </item>

    </channel>
  </rss>
