Model-serving


Published on
May 16, 2026
MLOps Platforms 2026 Complete Edition: An In-Depth Guide to MLflow · Kubeflow · W&B · Vertex AI · SageMaker · Databricks · BentoML · Ray · Modal · Hugging Face
mlops mlflow kubeflow weights-and-biases vertex-ai sagemaker databricks bentoml ray modal huggingface model-serving experiment-tracking llm-ops
A single comparison of 30-plus MLOps platforms as of May 2026. It covers MLflow 3, Kubeflow 1.10, Weights & Biases, Comet, Neptune.ai, ClearML, Vertex AI, SageMaker, Azure ML, Databricks ML + Mosaic AI, Hugging Face Inference Endpoints, Determined, Anyscale + Ray Train, BentoML, Modal, RunPod, Replicate, Fireworks AI, Together AI, Lamini, Predibase, Argilla, Galileo, Arize, WhyLabs, Fiddler, TruEra, DagsHub, DVC, lakeFS, ZenML, Metaflow, Flyte, Prefect ML, and Airflow ML, spanning experiment tracking, model registry, serving, drift monitoring, LLM eval, and GPU cost in one post.
Published on
April 15, 2026
The Complete MLOps Guide: Model Serving · Feature Store · Drift · A/B Testing · GPU Economics (Season 2 Ep 7, 2025)
mlops model-serving feature-store drift-detection ab-testing gpu-economics vllm triton mlflow ray kubernetes season-2
Training a model and operating it in production are entirely different games. Serving (TorchServe · Triton · vLLM · TGI), Feature Store (Feast · Tecton), Training Infra (Ray · Determined), Experiment Tracking (MLflow · W&B), data/concept drift detection, model A/B testing and shadow deployment, and GPU economics (on-demand · spot · buying your own hardware): a practical MLOps installment that bridges "the distance from paper to production." The seventh episode of Season 2.
Published on
April 13, 2026
The Complete Guide to Feature Stores & MLOps Pipelines 2025: Feast, Feature Engineering, Model Serving
feature-store mlops feast feature-engineering model-serving ml-pipeline kubeflow mlflow data-pipeline 2026-04 2026-04-13
Everything about Feature Stores and MLOps! Feature Store architecture (Feast/Tecton/Hopsworks), feature engineering patterns, the MLOps pipeline (training → validation → deployment → monitoring), model serving (BentoML/Seldon/TFServing), model registry (MLflow), drift detection, and A/B testing.
Published on
March 17, 2026
The Complete Guide to AI Model Serving and Inference Optimization: vLLM, TensorRT, Triton, Ollama
mlops model-serving vllm tensorrt triton inference optimization 2026-03 2026-03-17
A complete guide to serving AI models efficiently in production. Master vLLM, TensorRT, NVIDIA Triton Inference Server, Ollama, quantization (INT8/INT4), batching, and latency optimization through practical examples.
Published on
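March 12, 2026
The Complete Guide to KServe Model Serving: InferenceService · Canary Deployment · Transformer · InferenceGraph in Production
ai-platform kserve model-serving kubernetes inference-graph canary mlops
Covers Kubernetes-based model serving with KServe. Production operation strategies are implemented with code: model deployment via the InferenceService CRD, safe rollouts with the Canary strategy, pre- and post-processing pipelines with Transformer, and DAG-based composite inference with InferenceGraph.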
Published on
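March 9, 2026
A Guide to Building a Model Serving Platform with Ray Serve: Autoscaling, Multi-Model, Production Deployment
ai-platform ray-serve model-serving kuberay mlops 2026-03-09
A full rundown of Ray Serve's architecture, LLM model serving deployment, autoscaling, multi-model patterns, and KubeRay operations, with practical code.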
Published on
March 8, 2026
NVIDIA Triton Inference Server Production Guide: GPU Model Serving Optimization Strategies
ai-platform triton inference-server gpu model-serving nvidia 2026-03 2026-03-08
A guide to optimizing GPU model serving with NVIDIA Triton Inference Server. Covers Dynamic Batching, Model Ensemble, TensorRT integration, multi-model serving, Kubernetes deployment, performance profiling, and production troubleshooting.
Published on
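March 6, 2026
A Guide to Optimizing Production LLM Serving with vLLM PagedAttention and Comparing Inference Engines
llm vllm pagedattention inference-serving model-serving 2026-03 2026-03-06
A comprehensive LLM serving guide covering vLLM's PagedAttention algorithm, production deployment, performance tuning, comparison with SGLang and TensorRT-LLM, and Kubernetes integration.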
Published on
March 3, 2026
Building an ML Model Serving Pipeline with BentoML: From Packaging to Kubernetes Deployment
ai-platform bentoml model-serving mlops kubernetes 2026-03 2026-03-03
A hands-on walkthrough of ML model serving with BentoML, covering model packaging, API implementation, multi-model pipelines, Docker builds, and Kubernetes deployment.
Published on
March 3, 2026
Building a Scalable LLM Serving Pipeline with Ray Serve
ai-platform ray-serve model-serving llm mlops march-2026 2026-03-03
Covers ML/LLM model serving with Ray Serve, from core concepts to multi-model pipelines, autoscaling, batch inference, and production deployment, with code examples.
Published on
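March 1, 2026
The Complete Guide to vLLM & Ollama: Running LLM Serving Engines, Parameters, and Environment Variables
vllm ollama llm-serving inference model-serving gpu quantization openai-api deep-learning devops
An in-depth comparison of vLLM's PagedAttention architecture and Ollama's local LLM runtime environment. From installation through server startup, API calls, key CLI arguments, sampling parameters, environment variables, quantization (AWQ/GPTQ/GGUF), multi-GPU configuration, Docker deployment, and performance tuning, it summarizes every setting needed for LLM serving with practical examples.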
Published on
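March 1, 2026
ML Model Serving on Kubernetes: A Complete Analysis of KServe and NVIDIA Triton
mlops kubernetes model-serving kserve triton
A systematic analysis of ML model serving architecture in Kubernetes environments, based on the official KServe and NVIDIA Triton documentation.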


model-serving (12)