Chaos and Order

💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.

원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Intro — In May 2026, MLOps and LLMOps share the same surface

Until 2024 the words "MLOps" and "LLMOps" were used as if they meant different things. As of May 2026 the two disciplines **share almost the same surface**. Experiment trackers log prompt runs. Model registries version LoRA adapters. Serving platforms ship vLLM and TGI as first-class backends. Monitoring tools watch data drift alongside hallucination rate, toxicity, and faithfulness on the same dashboard.

This article is not a platform-of-the-month matrix. It is **what production ML teams actually run in 2026**, told through 30 tools. We hold OSS-first stacks, managed cloud stacks, "rent only GPUs" serverless stacks, and the in-house platforms inside Korean and Japanese big tech to the same standard.

The 2026 MLOps stack broken into 8 layers

Big picture first. The 2026 standard MLOps stack decomposes into 8 layers.

1. **Experiment tracking**: runs, params, metrics, artifacts

2. **Model registry**: versions, stages, model cards

3. **Pipeline orchestration**: DAGs, caching, retries

4. **Training infra**: GPU scheduling, distributed training, HPO

5. **Model serving**: online / batch / edge

6. **Data versioning**: datasets, features, indexes

7. **Monitoring**: data drift, concept drift, prediction drift

8. **LLM eval**: faithfulness, toxicity, jailbreak detection

The era when one or two tools owned each layer is over. Inside the same layer, a **classic ML track and an LLM track** now diverge.

Experiment tracking — MLflow 3, Weights & Biases, Comet, Neptune.ai

Two tools own 90% of the experiment tracking layer: **MLflow 3.0** and **Weights & Biases**. MLflow is BSD-licensed OSS built by Databricks and donated to the Linux Foundation. In 3.0 GenAI tracing, eval, and a prompt registry became first-class citizens. W&B is SaaS-first with open SDKs, and bundles Models / Weave / Launch / Sweeps in one product.

The challengers are credible. **Comet ML** unifies ML + LLM experiments + production monitoring in one SaaS, **Neptune.ai** repositioned around foundation-model training metadata. In OSS-self-hosted territory, **Aim** and **ClearML** remain solid.

A typical MLflow 3 snippet looks like this.

from sklearn.ensemble import RandomForestClassifier

from sklearn.datasets import load_iris

mlflow.set_tracking_uri("http://mlflow.internal:5000")

mlflow.set_experiment("iris-rf-2026")

X, y = load_iris(return_X_y=True)

with mlflow.start_run() as run:

mlflow.log_param("n_estimators", 200)

model = RandomForestClassifier(n_estimators=200).fit(X, y)

mlflow.log_metric("train_acc", model.score(X, y))

mlflow.sklearn.log_model(

model,

artifact_path="model",

registered_model_name="iris-rf",

)

print(run.info.run_id)

The same flow in W&B.

from sklearn.ensemble import RandomForestClassifier

from sklearn.datasets import load_iris

run = wandb.init(project="iris-rf-2026", config={"n_estimators": 200})

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(n_estimators=200).fit(X, y)

wandb.log({"train_acc": model.score(X, y)})

artifact = wandb.Artifact("iris-rf", type="model")

artifact.add_file("model.pkl")

run.log_artifact(artifact)

wandb.finish()

The split is about storage and governance. MLflow self-hosts by default and has a friendly license but a plain UI. W&B has the most polished collaboration UX and an integrated LLM eval suite but bills add up quickly.

ClearML, DagsHub, MLEM — integrated tools from the OSS camp

ClearML bundles experiment tracking + orchestration + data management + serving in one OSS package. It uses an agent-worker pattern: jobs land in a queue and GPU nodes pick them up. Small-to-medium teams who want "one tool only" choose it often.

DagsHub stitches **DVC + MLflow + Git LFS** into one SaaS UI. Looking at data versioning and experiment tracking on the same screen is its main pitch. MLEM by Iterative.ai is a packaging tool that exports scikit-learn / PyTorch models to a FastAPI service in a single line.

Kubeflow 1.10 — full-stack ML on top of K8s

Kubeflow is not one tool. It is a full stack on top of K8s. The 1.10 lineup as of May 2026 ships with these core components.

- **Kubeflow Pipelines (KFP)**: DAG pipelines SDK + UI

- **Katib**: distributed hyperparameter tuning

- **KServe**: model serving (InferenceService CRD)

- **Training Operator**: PyTorch / TensorFlow / MPI distributed training

- **Spark Operator, Notebook Controller, Central Dashboard**

A KFP pipeline is written in Python and compiled to YAML.

from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")

def preprocess(input_path: str, output_path: dsl.OutputPath(str)) -> None:

df = pd.read_csv(input_path)

df.to_parquet(output_path)

@dsl.component(base_image="ghcr.io/myorg/trainer:1.4")

def train(data_path: dsl.InputPath(str), model_path: dsl.OutputPath(str)) -> None:

from sklearn.ensemble import RandomForestClassifier

df = pd.read_parquet(data_path)

model = RandomForestClassifier().fit(df.drop("y", axis=1), df["y"])

joblib.dump(model, model_path)

@dsl.pipeline(name="iris-pipeline")

def iris_pipeline(input_path: str) -> None:

p = preprocess(input_path=input_path)

train(data_path=p.output)

compiler.Compiler().compile(iris_pipeline, "iris.yaml")

The hidden cost is K8s itself. Without **EKS / GKE / AKS** managed Kubeflow, a dedicated platform team is effectively a hard requirement.

Metaflow, Flyte, ZenML, Prefect ML, Airflow ML

If K8s-native is too heavy, teams pick Python-first tools. **Metaflow** was built at Netflix and commercialized by Outerbounds. It is deeply integrated with AWS Batch + Step Functions and lets you declare a GPU step with one decorator. **Flyte** came out of Lyft and is now commercialized by Union.ai. K8s-native plus strong typing is the pitch.

**ZenML** positions itself as a "framework-agnostic abstraction layer." It wraps MLflow / W&B / Kubeflow / Airflow / SageMaker as backends behind the same code. **Prefect 3 + Prefect ML** unifies data pipelines and ML pipelines in one UI. **Airflow 3** added ML-friendly decorators (@task.virtualenv, @task.kubernetes) but is still not an ML-first tool.

Vertex AI — Google Cloud's fully managed ML

GCP's Vertex AI binds training + tuning + serving + pipelines + feature store + monitoring into a single console. It branches into Custom Training, AutoML, Workbench (notebooks), Pipelines (KFP-backed), Endpoints, Model Garden, Vertex AI Agent Builder, Vertex AI Search and more.

A Vertex AI Custom Job looks like this.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomContainerTrainingJob(

display_name="iris-rf-2026",

container_uri="us-central1-docker.pkg.dev/my-project/ml/iris-trainer:1.4",

model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",

)

model = job.run(

args=["--n_estimators=200"],

replica_count=1,

machine_type="n1-standard-8",

accelerator_type="NVIDIA_TESLA_T4",

accelerator_count=1,

)

Vertex's strengths are natural integration with BigQuery + Dataplex + Looker, lessons learned from Gemini training infrastructure, and **Model Garden's 70+ base models**. Its weaknesses are GCP lock-in and instance prices that run noticeably above self-hosted equivalents.

SageMaker — AWS's ML mountain range

AWS SageMaker is not a single service. It is a mountain range. As of May 2026 it splits as follows.

- **SageMaker Studio**: integrated IDE

- **SageMaker Training / Processing Jobs**: batch training

- **SageMaker Pipelines**: pipeline orchestration

- **SageMaker Endpoints (real-time, async, serverless)**: serving

- **SageMaker Feature Store**: feature store

- **SageMaker Model Monitor, Clarify**: drift + bias

- **SageMaker JumpStart**: pretrained model catalog

- **SageMaker HyperPod**: foundation-model training clusters

Deploying an endpoint is a single block of code.

from sagemaker import Session

from sagemaker.sklearn import SKLearnModel

session = Session()

role = "arn:aws:iam::123456789012:role/SageMakerRole"

model = SKLearnModel(

model_data="s3://my-bucket/iris-rf/model.tar.gz",

role=role,

entry_point="inference.py",

framework_version="1.4-1",

sagemaker_session=session,

)

predictor = model.deploy(

initial_instance_count=1,

instance_type="ml.m5.large",

endpoint_name="iris-rf-endpoint",

)

print(predictor.predict([[5.1, 3.5, 1.4, 0.2]]))

The strengths are **AWS integration** (S3, IAM, VPC, CloudWatch, EventBridge) and deep automation. The weaknesses are instance prices 30–50% above EC2 equivalents and the separate work of tracking JumpStart model licenses.

Azure ML Studio + Microsoft Fabric

Azure Machine Learning Studio sits in the same position as SageMaker / Vertex AI. It ships Workspaces, Compute Clusters, Pipelines, Endpoints (Online + Batch), Model Catalog, Prompt Flow, and the Responsible AI Dashboard. In 2026 **Microsoft Fabric + Azure ML integration** has tightened so OneLake data can feed training directly. Azure OpenAI Service is separate but callable from Prompt Flow.

Deep integration with the Microsoft enterprise stack (Entra ID, Purview, Defender for Cloud) is the upside. The downside is a less consistent UI and a steep learning curve around self-signed certs and private link configuration.

Databricks ML + Mosaic AI — ML on top of the lakehouse

Databricks bundles **Unity Catalog + MLflow + Delta Lake** into one offering and pitches "lakehouse ML." After the 2024 Mosaic AI acquisition the platform added Mosaic AI Pretraining, Mosaic AI Vector Search, and Mosaic AI Model Serving.

- **Model Serving**: unified GPU + CPU endpoints, Provisioned Throughput

- **AI Gateway**: LLM call routing + cost caps

- **Mosaic AI Agent Framework**: agent build + eval

- **Genie Spaces**: NLQ to SQL

For teams whose data is already on Databricks, the bang-for-buck against lock-in is the best in the market.

Hugging Face Hub + Inference Endpoints + AutoTrain

Hugging Face went from a model hub to a full-stack MLOps offering. The May 2026 lineup looks like this.

- **Hub**: 1.3M+ models, datasets, and spaces

- **Inference Endpoints**: managed GPU/CPU serving (pick AWS / Azure / GCP region)

- **Inference Providers**: routed access to Together / Fireworks / Replicate

- **AutoTrain**: no-code fine-tuning UI

- **Spaces**: Gradio / Streamlit app hosting

- **TGI (Text Generation Inference)**: LLM serving backend

Deploying an Inference Endpoint looks like this.

from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(

name="llama-3-8b-prod",

repository="meta-llama/Meta-Llama-3-8B-Instruct",

framework="pytorch",

accelerator="gpu",

instance_size="x1",

instance_type="nvidia-a10g",

region="us-east-1",

vendor="aws",

type="protected",

)

endpoint.wait()

print(endpoint.url)

The strength is putting OSS models and managed serving behind one UI. The weakness is shorter enterprise SLAs than hyperscalers and roughly 30% higher dedicated instance cost than RunPod / Modal.

Determined AI, Anyscale + Ray Train — distributed GPU training

Teams running their own GPU clusters reach for distributed-training tools. **Determined AI** (HPE acquisition in 2021) is an OSS platform that bundles distributed training + HPO + experiment tracking. Install agents on nodes and the GPU pool schedules automatically.

**Anyscale** is the managed SaaS from the Ray maintainers. Ray Train, Ray Tune, Ray Serve, and Ray Data all run on the same cluster. Bringing up PyTorch DDP through Ray Train is one function.

from ray.train.torch import TorchTrainer

from ray.train import ScalingConfig

def train_fn(config):

from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 1)

opt = torch.optim.SGD(model.parameters(), lr=config["lr"])

ds = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))

dl = DataLoader(ds, batch_size=32)

for epoch in range(config["epochs"]):

for x, y in dl:

loss = ((model(x) - y) ** 2).mean()

opt.zero_grad(); loss.backward(); opt.step()

trainer = TorchTrainer(

train_fn,

train_loop_config={"lr": 1e-3, "epochs": 10},

scaling_config=ScalingConfig(num_workers=4, use_gpu=True),

)

result = trainer.fit()

Ray now anchors not only LLM training but also **agent orchestration** (parallelizing many LLM calls), so it has become core infrastructure for the LLMOps stack.

BentoML — Python-first model serving

BentoML is an OSS framework that packages Python models into containers and ships them to managed BentoCloud. The 1.4 lineup officially supports an LLM serving mode (`bentoml serve --reload --use-vllm`) and multi-model routing.

A service definition is one file.

from bentoml import api

from sklearn.ensemble import RandomForestClassifier

@bentoml.service(resources={"cpu": "2", "memory": "1Gi"}, traffic={"timeout": 30})

class IrisRF:

def __init__(self) -> None:

self.model: RandomForestClassifier = joblib.load("model.pkl")

@api

def predict(self, x: list[list[float]]) -> list[int]:

return self.model.predict(x).tolist()

BentoML deploys to K8s directly or to managed BentoCloud. Compared with KServe / Seldon Core 2 / Triton, it has the friendliest Python ergonomics.

Modal, RunPod, Replicate, Fireworks AI, Together AI — serverless GPUs

Teams that want to rent GPUs and have infrastructure disappear pick serverless GPU platforms.

- **Modal**: Python-decorator-first serverless. Cold start within a second.

- **RunPod**: GPU pods + serverless endpoints. A100/H100/H200 at roughly 50% of hyperscaler prices.

- **Replicate**: one-line API to OSS models, cog-format packaging.

- **Fireworks AI**: optimized OSS LLM serving. Sub-second TTFT for Llama/Mixtral/Qwen.

- **Together AI**: OSS LLM hosting + custom model serving + fine-tuning.

- **Lamini, Predibase**: enterprise fine-tuning + serving (LoRA serving).

Spinning up a Modal GPU function looks like this.

app = modal.App("llama-inference")

image = (

modal.Image.debian_slim()

.pip_install("torch==2.4.0", "transformers==4.44.0", "vllm==0.6.0")

)

@app.function(image=image, gpu="A100-40GB", timeout=300)

def generate(prompt: str) -> str:

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

out = llm.generate([prompt], SamplingParams(max_tokens=200, temperature=0.7))

return out[0].outputs[0].text

@app.local_entrypoint()

def main():

print(generate.remote("MLOps in 2026 means"))

LLM serving backends — vLLM, SGLang, TGI, TensorRT-LLM

LLM serving is its own category. Four engines split the market as of May 2026.

- **vLLM**: inventor of PagedAttention. Still the easiest fast starting point. 0.6.x stabilized multi-LoRA concurrent serving.

- **SGLang**: the generation after vLLM. RadixAttention optimizes prefix cache.

- **TGI (Hugging Face)**: HF-model-friendly. The Inference Endpoints backend.

- **TensorRT-LLM**: NVIDIA-built. Lowest latency on H100 / H200.

BentoML 1.4 and KServe 0.13 can both plug any of the four as a backend. The OpenAI-compatible server in vLLM comes up with one command.

python -m vllm.entrypoints.openai.api_server \

--model meta-llama/Meta-Llama-3-8B-Instruct \

--tensor-parallel-size 1 \

--gpu-memory-utilization 0.9 \

--max-model-len 8192 \

--enable-prefix-caching \

--enable-lora \

--max-lora-rank 32 \

--port 8000

Model monitoring — Arize, WhyLabs, Fiddler, TruEra, Galileo, Evidently

Training-serving skew, data drift, concept drift, prediction drift, plus LLM hallucination — all watched together.

- **Arize AI**: unified ML + LLM monitoring. Phoenix (OSS) + managed SaaS.

- **WhyLabs**: WhyLogs (OSS) based. Self-host friendly.

- **Fiddler**: ML monitoring + explainability + RAG eval.

- **TruEra**: maintainer of TruLens (LLM eval OSS). Acquired by Snowflake in 2024.

- **Galileo**: LLM-only eval and guardrails.

- **Evidently AI**: OSS-friendly. Report-style output.

Same layer, different strengths. Classic ML picks Arize / WhyLabs / Fiddler, LLM-only picks Galileo / Arize Phoenix / TruLens, OSS self-host picks Evidently / WhyLogs / Phoenix.

The LLM eval revolution — DeepEval, RAGAS, Promptfoo, langwatch

LLM eval has split off into its own stack.

- **DeepEval**: pytest-friendly LLM eval OSS. Faithfulness, answer relevancy, bias, toxicity.

- **RAGAS**: de facto standard for RAG pipeline eval. Context recall, faithfulness, context relevancy.

- **Promptfoo**: prompt A/B testing, CLI-friendly.

- **langwatch**: prompt observability + eval + cost tracking.

- **Argilla**: human-in-the-loop labeling + eval. Acquired by HF in 2024.

For LLM model cards and dataset cards (datasheets), Hugging Face Hub has become the de facto format.

Data versioning — DVC, lakeFS, Pachyderm

- **DVC**: Git-friendly data versioning by Iterative.ai. S3 / GCS / Azure backends.

- **lakeFS**: Git-like branch / merge for object stores.

- **Pachyderm**: pipelines + data lineage. Acquired by HPE in 2023.

- **DagsHub**: integrated SaaS for DVC + MLflow + Git LFS.

Data versioning covers training reproducibility, GDPR deletion-request tracking, and dataset lineage as one workflow.

Hyperparameter optimization — Optuna, Ray Tune, W&B Sweeps, Katib

Optuna is the de facto OSS HPO standard. Swap TPE, CMA-ES, or NSGA-II with one argument. Ray Tune runs distributed HPO on a Ray cluster. W&B Sweeps drives grid / random / bayes searches inside W&B. Katib does the same work on K8s.

study:

storage: postgresql+psycopg2://user:pw@db/optuna

direction: maximize

sampler:

type: tpe

n_startup_trials: 20

pruner:

type: hyperband

parameters:

- name: learning_rate

type: float

low: 1e-5

high: 1e-2

log: true

- name: hidden_size

type: categorical

choices: [128, 256, 512]

CI/CD for ML — Continuous Training (CT)

Traditional CI/CD gains one extra leg. **Continuous Training (CT)** kicks off automatic retrain → eval → promote whenever new data lands.

Core patterns.

- **Shadow deployment**: run a new model on real traffic without serving it; just collect logs.

- **Canary deployment**: ramp from 5%, 25%, 50% to 100% traffic.

- **Blue-green for ML**: keep two model versions warm, switch by router.

- **A/B testing**: branch models by user segment.

- **Automated rollback**: revert automatically when monitoring SLOs trip.

Both KServe and BentoCloud support canary / shadow as first-class concepts on the InferenceService.

Edge / embedded inference — TF Lite, Core ML, ONNX Runtime, OpenVINO, TVM

- **TensorFlow Lite**: Android / microcontrollers

- **Core ML**: Apple Silicon, iOS / macOS / visionOS

- **ONNX Runtime**: cross-platform runtime (CPU, CUDA, DirectML, CoreML, OpenVINO EP)

- **OpenVINO**: Intel-friendly. CPU / iGPU / NPU acceleration.

- **Apache TVM**: compiler-first. Pairs with MLC LLM for on-device LLM inference.

- **MediaPipe**: multimedia ML pipelines on mobile.

The closer you get to the edge, the more quantization (INT8 / INT4 / GPTQ / AWQ), pruning, and distillation become mandatory.

GPU cost comparison — hourly rates as of May 2026

The single question we get most is GPU pricing. The same H100 80GB lands at 2–3x different prices across platforms.

| Platform | A100 80GB | H100 80GB | Commitment / notes |

| --- | --- | --- | --- |

| GCP A3 (8x H100) | n/a | ~$5.5/hr | Significant CUD discount on commit |

| Azure NDv5 (H100) | n/a | ~$5.4/hr | 30–50% off with 1-yr reserved |

| CoreWeave | $1.65/hr | ~$4.25/hr | Separate spot discounts |

| Lambda Labs | $1.29/hr | ~$2.49/hr | On-demand, 4090 / A6000 also cheap |

| RunPod (Community) | $1.19/hr | ~$1.99/hr | High availability variance |

| Modal | n/a | ~$4.0/hr | Per-second billing, cold start within a second |

Real prices vary heavily by region, commitment, and spot status, so always check the vendor's official page. The trend is clear though: **OSS GPU brokers (CoreWeave / Lambda / RunPod) run 40–60% cheaper than hyperscalers.**

Korea — Naver HyperCLOVA, Coupang ML, Kakao Brain

Based on what Korea's three big-tech vendors disclose publicly, their internal MLOps looks like this.

- **Naver Cloud / HyperCLOVA X**: large-scale training on in-house GPU clusters + Megatron-LM / Slurm. Serving uses an internal model-serving platform + Triton + vLLM.

- **Coupang ML platform**: known internally as "Bumblebee," running KFP / MLflow pipelines + a custom feature store.

- **Kakao Brain**: their SoftCo cluster trains models like Karlo and serves them via internal Brain Cloud. PyTorch + DeepSpeed + Slurm.

- **LG AI Research (EXAONE series)**: in-house cluster + NVIDIA DGX + custom training framework.

Universities and research institutes are increasingly using public GPU resources such as NIPA / NCDS's K-Cloud / KSC-V.

Japan — Preferred Networks PFCC, Mercari, CyberAgent, ABEJA

In Japan, four companies have shaped the in-house MLOps standard.

- **Preferred Networks**: own MN-3 / MN-Core supercomputers; PyTorch + the PFCC successor to ChainerMN.

- **Mercari**: internal "Mercari ML Platform" built on KFP + Vertex AI + a custom feature store.

- **CyberAgent AI Lab**: CIU (CyberAgent's ML platform) + a private LLM (CyberAgentLM) training stack.

- **ABEJA Platform**: enterprise MLOps SaaS, unifying annotation + training + serving.

- **DeNA, Yahoo Japan, Rakuten**: each runs KFP + custom feature store + custom model registry.

A common thread: **heavy dependence on Vertex AI**. Mercari and CyberAgent share GCP-friendly stacks in public blogs.

Stack recommendations — four scenarios

- **Small startup (3–10 people)**: Modal + W&B + Hugging Face Hub + Evidently. Rent only the GPUs.

- **Mid-size ML team (10–50 people)**: SageMaker or Vertex AI + MLflow + DVC + Arize. Managed full stack.

- **Teams running their own GPU clusters (50+)**: Kubeflow 1.10 + MLflow + Ray + KServe + Argilla + Galileo. Self-hosted full stack.

- **LLM-first teams**: vLLM + BentoML + Argilla + DeepEval + RAGAS + langwatch + Modal. Eval and cost tracking dominate.

There is no single right answer. What is certain is that **drawing layer boundaries deliberately** keeps migration costs small whenever you swap a tool inside a layer.

Closing — MLOps is ultimately "reproducibility and recoverability"

To compress the May 2026 MLOps landscape into one line: "reproducibility and recoverability are first-class citizens." Which data and which code trained the model, what traffic the model now sees, and how quickly we can roll back when something breaks are the real evaluation axes.

Tools can be swapped. **But layer boundaries and ownership lines, once drawn wrong, cost 6–12 months to redraw.** I hope this article helps you draw those boundaries with intent.

References

- [MLflow](https://mlflow.org/) — Apache 2.0 ML lifecycle platform

- [Kubeflow](https://www.kubeflow.org/) — Full-stack ML on K8s

- [Weights & Biases](https://wandb.ai/) — Experiment tracking + LLM ops

- [Neptune.ai](https://neptune.ai/) — Foundation-model metadata store

- [Comet ML](https://www.comet.com/) — Integrated tracking + monitoring SaaS

- [Google Vertex AI](https://cloud.google.com/vertex-ai) — GCP ML platform

- [AWS SageMaker](https://aws.amazon.com/sagemaker/) — AWS ML platform

- [Azure Machine Learning](https://learn.microsoft.com/azure/machine-learning/) — Microsoft ML platform

- [Databricks Machine Learning](https://www.databricks.com/product/machine-learning) — Lakehouse ML + Mosaic AI

- [Hugging Face](https://huggingface.co/docs) — Hub + Inference Endpoints + AutoTrain

- [BentoML](https://www.bentoml.com/) — Python model serving

- [Modal](https://modal.com/) — Serverless GPUs

- [Anyscale](https://www.anyscale.com/) — Managed Ray

- [Ray](https://www.ray.io/) — Distributed compute + ML

- [ZenML](https://github.com/zenml-io/zenml) — MLOps abstraction layer

- [Flyte](https://flyte.org/) — K8s-native workflows

- [DVC](https://dvc.org/) — Git-friendly data versioning

- [lakeFS](https://lakefs.io/) — Object-store branch / merge

- [Arize AI](https://arize.com/) — ML + LLM monitoring

- [WhyLabs](https://whylabs.ai/) — Monitoring on top of OSS WhyLogs

- [Galileo](https://www.galileo.ai/) — LLM eval + guardrails

- [RAGAS](https://docs.ragas.io/) — RAG eval standard

- [vLLM](https://docs.vllm.ai/) — LLM serving engine