- What is MLflow?
- Installation and Server Setup
- Experiment Tracking
- Model Registry
- Model Serving
- Experiment Comparison and Analysis
- Production Checklist
What is MLflow?
MLflow is an open-source platform for managing the ML lifecycle. It consists of four core components:
- MLflow Tracking: Records experiment parameters, metrics, and artifacts
- MLflow Projects: Packages ML code for reproducibility
- MLflow Models: Packages models from various frameworks in a unified format
- MLflow Model Registry: Manages model versions and deployment workflows
Installation and Server Setup
Basic Installation
```bash
# pip installation
pip install mlflow

# Additional framework support (quoted so the shell does not glob the brackets)
pip install "mlflow[extras]"  # sklearn, tensorflow, pytorch, etc.

# Start server (local)
mlflow server --host 0.0.0.0 --port 5000

# Production server with PostgreSQL + S3 backend
mlflow server \
  --backend-store-uri postgresql://mlflow:password@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/ \
  --host 0.0.0.0 --port 5000
```
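Once the server is up, clients can pick up its address from an environment variable instead of calling `set_tracking_uri` in code, and the server's `/health` endpoint gives a quick liveness check. A minimal configuration fragment:

```bash
# Point MLflow clients at the server (equivalent to mlflow.set_tracking_uri)
export MLFLOW_TRACKING_URI=http://localhost:5000

# Liveness check -- the tracking server answers OK on /health
curl http://localhost:5000/health
```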
Deployment with Docker Compose
```yaml
# docker-compose.yml
services:
  mlflow:
    # Note: the stock image does not bundle psycopg2/boto3;
    # extend it with those drivers for Postgres/S3 backends.
    image: ghcr.io/mlflow/mlflow:v2.18.0
    ports:
      - '5000:5000'
    environment:
      - MLFLOW_BACKEND_STORE_URI=postgresql://mlflow:password@postgres:5432/mlflow
      - MLFLOW_DEFAULT_ARTIFACT_ROOT=s3://mlflow-artifacts/
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:password@postgres:5432/mlflow
      --default-artifact-root s3://mlflow-artifacts/
      --host 0.0.0.0 --port 5000
    depends_on:
      - postgres
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mlflow
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```
Experiment Tracking
Basic Usage
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Configure tracking server
mlflow.set_tracking_uri("http://localhost:5000")

# Create/set experiment
mlflow.set_experiment("iris-classification")

# Prepare data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Run experiment
with mlflow.start_run(run_name="rf-baseline"):
    # Log parameters
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "min_samples_split": 2,
        "random_state": 42,
    }
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Predictions and metrics
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_macro": f1_score(y_test, y_pred, average="macro"),
        "precision_macro": precision_score(y_test, y_pred, average="macro"),
    }
    mlflow.log_metrics(metrics)

    # Tags
    mlflow.set_tag("model_type", "random_forest")
    mlflow.set_tag("dataset", "iris")

    # Save model
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="iris-classifier",
    )

    # Custom artifacts (plots, reports, etc.)
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

    cm = confusion_matrix(y_test, y_pred)
    fig, ax = plt.subplots()
    ConfusionMatrixDisplay(cm).plot(ax=ax)
    fig.savefig("confusion_matrix.png")
    mlflow.log_artifact("confusion_matrix.png")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Metrics: {metrics}")
```
Hyperparameter Tuning Tracking
```python
import optuna
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Reuses X_train, X_test, y_train, y_test from the basic example above

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 20),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 5),
    }
    with mlflow.start_run(nested=True, run_name=f"trial-{trial.number}"):
        mlflow.log_params(params)
        model = RandomForestClassifier(**params, random_state=42)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        mlflow.log_metric("accuracy", accuracy)
        return accuracy

# Run the Optuna study; the parent run groups the nested trial runs
with mlflow.start_run(run_name="hyperparameter-tuning"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)

    # Log best results
    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_accuracy", study.best_value)
    mlflow.set_tag("best_trial", study.best_trial.number)
```
PyTorch Model Tracking
```python
import torch
import torch.nn as nn
import mlflow.pytorch

class SimpleNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Iris features/labels as tensors (X_train, y_train from the earlier example)
X_tensor = torch.tensor(X_train, dtype=torch.float32)
y_tensor = torch.tensor(y_train, dtype=torch.long)

with mlflow.start_run(run_name="pytorch-model"):
    model = SimpleNet(4, 32, 3)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()

    mlflow.log_params({
        "hidden_dim": 32,
        "learning_rate": 0.001,
        "optimizer": "Adam",
        "epochs": 100,
    })

    for epoch in range(100):
        # Training step
        loss = criterion(model(X_tensor), y_tensor)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log per-epoch metrics
        mlflow.log_metric("train_loss", loss.item(), step=epoch)

    # Save PyTorch model
    mlflow.pytorch.log_model(model, "model")
```
Model Registry
Model Registration and Version Management
```python
from mlflow import MlflowClient

client = MlflowClient()

# Register model (auto-registered when using registered_model_name in log_model)
# Or manually register:
result = client.create_registered_model(
    name="iris-classifier",
    description="Iris flower classification model",
)

# Register a specific run's model as a version
# (run_id is the ID of a tracking run that logged a model)
model_version = client.create_model_version(
    name="iris-classifier",
    source=f"runs:/{run_id}/model",
    run_id=run_id,
    description="RandomForest baseline v1",
)
print(f"Model Version: {model_version.version}")
```
Deployment Management with Aliases
```python
import mlflow
from mlflow import MlflowClient
from sklearn.metrics import accuracy_score

# MLflow 2.x uses aliases (stages are deprecated)
client = MlflowClient()

# Set production alias
client.set_registered_model_alias(
    name="iris-classifier",
    alias="champion",
    version=3,
)

# Set challenger model
client.set_registered_model_alias(
    name="iris-classifier",
    alias="challenger",
    version=5,
)

# Load models by alias
champion_model = mlflow.pyfunc.load_model("models:/iris-classifier@champion")
challenger_model = mlflow.pyfunc.load_model("models:/iris-classifier@challenger")

# A/B comparison on the held-out test set
champion_pred = champion_model.predict(X_test)
challenger_pred = challenger_model.predict(X_test)
print(f"Champion accuracy: {accuracy_score(y_test, champion_pred)}")
print(f"Challenger accuracy: {accuracy_score(y_test, challenger_pred)}")
```
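In a live A/B rollout, a serving layer typically routes a fixed fraction of traffic to the challenger. One common approach, hash-based bucketing, gives each request a stable assignment so retries hit the same variant. A minimal sketch (the function name and fraction are illustrative, not an MLflow API):

```python
import hashlib

def route_variant(request_id: str, challenger_fraction: float = 0.1) -> str:
    """Deterministically assign a request to champion or challenger.

    Hashing the request ID yields a stable bucket in [0, 1), so the
    same request ID always maps to the same model variant.
    """
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "challenger" if bucket < challenger_fraction else "champion"

# Over many requests the split approaches the configured fraction
counts = {"champion": 0, "challenger": 0}
for i in range(10_000):
    counts[route_variant(f"req-{i}")] += 1
print(counts)  # roughly a 90/10 split
```

When the challenger wins, promotion is just re-pointing the alias: `client.set_registered_model_alias("iris-classifier", "champion", 5)`.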
Using Model Tags
```python
# Add tags to a model version
client.set_model_version_tag(
    name="iris-classifier",
    version=3,
    key="validation_status",
    value="approved",
)
client.set_model_version_tag(
    name="iris-classifier",
    version=3,
    key="approved_by",
    value="data-science-lead",
)

# Search model versions by tag
approved_versions = client.search_model_versions(
    "name='iris-classifier' AND tags.validation_status='approved'"
)
```
Model Serving
Built-in MLflow Serving
# Local REST API serving
mlflow models serve \
-m "models:/iris-classifier@champion" \
--port 8080 \
--no-conda
# Test request
curl -X POST http://localhost:8080/invocations \
-H "Content-Type: application/json" \
-d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}'
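Besides `"inputs"`, the `/invocations` endpoint also accepts a `"dataframe_split"` layout that carries column names alongside the rows. Both payloads can be built with the standard library (the column names here are illustrative):

```python
import json

# Tensor-style payload: just the raw feature rows
inputs_payload = json.dumps({"inputs": [[5.1, 3.5, 1.4, 0.2]]})

# DataFrame-style payload: column names plus row data
split_payload = json.dumps({
    "dataframe_split": {
        "columns": ["sepal_len", "sepal_wid", "petal_len", "petal_wid"],
        "data": [[5.1, 3.5, 1.4, 0.2]],
    }
})
print(inputs_payload)
```

The `dataframe_split` form is useful when the logged model has a column-based signature, since MLflow can then validate the input schema.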
Custom Serving with FastAPI
```python
from fastapi import FastAPI
import mlflow.pyfunc
import numpy as np

app = FastAPI()

# Load model (once at server startup)
model = mlflow.pyfunc.load_model("models:/iris-classifier@champion")

@app.post("/predict")
async def predict(features: list[list[float]]):
    predictions = model.predict(np.array(features))
    return {
        "predictions": predictions.tolist(),
        "model_version": "champion",
    }

@app.get("/health")
async def health():
    return {"status": "healthy", "model": "iris-classifier@champion"}
```
Experiment Comparison and Analysis
Comparing in MLflow UI
```bash
# List runs in an experiment (CLI)
mlflow runs list --experiment-id 1
```

The `mlflow runs list` CLI only lists runs; filtering and sorting by metrics (e.g. `metrics.accuracy > 0.95` ordered by `metrics.accuracy DESC`) is done in the UI's search box or with `mlflow.search_runs` in the Python API.
Analysis with Python API
```python
import mlflow
import pandas as pd

# Query all runs in an experiment
runs = mlflow.search_runs(
    experiment_ids=["1"],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10,
)

# Analyze as a DataFrame
print(runs[["run_id", "params.n_estimators", "params.max_depth", "metrics.accuracy"]])

# Find the best run
best_run = runs.iloc[0]
print(f"Best run: {best_run.run_id}, Accuracy: {best_run['metrics.accuracy']}")
```
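Since `search_runs` returns a flat pandas DataFrame (params as strings, metrics as floats), ordinary pandas operations apply. A quick illustration with a toy frame standing in for real query results:

```python
import pandas as pd

# Toy stand-in for the DataFrame mlflow.search_runs returns
runs = pd.DataFrame({
    "run_id": ["a1", "b2", "c3"],
    "params.n_estimators": ["100", "200", "300"],  # params come back as strings
    "metrics.accuracy": [0.93, 0.97, 0.95],
})

# Sort locally instead of (or in addition to) the server-side order_by
best = runs.sort_values("metrics.accuracy", ascending=False).iloc[0]
print(best["run_id"], best["metrics.accuracy"])  # b2 0.97
```

Remember to cast the `params.*` columns (e.g. with `astype(int)`) before plotting them against metrics.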
Production Checklist
□ Set backend store to PostgreSQL/MySQL
□ Set artifact store to S3/GCS/MinIO
□ Configure authentication/authorization (OIDC, Basic Auth)
□ Set up automatic experiment logging (autolog)
□ Establish Model Registry alias conventions
□ Automate model validation in CI/CD
□ Configure model serving health checks
□ Define experiment cleanup policies (archive old runs)
Review Quiz (6 Questions)
Q1. What are the four core components of MLflow?
Tracking, Projects, Models, Model Registry
Q2. What is the difference between mlflow.log_params and mlflow.log_metrics?
log_params records training hyperparameters (strings), while log_metrics records performance metrics (numbers). Metrics support per-epoch tracking with the step parameter.
Q3. What concept is used for model deployment management in MLflow 2.x?
Aliases (e.g., @champion, @challenger). Stages have been deprecated.
Q4. When is the nested=True parameter used?
It is used when recording multiple child runs inside a parent run, such as during hyperparameter tuning.
Q5. Why use S3 as the artifact store?
It stores large artifacts like model files and plots in scalable object storage, making it easy to share across teams and manage versions.
Q6. What are the pros and cons of mlflow.autolog()?
Pros: Automatically records parameters/metrics/models without code changes. Cons: May record unnecessary information, and custom metrics still need to be logged separately.