- Overview
- Why You Need a Feature Store
- Installing Feast and Initializing a Project
- Feature Definitions
- Generating Sample Data
- Feast Workflow
- Managing Feature Groups with Feature Service
- Real-time Feature Updates with Push Source
- Feature Server Deployment
- Integration with Airflow (Automated Materialize)
- Conclusion
- Quiz

Overview
One of the most common problems when deploying ML models to production is Training-Serving Skew — a phenomenon where model performance degrades because the features used during training differ from those used during serving. A Feature Store is the infrastructure component that fundamentally solves this problem by centrally managing feature definitions, storage, and serving.
Feast (a contraction of "Feature Store") is one of the most widely used open-source feature stores, supporting both offline (batch training) and online (real-time serving) paths. This article covers the entire process of building a feature pipeline with Feast.
Why You Need a Feature Store
The Training-Serving Skew Problem
# During training (offline)
features = pd.read_sql("""
    SELECT user_id,
           AVG(purchase_amount) AS avg_purchase,
           COUNT(*) AS purchase_count
    FROM transactions
    WHERE timestamp < '2026-01-01'
    GROUP BY user_id
""", conn)

# During serving (online) - skew occurs when different logic is used!
features = redis_client.get(f"user:{user_id}:features")
When training and serving compute the same features with different code, subtle discrepancies emerge, and model performance diverges from offline experiments. A Feature Store provides consistent values from a single feature definition for both offline and online use.
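The remedy can be sketched in a few lines of plain pandas (the `compute_user_features` helper is hypothetical, not part of Feast): define the aggregation once, and have both the training path and the serving path call the same function.

```python
import pandas as pd

def compute_user_features(transactions: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for the aggregation logic.

    Both the offline (training) and online (serving) paths call this
    function, so the two can never drift apart.
    """
    return (
        transactions.groupby("user_id")
        .agg(avg_purchase=("purchase_amount", "mean"),
             purchase_count=("purchase_amount", "count"))
        .reset_index()
    )

transactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "purchase_amount": [10.0, 30.0, 50.0],
})
features = compute_user_features(transactions)
print(features)
#   user_id  avg_purchase  purchase_count
# 0      u1          20.0               2
# 1      u2          50.0               1
```

A feature store institutionalizes exactly this pattern: the definition lives in one place, and both consumers read from it.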
Core Capabilities of a Feature Store
| Capability | Description |
|---|---|
| Feature Registry | Manages feature metadata, schemas, and ownership |
| Offline Store | Bulk feature retrieval for batch training (Point-in-Time Join) |
| Online Store | Low-latency feature retrieval for real-time serving |
| Feature Service | Groups the features a model needs into a named, reusable set |
| Point-in-Time Join | Joins exact feature values based on timestamps |
Installing Feast and Initializing a Project
Installation
# Basic installation
pip install feast
# With PostgreSQL online store (quote the extras so the shell doesn't glob them)
pip install 'feast[postgres]'
# With Redis online store
pip install 'feast[redis]'
# Full dependencies
pip install 'feast[postgres,redis,aws,gcp]'
Project Initialization
# Create project
feast init my_feature_store
cd my_feature_store
# Directory structure
# my_feature_store/
# ├── feature_repo/
# │   ├── feature_store.yaml   # Feast configuration
# │   ├── example_repo.py      # Feature definition examples
# │   └── data/                # Sample data
# └── README.md
feature_store.yaml Configuration
project: my_feature_store
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
entity_key_serialization_version: 2
For production environments, modify as follows:
project: my_feature_store
registry:
  registry_type: sql
  path: postgresql://user:pass@host:5432/feast_registry
provider: local
online_store:
  type: redis
  connection_string: redis://localhost:6379
offline_store:
  type: file  # or bigquery, redshift, snowflake
Feature Definitions
Data Source and Entity Definitions
# feature_repo/features.py
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Float32, Int64, String

# Data source definition
user_transactions_source = FileSource(
    path="data/user_transactions.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Entity definition (the key that features are joined on)
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="Unique user ID",
)
Feature View Definition
# Offline + Online Feature View
user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=7),  # Expires after 7 days in the online store
    schema=[
        Field(name="total_purchases", dtype=Int64, description="Total number of purchases"),
        Field(name="avg_purchase_amount", dtype=Float32, description="Average purchase amount"),
        Field(name="last_purchase_amount", dtype=Float32, description="Most recent purchase amount"),
        Field(name="purchase_frequency", dtype=Float32, description="Purchase frequency (transactions/day)"),
        Field(name="user_segment", dtype=String, description="User segment"),
    ],
    online=True,
    source=user_transactions_source,
    tags={"team": "ml-platform", "version": "v1"},
)
On-Demand Feature View (Real-time Transformation)
import pandas as pd
from feast import on_demand_feature_view, RequestSource

# Features computed dynamically at request time
input_request = RequestSource(
    name="purchase_request",
    schema=[
        Field(name="current_amount", dtype=Float32),
    ],
)

@on_demand_feature_view(
    sources=[user_transaction_features, input_request],
    schema=[
        Field(name="amount_vs_avg_ratio", dtype=Float32),
        Field(name="is_high_value", dtype=Int64),
    ],
)
def purchase_analysis(inputs: pd.DataFrame) -> pd.DataFrame:
    """Calculate the ratio of the current purchase amount to the average purchase amount."""
    # In pandas mode, Feast passes the joined input features as a DataFrame
    df = pd.DataFrame()
    df["amount_vs_avg_ratio"] = inputs["current_amount"] / (inputs["avg_purchase_amount"] + 1e-6)
    df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
    return df
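The transformation inside the decorated function is plain pandas, so it can be sanity-checked outside Feast. A standalone sketch with made-up sample values:

```python
import pandas as pd

# Simulated input: one joined row per entity, shaped like what Feast passes in
inputs = pd.DataFrame({
    "current_amount": [750.0, 50.0],         # from the request
    "avg_purchase_amount": [234.56, 89.30],  # from the feature view
})

df = pd.DataFrame()
df["amount_vs_avg_ratio"] = inputs["current_amount"] / (inputs["avg_purchase_amount"] + 1e-6)
df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
print(df)
#    amount_vs_avg_ratio  is_high_value
# 0             3.197476              1
# 1             0.559910              0
```

Keeping the logic vectorized like this matters: the same function runs over batches of rows, not one row at a time.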
Generating Sample Data
# scripts/generate_data.py
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

np.random.seed(42)

n_users = 1000
n_records = 5000

user_ids = [f"user_{i:04d}" for i in range(n_users)]

records = []
for _ in range(n_records):
    user_id = np.random.choice(user_ids)
    ts = datetime(2026, 1, 1) + timedelta(
        days=int(np.random.randint(0, 60)),
        hours=int(np.random.randint(0, 24)),
    )
    records.append({
        "user_id": user_id,
        "total_purchases": np.random.randint(1, 100),
        "avg_purchase_amount": round(np.random.uniform(10, 500), 2),
        "last_purchase_amount": round(np.random.uniform(5, 1000), 2),
        "purchase_frequency": round(np.random.uniform(0.1, 5.0), 3),
        "user_segment": np.random.choice(["bronze", "silver", "gold", "platinum"]),
        "event_timestamp": ts,
        "created_timestamp": ts,
    })

df = pd.DataFrame(records)
df.to_parquet("feature_repo/data/user_transactions.parquet", index=False)
print(f"Generated {len(df)} records for {n_users} users")
python scripts/generate_data.py
# Generated 5000 records for 1000 users
Feast Workflow
1. Apply — Register Feature Definitions
cd feature_repo
feast apply
Created entity user_id
Created feature view user_transaction_features
Created on demand feature view purchase_analysis
Deploying infrastructure for my_feature_store...
2. Materialize — Sync Offline to Online Store
# Load data for a specific time range into the online store
feast materialize 2026-01-01T00:00:00 2026-03-01T00:00:00
# Incremental load (from last materialize to now)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Materializing 1 feature views from 2026-01-01 to 2026-03-01
user_transaction_features:
100%|████████████████████████| 1000/1000 [00:03<00:00, 312.45it/s]
3. Offline Feature Retrieval (for Training)
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="feature_repo")

# Entity DataFrame for generating training data
entity_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0042", "user_0100", "user_0500"],
    "event_timestamp": pd.to_datetime([
        "2026-02-01", "2026-02-15", "2026-01-20", "2026-02-28"
    ]),
})

# Retrieve features with a Point-in-Time Join
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:last_purchase_amount",
        "user_transaction_features:purchase_frequency",
        "user_transaction_features:user_segment",
    ],
).to_df()
print(training_df.head())

     user_id event_timestamp  total_purchases  avg_purchase_amount  ...
0  user_0001      2026-02-01               45               234.56  ...
1  user_0042      2026-02-15               12                89.30  ...
2  user_0100      2026-01-20               78               456.78  ...
3  user_0500      2026-02-28               33               167.42  ...
Point-in-Time Join is the key here. It retrieves the most recent feature values as of each entity's event_timestamp. This ensures accurate training data without data leakage.
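What Feast does here is conceptually close to a pandas `merge_asof`. A simplified sketch (ignoring TTL and `created_timestamp` tie-breaking, with made-up values):

```python
import pandas as pd

# Feature values over time (what the offline store holds)
feature_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0001", "user_0001"],
    "event_timestamp": pd.to_datetime(["2026-01-10", "2026-01-25", "2026-02-10"]),
    "total_purchases": [40, 45, 52],
}).sort_values("event_timestamp")

# Training rows with their observation timestamps
entity_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0001"],
    "event_timestamp": pd.to_datetime(["2026-02-01", "2026-02-20"]),
}).sort_values("event_timestamp")

# For each training row, take the latest feature row at or before its timestamp
joined = pd.merge_asof(
    entity_df, feature_df,
    on="event_timestamp", by="user_id", direction="backward",
)
print(joined)
# The 2026-02-01 row gets total_purchases=45 (from 2026-01-25),
# never 52 (a future value), which is exactly what prevents leakage.
```

The `direction="backward"` argument encodes the "most recent value as of this timestamp" rule.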
4. Online Feature Retrieval (for Serving)
# Retrieve features for real-time serving
online_features = store.get_online_features(
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:user_segment",
        "purchase_analysis:amount_vs_avg_ratio",
        "purchase_analysis:is_high_value",
    ],
    entity_rows=[
        {"user_id": "user_0001", "current_amount": 750.0},
        {"user_id": "user_0042", "current_amount": 50.0},
    ],
).to_dict()
print(online_features)

{
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
    "user_segment": ["gold", "silver"],
    "amount_vs_avg_ratio": [3.199, 0.560],
    "is_high_value": [1, 0],
}
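Note that the returned dict is column-oriented (one list per feature). Turning it into per-entity rows for model input takes only a few lines; a sketch using the sample response above:

```python
# Column-oriented response, as printed above
online_features = {
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
    "user_segment": ["gold", "silver"],
    "amount_vs_avg_ratio": [3.199, 0.560],
    "is_high_value": [1, 0],
}

# Pivot into one dict per entity, ready to feed a model
rows = [
    {name: values[i] for name, values in online_features.items()}
    for i in range(len(online_features["user_id"]))
]
print(rows[0]["user_segment"])  # "gold"
```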
Managing Feature Groups with Feature Service
from feast import FeatureService

# Bundle of features needed by the recommendation model
recommendation_service = FeatureService(
    name="recommendation_features",
    features=[
        user_transaction_features[["total_purchases", "avg_purchase_amount", "user_segment"]],
        purchase_analysis,
    ],
    tags={"model": "recommendation-v2"},
)

# Bundle of features needed by the fraud detection model
fraud_detection_service = FeatureService(
    name="fraud_detection_features",
    features=[
        user_transaction_features,
        purchase_analysis,
    ],
    tags={"model": "fraud-detection-v1"},
)

# Retrieve via Feature Service
features = store.get_online_features(
    features=store.get_feature_service("recommendation_features"),
    entity_rows=[{"user_id": "user_0001", "current_amount": 750.0}],
).to_dict()
Real-time Feature Updates with Push Source
from feast import PushSource

# Push source definition.
# Note: pushed rows only reach feature views whose `source` is this PushSource,
# so set it as the FeatureView's source (the batch_source still backs
# materialization and historical retrieval).
user_realtime_source = PushSource(
    name="user_realtime_push",
    batch_source=user_transactions_source,
)

# Update features when a real-time event occurs
store.push(
    push_source_name="user_realtime_push",
    df=pd.DataFrame({
        "user_id": ["user_0001"],
        "total_purchases": [46],
        "avg_purchase_amount": [240.12],
        "last_purchase_amount": [750.0],
        "purchase_frequency": [2.1],
        "user_segment": ["gold"],
        "event_timestamp": [pd.Timestamp.now()],
        "created_timestamp": [pd.Timestamp.now()],
    }),
)
Feature Server Deployment
# Start a local feature server
feast serve -h 0.0.0.0 -p 6566

# Retrieve features via the HTTP API
curl -X POST http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "user_transaction_features:total_purchases",
      "user_transaction_features:avg_purchase_amount"
    ],
    "entities": {
      "user_id": ["user_0001", "user_0042"]
    }
  }'
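The same call can be made from Python with only the standard library. A sketch mirroring the curl payload above; the function is a hypothetical helper and assumes a feature server is actually running on localhost:6566:

```python
import json
import urllib.request

def get_online_features(entities: dict, features: list[str],
                        url: str = "http://localhost:6566/get-online-features") -> dict:
    """POST the same payload as the curl example and return the parsed JSON."""
    body = json.dumps({"features": features, "entities": entities}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Request payload, identical to the curl example (only sent when a server is up)
payload = {
    "features": ["user_transaction_features:total_purchases"],
    "entities": {"user_id": ["user_0001", "user_0042"]},
}
```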
Deploying Feature Server with Docker
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir 'feast[redis]'
COPY feature_repo/ feature_repo/
WORKDIR /app/feature_repo
# Apply registry & start server
CMD feast apply && feast serve -h 0.0.0.0 -p 6566
# docker-compose.yml
services:
  feast-server:
    build: .
    ports:
      - '6566:6566'
    depends_on:
      - redis
    environment:
      - REDIS_URL=redis://redis:6379
  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'
Integration with Airflow (Automated Materialize)
# dags/feast_materialize.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    "owner": "ml-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="feast_materialize",
    default_args=default_args,
    schedule_interval="0 */6 * * *",  # Every 6 hours
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    materialize = BashOperator(
        task_id="materialize_incremental",
        bash_command=(
            "cd /opt/feature_repo && "
            "feast materialize-incremental $(date -u +'%Y-%m-%dT%H:%M:%S')"
        ),
    )
Conclusion
Here are the key takeaways for building a feature pipeline with Feast:
- Consistent Feature Definitions: Using the same feature definitions for training and serving prevents Training-Serving Skew
- Point-in-Time Join: Accurate feature joins based on timestamps prevent data leakage
- Offline/Online Dual Stores: Offline store for batch training, online store for real-time serving
- Feature Service: Managing feature groups per model improves reusability
- Push Source: Supports real-time event-based feature updates
A Feature Store may feel like overkill when you have just one or two ML models, but it becomes essential infrastructure as the number of models grows and the team scales. Its value is maximized especially when multiple models share the same features.
Quiz
Q1: What is Training-Serving Skew?
It is a phenomenon where model performance degrades because the features used during training differ from those used during serving. Common causes include inconsistencies in feature computation logic, differences in data sources, and misaligned time references.
Q2: What role does Point-in-Time Join play?
It joins the most recent feature values prior to each entity's event timestamp (event_timestamp). This prevents data leakage where future data would be used during training.
Q3: What is the difference between the offline store and online store in Feast?
The offline store holds large volumes of historical features for batch training (files, BigQuery, etc.), while the online store holds only the latest feature values for low-latency real-time serving (Redis, DynamoDB, etc.).
Q4: What does the feast materialize command do?
It synchronizes (loads) feature data from the offline store into the online store. It stores the latest feature values for the specified time range in the online store, enabling real-time retrieval.
Q5: What is the difference between On-Demand Feature View and a regular Feature View?
A regular Feature View stores pre-computed features, while an On-Demand Feature View dynamically computes features at request time. It is used for real-time transformations that combine request parameters with existing features.
Q6: What are the benefits of Feature Service?
It allows logical grouping of features needed by each model. It makes it clear which model uses which features and provides a consistent interface for feature retrieval.
Q7: What does the TTL (Time To Live) setting mean?
It specifies the validity period for feature values in the online store. Features past their TTL are returned as null during retrieval, preventing stale feature values from being used in serving.
Q8: When would you use Push Source?
It is used when the online store features need to be updated immediately upon real-time events (payments, clicks, etc.). It keeps features up to date between periodic batch materializations.