- Overview
- Why You Need a Feature Store
- Installing Feast and Initializing a Project
- Feature Definitions
- Generating Sample Data
- Feast Workflow
- Managing Feature Groups with Feature Service
- Real-time Feature Updates with Push Source
- Feature Server Deployment
- Integration with Airflow (Automated Materialize)
- Conclusion
- Quiz

Overview
One of the most common problems when deploying ML models to production is Training-Serving Skew — a phenomenon where model performance degrades because the features used during training differ from those used during serving. A Feature Store is the infrastructure component that fundamentally solves this problem by centrally managing feature definitions, storage, and serving.
Feast (a contraction of "Feature Store") is one of the most widely used open-source feature stores, supporting both offline (batch training) and online (real-time serving) paths. This article covers the entire process of building a feature pipeline with Feast.
Why You Need a Feature Store
The Training-Serving Skew Problem
# During training (offline)
features = pd.read_sql("""
    SELECT user_id,
           AVG(purchase_amount) AS avg_purchase,
           COUNT(*) AS purchase_count
    FROM transactions
    WHERE timestamp < '2026-01-01'
    GROUP BY user_id
""", conn)

# During serving (online) - skew occurs when different logic is used!
features = redis_client.get(f"user:{user_id}:features")
When training and serving compute the same features with different code, subtle discrepancies emerge, and model performance diverges from offline experiments. A Feature Store provides consistent values from a single feature definition for both offline and online use.
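The remedy can be sketched in a few lines of plain pandas (the `compute_user_features` helper is hypothetical, not part of Feast): define the aggregation once, and have both the training path and the serving path call the same function.

```python
import pandas as pd

def compute_user_features(transactions: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for the aggregation logic.

    Both the offline (training) and online (serving) paths call this
    function, so the two can never drift apart.
    """
    return (
        transactions.groupby("user_id")
        .agg(avg_purchase=("purchase_amount", "mean"),
             purchase_count=("purchase_amount", "count"))
        .reset_index()
    )

transactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "purchase_amount": [10.0, 30.0, 50.0],
})
features = compute_user_features(transactions)
print(features)
#   user_id  avg_purchase  purchase_count
# 0      u1          20.0               2
# 1      u2          50.0               1
```

A feature store institutionalizes exactly this pattern: the definition lives in one place, and both consumers read from it.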
Core Capabilities of a Feature Store
| Capability | Description |
|---|---|
| Feature Registry | Manages feature metadata, schemas, and ownership |
| Offline Store | Bulk feature retrieval for batch training (Point-in-Time Join) |
| Online Store | Low-latency feature retrieval for real-time serving |
| Feature Service | Groups the features a model needs into a named, reusable set |
| Point-in-Time Join | Joins exact feature values based on timestamps |
Installing Feast and Initializing a Project
Installation
# Basic installation
pip install feast
# With PostgreSQL online store (quote the extras so the shell doesn't glob them)
pip install 'feast[postgres]'
# With Redis online store
pip install 'feast[redis]'
# Full dependencies
pip install 'feast[postgres,redis,aws,gcp]'
Project Initialization
# Create project
feast init my_feature_store
cd my_feature_store
# Directory structure
# my_feature_store/
# ├── feature_repo/
# │   ├── feature_store.yaml   # Feast configuration
# │   ├── example_repo.py      # Feature definition examples
# │   └── data/                # Sample data
# └── README.md
feature_store.yaml Configuration
project: my_feature_store
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
entity_key_serialization_version: 2
For production environments, modify as follows:
project: my_feature_store
registry:
  registry_type: sql
  path: postgresql://user:pass@host:5432/feast_registry
provider: local
online_store:
  type: redis
  connection_string: redis://localhost:6379
offline_store:
  type: file  # or bigquery, redshift, snowflake
Feature Definitions
Data Source and Entity Definitions
# feature_repo/features.py
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Float32, Int64, String

# Data source definition
user_transactions_source = FileSource(
    path="data/user_transactions.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Entity definition (the key that features are joined on)
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="Unique user ID",
)
Feature View Definition
# Offline + Online Feature View
user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=7),  # Expires after 7 days in the online store
    schema=[
        Field(name="total_purchases", dtype=Int64, description="Total number of purchases"),
        Field(name="avg_purchase_amount", dtype=Float32, description="Average purchase amount"),
        Field(name="last_purchase_amount", dtype=Float32, description="Most recent purchase amount"),
        Field(name="purchase_frequency", dtype=Float32, description="Purchase frequency (transactions/day)"),
        Field(name="user_segment", dtype=String, description="User segment"),
    ],
    online=True,
    source=user_transactions_source,
    tags={"team": "ml-platform", "version": "v1"},
)
On-Demand Feature View (Real-time Transformation)
import pandas as pd
from feast import on_demand_feature_view, RequestSource

# Features computed dynamically at request time
input_request = RequestSource(
    name="purchase_request",
    schema=[
        Field(name="current_amount", dtype=Float32),
    ],
)

@on_demand_feature_view(
    sources=[user_transaction_features, input_request],
    schema=[
        Field(name="amount_vs_avg_ratio", dtype=Float32),
        Field(name="is_high_value", dtype=Int64),
    ],
)
def purchase_analysis(inputs: pd.DataFrame) -> pd.DataFrame:
    """Calculate the ratio of the current purchase amount to the average purchase amount."""
    # In pandas mode, Feast passes the joined input features as a DataFrame
    df = pd.DataFrame()
    df["amount_vs_avg_ratio"] = inputs["current_amount"] / (inputs["avg_purchase_amount"] + 1e-6)
    df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
    return df
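The transformation inside the decorated function is plain pandas, so it can be sanity-checked outside Feast. A standalone sketch with made-up sample values:

```python
import pandas as pd

# Simulated input: one joined row per entity, shaped like what Feast passes in
inputs = pd.DataFrame({
    "current_amount": [750.0, 50.0],         # from the request
    "avg_purchase_amount": [234.56, 89.30],  # from the feature view
})

df = pd.DataFrame()
df["amount_vs_avg_ratio"] = inputs["current_amount"] / (inputs["avg_purchase_amount"] + 1e-6)
df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
print(df)
#    amount_vs_avg_ratio  is_high_value
# 0             3.197476              1
# 1             0.559910              0
```

Keeping the logic vectorized like this matters: the same function runs over batches of rows, not one row at a time.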
Generating Sample Data
# scripts/generate_data.py
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

np.random.seed(42)

n_users = 1000
n_records = 5000

user_ids = [f"user_{i:04d}" for i in range(n_users)]

records = []
for _ in range(n_records):
    user_id = np.random.choice(user_ids)
    ts = datetime(2026, 1, 1) + timedelta(
        days=int(np.random.randint(0, 60)),
        hours=int(np.random.randint(0, 24)),
    )
    records.append({
        "user_id": user_id,
        "total_purchases": np.random.randint(1, 100),
        "avg_purchase_amount": round(np.random.uniform(10, 500), 2),
        "last_purchase_amount": round(np.random.uniform(5, 1000), 2),
        "purchase_frequency": round(np.random.uniform(0.1, 5.0), 3),
        "user_segment": np.random.choice(["bronze", "silver", "gold", "platinum"]),
        "event_timestamp": ts,
        "created_timestamp": ts,
    })

df = pd.DataFrame(records)
df.to_parquet("feature_repo/data/user_transactions.parquet", index=False)
print(f"Generated {len(df)} records for {n_users} users")
python scripts/generate_data.py
# Generated 5000 records for 1000 users
Feast Workflow
1. Apply — Register Feature Definitions
cd feature_repo
feast apply
Created entity user_id
Created feature view user_transaction_features
Created on demand feature view purchase_analysis
Deploying infrastructure for my_feature_store...
2. Materialize — Sync Offline to Online Store
# Load data for a specific time range into the online store
feast materialize 2026-01-01T00:00:00 2026-03-01T00:00:00
# Incremental load (from last materialize to now)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Materializing 1 feature views from 2026-01-01 to 2026-03-01
user_transaction_features:
100%|████████████████████████| 1000/1000 [00:03<00:00, 312.45it/s]
3. Offline Feature Retrieval (for Training)
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="feature_repo")

# Entity DataFrame for generating training data
entity_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0042", "user_0100", "user_0500"],
    "event_timestamp": pd.to_datetime([
        "2026-02-01", "2026-02-15", "2026-01-20", "2026-02-28"
    ]),
})

# Retrieve features with a Point-in-Time Join
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:last_purchase_amount",
        "user_transaction_features:purchase_frequency",
        "user_transaction_features:user_segment",
    ],
).to_df()
print(training_df.head())

     user_id event_timestamp  total_purchases  avg_purchase_amount  ...
0  user_0001      2026-02-01               45               234.56  ...
1  user_0042      2026-02-15               12                89.30  ...
2  user_0100      2026-01-20               78               456.78  ...
3  user_0500      2026-02-28               33               167.42  ...
Point-in-Time Join is the key here. It retrieves the most recent feature values as of each entity's event_timestamp. This ensures accurate training data without data leakage.
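What Feast does here is conceptually close to a pandas `merge_asof`. A simplified sketch (ignoring TTL and `created_timestamp` tie-breaking, with made-up values):

```python
import pandas as pd

# Feature values over time (what the offline store holds)
feature_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0001", "user_0001"],
    "event_timestamp": pd.to_datetime(["2026-01-10", "2026-01-25", "2026-02-10"]),
    "total_purchases": [40, 45, 52],
}).sort_values("event_timestamp")

# Training rows with their observation timestamps
entity_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0001"],
    "event_timestamp": pd.to_datetime(["2026-02-01", "2026-02-20"]),
}).sort_values("event_timestamp")

# For each training row, take the latest feature row at or before its timestamp
joined = pd.merge_asof(
    entity_df, feature_df,
    on="event_timestamp", by="user_id", direction="backward",
)
print(joined)
# The 2026-02-01 row gets total_purchases=45 (from 2026-01-25),
# never 52 (a future value), which is exactly what prevents leakage.
```

The `direction="backward"` argument encodes the "most recent value as of this timestamp" rule.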
4. Online Feature Retrieval (for Serving)
# Retrieve features for real-time serving
online_features = store.get_online_features(
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:user_segment",
        "purchase_analysis:amount_vs_avg_ratio",
        "purchase_analysis:is_high_value",
    ],
    entity_rows=[
        {"user_id": "user_0001", "current_amount": 750.0},
        {"user_id": "user_0042", "current_amount": 50.0},
    ],
).to_dict()
print(online_features)

{
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
    "user_segment": ["gold", "silver"],
    "amount_vs_avg_ratio": [3.199, 0.560],
    "is_high_value": [1, 0],
}
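Note that the returned dict is column-oriented (one list per feature). Turning it into per-entity rows for model input takes only a few lines; a sketch using the sample response above:

```python
# Column-oriented response, as printed above
online_features = {
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
    "user_segment": ["gold", "silver"],
    "amount_vs_avg_ratio": [3.199, 0.560],
    "is_high_value": [1, 0],
}

# Pivot into one dict per entity, ready to feed a model
rows = [
    {name: values[i] for name, values in online_features.items()}
    for i in range(len(online_features["user_id"]))
]
print(rows[0]["user_segment"])  # "gold"
```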
Managing Feature Groups with Feature Service
from feast import FeatureService

# Bundle of features needed by the recommendation model
recommendation_service = FeatureService(
    name="recommendation_features",
    features=[
        user_transaction_features[["total_purchases", "avg_purchase_amount", "user_segment"]],
        purchase_analysis,
    ],
    tags={"model": "recommendation-v2"},
)

# Bundle of features needed by the fraud detection model
fraud_detection_service = FeatureService(
    name="fraud_detection_features",
    features=[
        user_transaction_features,
        purchase_analysis,
    ],
    tags={"model": "fraud-detection-v1"},
)

# Retrieve via Feature Service
features = store.get_online_features(
    features=store.get_feature_service("recommendation_features"),
    entity_rows=[{"user_id": "user_0001", "current_amount": 750.0}],
).to_dict()
Real-time Feature Updates with Push Source
from feast import PushSource

# Push source definition.
# Note: pushed rows only reach feature views whose `source` is this PushSource,
# so set it as the FeatureView's source (the batch_source still backs
# materialization and historical retrieval).
user_realtime_source = PushSource(
    name="user_realtime_push",
    batch_source=user_transactions_source,
)

# Update features when a real-time event occurs
store.push(
    push_source_name="user_realtime_push",
    df=pd.DataFrame({
        "user_id": ["user_0001"],
        "total_purchases": [46],
        "avg_purchase_amount": [240.12],
        "last_purchase_amount": [750.0],
        "purchase_frequency": [2.1],
        "user_segment": ["gold"],
        "event_timestamp": [pd.Timestamp.now()],
        "created_timestamp": [pd.Timestamp.now()],
    }),
)
Feature Server Deployment
# Start a local feature server
feast serve -h 0.0.0.0 -p 6566

# Retrieve features via the HTTP API
curl -X POST http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "user_transaction_features:total_purchases",
      "user_transaction_features:avg_purchase_amount"
    ],
    "entities": {
      "user_id": ["user_0001", "user_0042"]
    }
  }'
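The same call can be made from Python with only the standard library. A sketch mirroring the curl payload above; the function is a hypothetical helper and assumes a feature server is actually running on localhost:6566:

```python
import json
import urllib.request

def get_online_features(entities: dict, features: list[str],
                        url: str = "http://localhost:6566/get-online-features") -> dict:
    """POST the same payload as the curl example and return the parsed JSON."""
    body = json.dumps({"features": features, "entities": entities}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Request payload, identical to the curl example (only sent when a server is up)
payload = {
    "features": ["user_transaction_features:total_purchases"],
    "entities": {"user_id": ["user_0001", "user_0042"]},
}
```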
Deploying Feature Server with Docker
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir 'feast[redis]'
COPY feature_repo/ feature_repo/
WORKDIR /app/feature_repo
# Apply registry & start server
CMD feast apply && feast serve -h 0.0.0.0 -p 6566
# docker-compose.yml
services:
  feast-server:
    build: .
    ports:
      - '6566:6566'
    depends_on:
      - redis
    environment:
      - REDIS_URL=redis://redis:6379
  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'
Integration with Airflow (Automated Materialize)
# dags/feast_materialize.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    "owner": "ml-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="feast_materialize",
    default_args=default_args,
    schedule_interval="0 */6 * * *",  # Every 6 hours
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    materialize = BashOperator(
        task_id="materialize_incremental",
        bash_command=(
            "cd /opt/feature_repo && "
            "feast materialize-incremental $(date -u +'%Y-%m-%dT%H:%M:%S')"
        ),
    )
Conclusion
Here are the key takeaways for building a feature pipeline with Feast:
- Consistent Feature Definitions: Using the same feature definitions for training and serving prevents Training-Serving Skew
- Point-in-Time Join: Accurate feature joins based on timestamps prevent data leakage
- Offline/Online Dual Stores: Offline store for batch training, online store for real-time serving
- Feature Service: Managing feature groups per model improves reusability
- Push Source: Supports real-time event-based feature updates
A Feature Store may feel like overkill when you have just one or two ML models, but it becomes essential infrastructure as the number of models grows and the team scales. Its value is maximized especially when multiple models share the same features.
Quiz
Q1: What is Training-Serving Skew?
It is a phenomenon where model performance degrades because the features used during training differ from those used during serving. Common causes include inconsistencies in feature computation logic, differences in data sources, and misaligned time references.
Q2: What role does Point-in-Time Join play?
It joins the most recent feature values prior to each entity's event timestamp (event_timestamp). This prevents data leakage where future data would be used during training.
Q3: What is the difference between the offline store and online store in Feast?
The offline store holds large volumes of historical features for batch training (files, BigQuery, etc.), while the online store holds only the latest feature values for low-latency real-time serving (Redis, DynamoDB, etc.).
Q4: What does the feast materialize command do?
It synchronizes (loads) feature data from the offline store into the online store. It stores the latest feature values for the specified time range in the online store, enabling real-time retrieval.
Q5: What is the difference between On-Demand Feature View and a regular Feature View?
A regular Feature View stores pre-computed features, while an On-Demand Feature View dynamically computes features at request time. It is used for real-time transformations that combine request parameters with existing features.
Q6: What are the benefits of Feature Service?
It allows logical grouping of features needed by each model. It makes it clear which model uses which features and provides a consistent interface for feature retrieval.
Q7: What does the TTL (Time To Live) setting mean?
It specifies the validity period for feature values in the online store. Features past their TTL are returned as null during retrieval, preventing stale feature values from being used in serving.
Q8: When would you use Push Source?
It is used when the online store features need to be updated immediately upon real-time events (payments, clicks, etc.). It keeps features up to date between periodic batch materializations.