MLOps Feature Store in Practice — Building a Feature Pipeline with Feast


Overview

One of the most common problems when deploying ML models to production is Training-Serving Skew — a phenomenon where model performance degrades because the features used during training differ from those used during serving. A Feature Store is the infrastructure component that fundamentally solves this problem by centrally managing feature definitions, storage, and serving.

Feast (Feature Store) is the most widely used open-source feature store, supporting both offline (batch training) and online (real-time serving) paths. This article covers the entire process of building a feature pipeline with Feast.

Why You Need a Feature Store

The Training-Serving Skew Problem

# During training (offline)
features = pd.read_sql("""
    SELECT user_id,
           AVG(purchase_amount) as avg_purchase,
           COUNT(*) as purchase_count
    FROM transactions
    WHERE timestamp < '2026-01-01'
    GROUP BY user_id
""", conn)

# During serving (online) - Skew occurs when using different logic!
features = redis_client.get(f"user:{user_id}:features")

When training and serving compute the same features with different code, subtle discrepancies emerge, and model performance diverges from offline experiments. A Feature Store provides consistent values from a single feature definition for both offline and online use.
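Before adopting a feature store, the core idea can be approximated by hand: factor the feature computation into one function that both the training job and the serving path import, so there is only one definition that can drift. A minimal plain-pandas sketch with illustrative names (not Feast API):

```python
import pandas as pd

def compute_user_features(transactions: pd.DataFrame) -> pd.DataFrame:
    """Single definition of the purchase features, shared by training and serving."""
    return (
        transactions.groupby("user_id")
        .agg(
            avg_purchase=("purchase_amount", "mean"),
            purchase_count=("purchase_amount", "count"),
        )
        .reset_index()
    )

# Both the batch job and the online path call the same function
tx = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "purchase_amount": [100.0, 200.0, 50.0],
})
feats = compute_user_features(tx)
```

This removes the logic drift but not the operational gap — keeping online values fresh and joins point-in-time correct — which is what a feature store adds on top.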

Core Capabilities of a Feature Store

  • Feature Registry: Manages feature metadata, schemas, and ownership
  • Offline Store: Bulk feature retrieval for batch training (Point-in-Time Join)
  • Online Store: Low-latency feature retrieval for real-time serving
  • Feature Service: Serves features via gRPC/HTTP API
  • Point-in-Time Join: Joins exact feature values based on timestamps

Installing Feast and Initializing a Project

Installation

# Basic installation
pip install feast

# With PostgreSQL online store
pip install feast[postgres]

# With Redis online store
pip install feast[redis]

# Full dependencies
pip install feast[postgres,redis,aws,gcp]

Project Initialization

# Create project
feast init my_feature_store
cd my_feature_store

# Directory structure
# my_feature_store/
# ├── feature_repo/
# │   ├── feature_store.yaml    # Feast configuration
# │   ├── example_repo.py       # Feature definition examples
# │   └── data/                 # Sample data
# └── README.md

feature_store.yaml Configuration

project: my_feature_store
registry: data/registry.db
provider: local

online_store:
  type: sqlite
  path: data/online_store.db

offline_store:
  type: file

entity_key_serialization_version: 2

For production environments, modify as follows:

project: my_feature_store
registry:
  registry_type: sql
  path: postgresql://user:pass@host:5432/feast_registry

provider: local

online_store:
  type: redis
  connection_string: redis://localhost:6379

offline_store:
  type: file # or bigquery, redshift, snowflake

Feature Definitions

Data Source and Entity Definitions

# feature_repo/features.py
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, PushSource
from feast.types import Float32, Int64, String

# Data source definition
user_transactions_source = FileSource(
    path="data/user_transactions.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Entity definition (the key that features are based on)
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="Unique user ID",
)

Feature View Definition

# Offline + Online Feature View
user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=7),  # Expires after 7 days in online store
    schema=[
        Field(name="total_purchases", dtype=Int64, description="Total number of purchases"),
        Field(name="avg_purchase_amount", dtype=Float32, description="Average purchase amount"),
        Field(name="last_purchase_amount", dtype=Float32, description="Most recent purchase amount"),
        Field(name="purchase_frequency", dtype=Float32, description="Purchase frequency (transactions/day)"),
        Field(name="user_segment", dtype=String, description="User segment"),
    ],
    online=True,
    source=user_transactions_source,
    tags={"team": "ml-platform", "version": "v1"},
)

On-Demand Feature View (Real-time Transformation)

import pandas as pd

from feast import on_demand_feature_view, RequestSource

# Features computed dynamically at request time
input_request = RequestSource(
    name="purchase_request",
    schema=[
        Field(name="current_amount", dtype=Float32),
    ],
)

@on_demand_feature_view(
    sources=[user_transaction_features, input_request],
    schema=[
        Field(name="amount_vs_avg_ratio", dtype=Float32),
        Field(name="is_high_value", dtype=Int64),
    ],
)
def purchase_analysis(inputs: pd.DataFrame) -> pd.DataFrame:
    """Calculate the ratio of current purchase amount to average purchase amount"""
    # In the default pandas mode, Feast passes the joined inputs as a DataFrame
    df = inputs.copy()
    df["amount_vs_avg_ratio"] = df["current_amount"] / (df["avg_purchase_amount"] + 1e-6)
    df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
    return df[["amount_vs_avg_ratio", "is_high_value"]]
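Since the transformation body is ordinary pandas, it can be unit-tested outside Feast before being registered; a small sketch exercising the same arithmetic on a hand-built frame:

```python
import pandas as pd

# Hand-built inputs: the request-time value plus the stored average
df = pd.DataFrame({
    "current_amount": [750.0, 50.0],
    "avg_purchase_amount": [234.56, 89.30],
})

# Same arithmetic as purchase_analysis above
df["amount_vs_avg_ratio"] = df["current_amount"] / (df["avg_purchase_amount"] + 1e-6)
df["is_high_value"] = (df["amount_vs_avg_ratio"] > 2.0).astype(int)
```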

Generating Sample Data

# scripts/generate_data.py
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

np.random.seed(42)
n_users = 1000
n_records = 5000

user_ids = [f"user_{i:04d}" for i in range(n_users)]
records = []

for _ in range(n_records):
    user_id = np.random.choice(user_ids)
    ts = datetime(2026, 1, 1) + timedelta(
        days=np.random.randint(0, 60),
        hours=np.random.randint(0, 24),
    )
    records.append({
        "user_id": user_id,
        "total_purchases": np.random.randint(1, 100),
        "avg_purchase_amount": round(np.random.uniform(10, 500), 2),
        "last_purchase_amount": round(np.random.uniform(5, 1000), 2),
        "purchase_frequency": round(np.random.uniform(0.1, 5.0), 3),
        "user_segment": np.random.choice(["bronze", "silver", "gold", "platinum"]),
        "event_timestamp": ts,
        "created_timestamp": ts,
    })

df = pd.DataFrame(records)
df.to_parquet("feature_repo/data/user_transactions.parquet", index=False)
print(f"Generated {len(df)} records for {n_users} users")

Run the script from the project root:

python scripts/generate_data.py
# Generated 5000 records for 1000 users

Feast Workflow

1. Apply — Register Feature Definitions

cd feature_repo
feast apply
Created entity user_id
Created feature view user_transaction_features
Created on demand feature view purchase_analysis

Deploying infrastructure for my_feature_store...

2. Materialize — Sync Offline to Online Store

# Load data for a specific time range into the online store
feast materialize 2026-01-01T00:00:00 2026-03-01T00:00:00

# Incremental load (from last materialize to now)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Materializing 1 feature views from 2026-01-01 to 2026-03-01
user_transaction_features:
100%|████████████████████████| 1000/1000 [00:03<00:00, 312.45it/s]

3. Offline Feature Retrieval (for Training)

from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="feature_repo")

# Entity DataFrame for generating training data
entity_df = pd.DataFrame({
    "user_id": ["user_0001", "user_0042", "user_0100", "user_0500"],
    "event_timestamp": pd.to_datetime([
        "2026-02-01", "2026-02-15", "2026-01-20", "2026-02-28"
    ]),
})

# Retrieve features with Point-in-Time Join
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:last_purchase_amount",
        "user_transaction_features:purchase_frequency",
        "user_transaction_features:user_segment",
    ],
).to_df()

print(training_df.head())
    user_id  event_timestamp  total_purchases  avg_purchase_amount  ...
0  user_0001  2026-02-01            45           234.56             ...
1  user_0042  2026-02-15            12            89.30             ...
2  user_0100  2026-01-20            78           456.78             ...
3  user_0500  2026-02-28            33           167.42             ...

Point-in-Time Join is the key here. It retrieves the most recent feature values as of each entity's event_timestamp. This ensures accurate training data without data leakage.
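What the join does can be reproduced in miniature with pandas' merge_asof: for each label row, take the latest feature row at or before its timestamp, per entity. A sketch with made-up values:

```python
import pandas as pd

# Label events: the points in time at which we need feature values
labels = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event_timestamp": pd.to_datetime(["2026-02-10", "2026-02-20", "2026-02-15"]),
})

# Feature snapshots as they existed over time
features = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event_timestamp": pd.to_datetime(["2026-02-01", "2026-02-15", "2026-02-01"]),
    "avg_purchase_amount": [100.0, 120.0, 50.0],
})

# merge_asof requires both sides sorted by the join timestamp
labels = labels.sort_values("event_timestamp")
features = features.sort_values("event_timestamp")

# Per user, pick the latest feature row at or before each label timestamp;
# the u1 label at 2026-02-20 sees 120.0, never a later (future) value
joined = pd.merge_asof(
    labels, features,
    on="event_timestamp", by="user_id", direction="backward",
)
```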

4. Online Feature Retrieval (for Serving)

# Retrieve features in real-time serving
online_features = store.get_online_features(
    features=[
        "user_transaction_features:total_purchases",
        "user_transaction_features:avg_purchase_amount",
        "user_transaction_features:user_segment",
        "purchase_analysis:amount_vs_avg_ratio",
        "purchase_analysis:is_high_value",
    ],
    entity_rows=[
        {"user_id": "user_0001", "current_amount": 750.0},
        {"user_id": "user_0042", "current_amount": 50.0},
    ],
).to_dict()

print(online_features)
{
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
    "user_segment": ["gold", "silver"],
    "amount_vs_avg_ratio": [3.199, 0.560],
    "is_high_value": [1, 0],
}
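Note that to_dict() returns column-oriented lists aligned by position. A serving path typically rearranges them into one row per entity before calling the model; a sketch using the response values shown above:

```python
# Column-oriented response, as in the to_dict() output above
online_features = {
    "user_id": ["user_0001", "user_0042"],
    "total_purchases": [45, 12],
    "avg_purchase_amount": [234.56, 89.30],
}

# Fix the feature order the model expects
feature_cols = ["total_purchases", "avg_purchase_amount"]

# Zip the parallel columns into one feature vector per entity
rows = [
    [online_features[col][i] for col in feature_cols]
    for i in range(len(online_features["user_id"]))
]
```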

Managing Feature Groups with Feature Service

from feast import FeatureService

# Bundle of features needed for the recommendation model
recommendation_service = FeatureService(
    name="recommendation_features",
    features=[
        user_transaction_features[["total_purchases", "avg_purchase_amount", "user_segment"]],
        purchase_analysis,
    ],
    tags={"model": "recommendation-v2"},
)

# Bundle of features needed for the fraud detection model
fraud_detection_service = FeatureService(
    name="fraud_detection_features",
    features=[
        user_transaction_features,
        purchase_analysis,
    ],
    tags={"model": "fraud-detection-v1"},
)

# Retrieve via Feature Service
features = store.get_online_features(
    features=store.get_feature_service("recommendation_features"),
    entity_rows=[{"user_id": "user_0001", "current_amount": 750.0}],
).to_dict()

Real-time Feature Updates with Push Source

from feast import PushSource

# Push source definition. Note: for pushed rows to reach a feature view's
# online features, that view's `source` must be set to this PushSource
# (its batch_source still backs offline retrieval and materialization).
user_realtime_source = PushSource(
    name="user_realtime_push",
    batch_source=user_transactions_source,
)

# Update features when real-time events occur
store.push(
    push_source_name="user_realtime_push",
    df=pd.DataFrame({
        "user_id": ["user_0001"],
        "total_purchases": [46],
        "avg_purchase_amount": [240.12],
        "last_purchase_amount": [750.0],
        "purchase_frequency": [2.1],
        "user_segment": ["gold"],
        "event_timestamp": [pd.Timestamp.now()],
        "created_timestamp": [pd.Timestamp.now()],
    }),
)

Feature Server Deployment

# Start local Feature Server
feast serve -h 0.0.0.0 -p 6566

# Retrieve features via HTTP API
curl -X POST http://localhost:6566/get-online-features \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "user_transaction_features:total_purchases",
      "user_transaction_features:avg_purchase_amount"
    ],
    "entities": {
      "user_id": ["user_0001", "user_0042"]
    }
  }'
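The same endpoint can be called from Python with only the standard library. A sketch of a thin client; the server URL and timeout are assumptions, and the response schema should be checked against your Feast version:

```python
import json
from urllib import request

# Assumed server address; matches the `feast serve` example above
FEAST_SERVER_URL = "http://localhost:6566"

def build_payload(feature_refs, user_ids):
    """Request body for POST /get-online-features."""
    return {
        "features": feature_refs,
        "entities": {"user_id": user_ids},
    }

def get_online_features(feature_refs, user_ids, timeout=2.0):
    """POST to the Feature Server and return the decoded JSON response."""
    req = request.Request(
        f"{FEAST_SERVER_URL}/get-online-features",
        data=json.dumps(build_payload(feature_refs, user_ids)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

payload = build_payload(
    ["user_transaction_features:total_purchases",
     "user_transaction_features:avg_purchase_amount"],
    ["user_0001", "user_0042"],
)
```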

Deploying Feature Server with Docker

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt  # requirements.txt should pin feast[redis]

COPY feature_repo/ feature_repo/
WORKDIR /app/feature_repo

# Apply registry & start server
CMD feast apply && feast serve -h 0.0.0.0 -p 6566

# docker-compose.yml
services:
  feast-server:
    build: .
    ports:
      - '6566:6566'
    depends_on:
      - redis
    environment:
      - REDIS_URL=redis://redis:6379

  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'

Integration with Airflow (Automated Materialize)

# dags/feast_materialize.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    "owner": "ml-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="feast_materialize",
    default_args=default_args,
    schedule_interval="0 */6 * * *",  # Every 6 hours
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:

    materialize = BashOperator(
        task_id="materialize_incremental",
        bash_command=(
            "cd /opt/feature_repo && "
            "feast materialize-incremental $(date -u +'%Y-%m-%dT%H:%M:%S')"
        ),
    )

Conclusion

Here are the key takeaways for building a feature pipeline with Feast:

  • Consistent Feature Definitions: Using the same feature definitions for training and serving prevents Training-Serving Skew
  • Point-in-Time Join: Accurate feature joins based on timestamps prevent data leakage
  • Offline/Online Dual Stores: Offline store for batch training, online store for real-time serving
  • Feature Service: Managing feature groups per model improves reusability
  • Push Source: Supports real-time event-based feature updates

A Feature Store may feel like overkill when you have just one or two ML models, but it becomes essential infrastructure as the number of models grows and the team scales. Its value is maximized especially when multiple models share the same features.

Quiz

Q1: What is Training-Serving Skew? It is a phenomenon where model performance degrades because the features used during training differ from those used during serving. Common causes include inconsistencies in feature computation logic, differences in data sources, and misaligned time references.

Q2: What role does Point-in-Time Join play? It joins the most recent feature values prior to each entity's event timestamp (event_timestamp). This prevents data leakage where future data would be used during training.

Q3: What is the difference between the offline store and online store in Feast? The offline store holds large volumes of historical features for batch training (files, BigQuery, etc.), while the online store holds only the latest feature values for low-latency real-time serving (Redis, DynamoDB, etc.).

Q4: What does the feast materialize command do? It synchronizes (loads) feature data from the offline store into the online store. It stores the latest feature values for the specified time range in the online store, enabling real-time retrieval.

Q5: What is the difference between On-Demand Feature View and a regular Feature View? A regular Feature View stores pre-computed features, while an On-Demand Feature View dynamically computes features at request time. It is used for real-time transformations that combine request parameters with existing features.

Q6: What are the benefits of Feature Service? It allows logical grouping of features needed by each model. It makes it clear which model uses which features and provides a consistent interface for feature retrieval.

Q7: What does the TTL (Time To Live) setting mean? It specifies the validity period for feature values in the online store. Features past their TTL are returned as null during retrieval, preventing stale feature values from being used in serving.

Q8: When would you use Push Source? It is used when the online store features need to be updated immediately upon real-time events (payments, clicks, etc.). It keeps features up to date between periodic batch materializations.