Feature Stores 2026 Deep Dive — Feast, Tecton, Hopsworks, Databricks, Vertex AI, SageMaker, Featureform, Bytewax, Materialize, RisingWave, Fennel, Chalk
- Author: Youngju Kim (@fjvbn20031)
Intro — May 2026, the honest question: "do we even need a feature store?"
The 2020–2022 golden age of feature stores is over. As of May 2026 the market has split in two. On one side, dedicated platforms (Tecton, Hopsworks, Chalk, Fennel) own the enterprise. On the other, lakehouse/warehouse-native feature stacks (Databricks Feature Engineering in Unity Catalog, Snowflake Feature Store, Vertex AI Feature Store) quietly absorbed everything else with a simple pitch: "it lives in your data platform already, why a separate box?"
The most common question has changed too. It is no longer "which feature store should we adopt?" but "do we need a feature store at all? Couldn't we just write feature tables with dbt and cache them in Redis?" This post is not a marketing matrix. It walks through the real problem feature stores solve, the cases where that problem actually demands a separate system, and what the 2026 menu looks like, with code.
What a feature store is — and isn't: registry, materialization, serving
First, vocabulary. We say "feature store" as if it were one thing. It is actually three responsibilities in one box.
- Feature registry: metadata about who owns a feature, what source it comes from, what transformation produced it, what type it has.
- Offline materialization: joining historical feature values at the correct moment in time to build training datasets.
- Online serving: a millisecond-latency key-value lookup that returns the freshest feature at inference time.
These three are separable. dbt + Snowflake can cover the registry and offline materialization, and Redis can cover online serving. That is already a lightweight feature store. What Feast, Tecton, and Hopsworks add is packaging all three under one API and guaranteeing consistency between them, which is the main defense against training-serving skew.
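As a rough illustration of the registry responsibility on its own, the sketch below models one registry entry as a plain Python dataclass. The fields mirror the metadata listed above; the `FeatureDefinition` class, the `register` helper, and the example names are hypothetical and not taken from any of the tools discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """One registry entry: metadata only, no feature values."""
    name: str
    owner: str            # team or person accountable for the feature
    source: str           # upstream table or stream (hypothetical name)
    transformation: str   # SQL or a pointer to the code that computes it
    dtype: str


registry: dict[str, FeatureDefinition] = {}


def register(defn: FeatureDefinition) -> None:
    # Reject entries nobody owns and silent redefinitions of an existing name.
    if not defn.owner:
        raise ValueError(f"feature {defn.name!r} has no owner")
    if defn.name in registry and registry[defn.name] != defn:
        raise ValueError(f"feature {defn.name!r} is already registered differently")
    registry[defn.name] = defn


register(FeatureDefinition(
    name="amount_7d_sum",
    owner="fraud-team",
    source="warehouse.transactions",
    transformation="SUM(amount) over the trailing 7 days",
    dtype="float",
))
```

A registry this small is little more than a validated dictionary, which is why a dbt project plus a catalog can stand in for it on small teams.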
The core problem — why training-serving skew happens
The one essential problem a feature store solves is training-serving skew: the values seen during training and inference drift apart and the model breaks in production. The cause is mundane.
- Training computes features from some historical point in time.
- Serving computes features from the current point in time.
- The two code paths are written in different SQL/Python, with subtly different time handling.
Preventing this requires (a) one place where the feature is defined and (b) correct point-in-time joins. Everything else about a feature store flows from these two requirements.
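To make the failure mode concrete, here is a minimal sketch with made-up data: two independently written code paths disagree on a window boundary, while a single shared definition cannot. All function names and the transaction log are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical transaction log for one user: (event time, amount).
TXNS = [
    (datetime(2026, 5, 1, 12, 0), 10.0),
    (datetime(2026, 5, 8, 12, 0), 20.0),
]


def training_sum_7d(txns, as_of):
    # Offline path: the window includes an event that is exactly 7 days old.
    return sum(a for ts, a in txns if as_of - timedelta(days=7) <= ts <= as_of)


def serving_sum_7d(txns, now):
    # Online path, written separately: the same boundary event is excluded.
    return sum(a for ts, a in txns if now - timedelta(days=7) < ts <= now)


def amount_sum_7d(txns, as_of):
    # Single shared definition, called by both training and serving.
    return sum(a for ts, a in txns if as_of - timedelta(days=7) < ts <= as_of)


t = datetime(2026, 5, 8, 12, 0)
skewed = (training_sum_7d(TXNS, t), serving_sum_7d(TXNS, t))      # (30.0, 20.0)
consistent = (amount_sum_7d(TXNS, t), amount_sum_7d(TXNS, t))     # (20.0, 20.0)
```

Neither hand-written version is wrong in isolation; the skew comes only from their disagreement, which is why requirement (a) matters more than any particular boundary convention.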
Online vs offline stores — why two backends
A feature store almost always has two backends.
- Offline store: for training-set generation. Snowflake, BigQuery, Redshift, Iceberg/Delta on S3. Petabytes of history, optimized for time-range scans.
- Online store: for inference serving. Redis, DynamoDB, Cassandra, ScyllaDB, Aerospike. Key lookups with p99 ≤ 10 ms.
The same feature lives in both places with different freshness. Offline holds every value through yesterday; online holds "the freshest one row" per entity. Keeping them in sync (materialization) is the day-to-day operational reality of running a feature store.
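A materialization run can be sketched in a few lines, assuming an in-memory list as the offline store and a dict as the online store. The `materialize` function and its `up_to` cutoff are illustrative assumptions, not the API of any tool discussed here.

```python
from datetime import datetime

# Offline store stand-in: every historical row (entity key, event time, value).
offline_rows = [
    ("u1", datetime(2026, 5, 1), 3.0),
    ("u1", datetime(2026, 5, 3), 5.0),
    ("u2", datetime(2026, 5, 2), 7.0),
]


def materialize(rows, online, up_to):
    """Copy the freshest row per entity (no later than up_to) into the online store."""
    for key, ts, value in rows:
        if ts > up_to:
            continue  # not yet visible to this materialization run
        current = online.get(key)
        if current is None or ts >= current[0]:
            online[key] = (ts, value)
    return online


online_store = {}  # stand-in for Redis/DynamoDB: one (timestamp, value) per key
materialize(offline_rows, online_store, up_to=datetime(2026, 5, 2))
first_run = dict(online_store)
materialize(offline_rows, online_store, up_to=datetime(2026, 5, 4))
```

After the first run `u1` still serves the May 1 value; only the second run brings it up to May 3. That gap between runs is the freshness you actually get, whatever the offline table contains.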
Point-in-time correctness — building leakage-free training sets
The hard part of training-set creation is this: for a user who receives a label at time t, fetch only the feature values that were knowable as of t. Otherwise the future leaks into training and offline metrics look better than production will ever be.
A feature store's point-in-time join generates a query roughly like the following, so that nobody has to hand-write it.
-- Time-correct join between labels and a feature table
SELECT
    l.user_id,
    l.event_ts,
    l.label,
    f.feature_value
FROM labels l
LEFT JOIN LATERAL (
    SELECT feature_value
    FROM user_features f
    WHERE f.user_id = l.user_id
      AND f.event_ts <= l.event_ts
    ORDER BY f.event_ts DESC
    LIMIT 1
) f ON TRUE;
Hand-written, this kind of join silently regresses — a < becomes <=, a join order shifts, a tiny leak appears. get_historical_features makes the system own that safety.
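The same as-of logic can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than how any particular feature store implements it: rows are tuples, and the optional `max_age` staleness bound is a hypothetical parameter added for the example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_join(labels, features, max_age=None):
    """For each (key, label_ts, label), attach the latest feature value
    with feature_ts <= label_ts, or None if nothing qualifies."""
    by_key = {}
    for key, ts, value in sorted(features, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)
    out = []
    for key, label_ts, label in labels:
        times, values = by_key.get(key, ([], []))
        i = bisect_right(times, label_ts)  # first index with ts > label_ts
        value = None
        if i > 0:
            age = label_ts - times[i - 1]
            if max_age is None or age <= max_age:
                value = values[i - 1]
        out.append((key, label_ts, label, value))
    return out


features = [
    ("u1", datetime(2026, 5, 1), 1.0),
    ("u1", datetime(2026, 5, 3), 2.0),
]
labels = [
    ("u1", datetime(2026, 5, 2), 0),   # sees the May 1 value
    ("u1", datetime(2026, 5, 3), 1),   # boundary: sees the May 3 value (<=)
    ("u1", datetime(2026, 4, 30), 0),  # nothing knowable yet
]
rows = point_in_time_join(labels, features)
bounded = point_in_time_join(labels, features, max_age=timedelta(hours=12))
```

The boundary case is the one worth testing: whether a feature written at exactly the label timestamp counts as "knowable" is a convention, and it has to match what serving does.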
Feast — CNCF sandbox, the OSS default
Feast is the open-source feature store originally developed at Gojek in collaboration with Google Cloud and open-sourced in 2019; it was accepted into the CNCF sandbox in 2024. As of May 2026 it is the most widely deployed OSS option. Its philosophy is "bring your own backend": Feast itself ships the metadata registry and SDK, while the offline store is BigQuery/Snowflake/Redshift/Spark and the online store is Redis/DynamoDB/Postgres/Bigtable.
A typical Feature View looks like this.
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
driver = Entity(name="driver", join_keys=["driver_id"])
driver_stats_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)
driver_stats_fv = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)
feast apply registers definitions; feast materialize syncs offline to online. Backends are chosen in feature_store.yaml.
project: prod_recsys
registry:
  registry_type: sql
  path: postgresql://feast:***@registry-db:5432/feast
provider: aws
online_store:
  type: redis
  connection_string: "redis-prod.cluster.local:6379"
offline_store:
  type: snowflake.offline
  account: xy12345
  warehouse: COMPUTE_WH
  database: ML
  schema: FEATURES
entity_key_serialization_version: 2
Training uses store.get_historical_features(entity_df, ...); serving uses store.get_online_features(...). Simplicity is its strength; streaming features are its weakness (you need the Push API or an external stream engine).
Tecton — commercial SaaS, batch + streaming under one decorator
Tecton is the commercial feature platform founded in 2019 by ex-Uber Michelangelo engineers. In 2024 it absorbed parts of the experimentation tool Eppo, extending into "features + experiments + model monitoring." Its biggest strength is defining batch, streaming, and on-demand features with the same decorator model.
from tecton import stream_feature_view, FilteredSource, Aggregation
from datetime import datetime, timedelta
# transactions_stream (a stream source) and user (an entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="pyspark",
    aggregations=[
        Aggregation(column="amount", function="sum", time_window=timedelta(minutes=10)),
        Aggregation(column="amount", function="count", time_window=timedelta(hours=1)),
    ],
    online=True,
    offline=True,
    feature_start_time=datetime(2024, 1, 1),
)
def user_txn_aggregates(transactions):
    return transactions.select("user_id", "amount", "timestamp")
Tecton runs the windowed aggregations on Spark Streaming/Flink and uses the same definition to build historical training sets. Trade-offs: it is expensive, and part of the Tecton data plane runs inside the customer's VPC/account, which adds operational surface area.
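The idea of one definition serving both paths can be sketched without any platform. Assuming events arrive as `(timestamp, amount)` tuples for one user, the hypothetical `window_aggregates` function below is evaluated at "now" for serving and at a past label time for backfill. It illustrates the principle only and says nothing about how Tecton executes aggregations.

```python
from datetime import datetime, timedelta


def window_aggregates(events, as_of):
    """events: iterable of (event_ts, amount) for one user.
    Returns the 10-minute sum and the 1-hour count as of `as_of`."""
    sum_10m = 0.0
    count_1h = 0
    for ts, amount in events:
        if ts > as_of:
            continue  # never look into the future
        if as_of - ts < timedelta(minutes=10):
            sum_10m += amount
        if as_of - ts < timedelta(hours=1):
            count_1h += 1
    return {"amount_sum_10m": sum_10m, "amount_count_1h": count_1h}


events = [
    (datetime(2026, 5, 1, 9, 5), 40.0),
    (datetime(2026, 5, 1, 9, 55), 15.0),
    (datetime(2026, 5, 1, 10, 0), 5.0),
]
# Serving: evaluate at the current time.
online = window_aggregates(events, as_of=datetime(2026, 5, 1, 10, 0))
# Training backfill: evaluate the same function at a historical label time.
historical = window_aggregates(events, as_of=datetime(2026, 5, 1, 9, 56))
```

Because both values come from the same function with only `as_of` changed, a boundary or time-zone mistake shows up identically in training and serving instead of creating skew.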
Hopsworks — enterprise + open source, the European contender
Hopsworks was built by Sweden's Logical Clocks (rebranded to Hopsworks AB in 2023) as a full-stack ML platform. The feature store is the center of gravity, but model registry, serving, and vector indexes ship in the same box. An AGPL-licensed open-source edition and a SaaS (Serverless included) exist in parallel.
The offline store is Hudi-based or external Snowflake/BigQuery; the online store is RonDB (an in-memory key-value store built on MySQL NDB Cluster) or an external system. RonDB combines transactional guarantees with millisecond p99 latency, which is rare.
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_or_create_feature_group(
    name="user_txn_aggregates",
    version=1,
    primary_key=["user_id"],
    event_time="event_ts",
    online_enabled=True,
)
# txn_df is assumed to be a DataFrame of per-user aggregates that includes the is_fraud column.
fg.insert(txn_df, write_options={"wait_for_job": True})
fv = fs.create_feature_view(
    name="user_fraud_features",
    version=1,
    query=fg.select_all(),
    labels=["is_fraud"],
)
# With labels set, the split returns features and labels separately.
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)
A strong GDPR / EU AI Act story makes it popular in European finance.
Databricks Feature Engineering in Unity Catalog — the lakehouse-native answer
Databricks ran a separate Workspace Feature Store through 2023, then in 2024 collapsed the abstraction: a feature table is just a Delta table in Unity Catalog. The standalone metadata system disappeared; features are UC tables, and permissions, lineage, and search are all served by UC.
from databricks.feature_engineering import FeatureEngineeringClient
from databricks.feature_engineering import FeatureLookup
fe = FeatureEngineeringClient()
fe.create_table(
    name="ml.user_features.transaction_summary",
    primary_keys=["user_id"],
    timestamp_keys=["event_ts"],
    df=summary_df,
)
training_set = fe.create_training_set(
    df=labels_df,
    feature_lookups=[
        FeatureLookup(
            table_name="ml.user_features.transaction_summary",
            feature_names=["amount_7d_sum", "amount_30d_avg"],
            lookup_key="user_id",
            timestamp_lookup_key="event_ts",
        )
    ],
    label="fraud",
)
Online serving is Databricks Online Tables (Lakebase-backed), which automatically mirrors the same UC table. For teams already living inside Databricks, evaluating a separate vendor is genuinely optional now.
Vertex AI Feature Store — the GCP integrated take
Vertex AI Feature Store was redesigned in 2023 (Feature Store 2.0) around BigQuery as the offline backend, with either Bigtable or an Optimized Online Store as the online tier. The user registers a BigQuery view as a feature view and Vertex automatically builds the online mirror for millisecond serving.
The big upside is almost no ETL. The downsides are GCP lock-in and weaker streaming freedom than Tecton/Hopsworks; teams typically pair it with Dataflow/Pub-Sub to get the freshness they need.
SageMaker Feature Store — the AWS-native option
SageMaker Feature Store is AWS's managed feature store, launched in 2020. Offline runs on S3 + Athena/Glue; the default online store is a DynamoDB-backed KV. The unit of grouping is a Feature Group, and data flows into both stores synchronously or asynchronously.
The In-Memory Online Store option added in 2024 reaches p99 < 5 ms, which makes it usable for ultra-low-latency paths like ad bidding. Tight SageMaker Studio/Pipelines/Model Registry integration is the strength; batch transforms ride on EMR/Glue and streaming transforms on Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics).
Featureform and the lighter alternatives — the "virtual feature store" idea
Featureform started with the pitch: "don't move the physical data; just lay a virtual layer over your existing infra (Snowflake, BigQuery, Spark, Redis, DynamoDB)." Featureform owns definitions, lineage, and access control; the data stays where it already lives. The upside is low onboarding cost; the downside is that "definitions only, execution elsewhere" shifts operational load back to the user.
In the same lane, you still see TensorFlow TFX FeatureColumn (legacy), PyTorch TorchRec (embedding tables for recommendation), and hand-rolled Airflow + Redis combinations. Before you commit, check how actively a project is still maintained: Splice Machine shut down in 2022, Logical Clocks lives on only under the Hopsworks name, and OSS activity can quietly fade.
Bytewax — Python-native streaming feature engine
Bytewax is a Rust core with a Python dataflow API. It consumes Kafka/Kinesis/Pulsar streams and lets data scientists write windowed aggregations, joins, and sessionization in Python. It gives up some of Flink/Spark Streaming's raw power but wins on operability: the data scientist who wrote the feature can also run it.
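import bytewax.operators as op
# Windowing operators live in their own module: bytewax.operators.windowing in
# Bytewax 0.21+, bytewax.operators.window in earlier releases.
import bytewax.operators.windowing as win
from bytewax.connectors.kafka import KafkaSource, KafkaSink
from bytewax.dataflow import Dataflow
flow = Dataflow("user-amount-1min")
stream = op.input("kafka-in", flow, KafkaSource(brokers=["kafka:9092"], topics=["txns"]))
# Deserializing each Kafka message into a dict, and re-wrapping results for the sink, is omitted here.
keyed = op.key_on("key-by-user", stream, lambda r: r["user_id"])
windowed = win.fold_window(
    "amount-1min-sum",
    keyed,
    clock=...,      # e.g. an event-time clock reading the transaction timestamp
    windower=...,   # e.g. a 1-minute tumbling window
    builder=lambda: 0.0,
    folder=lambda acc, r: acc + r["amount"],
    merger=lambda a, b: a + b,
)
# fold_window returns several streams; .down carries the per-window results.
op.output("kafka-out", windowed.down, KafkaSink(brokers=["kafka:9092"], topic="features"))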
Bytewax is not itself a feature store. You wire its output straight to Feast/Tecton's online store (Redis) or route through Kafka to Feast's Push API.
Materialize, RisingWave, Feldera, ksqlDB — building features with streaming SQL
The other route to streaming features is a streaming SQL engine. A single CREATE MATERIALIZED VIEW keeps its result incrementally up to date as events flow in, and that result lands directly in the online store.
- Materialize: built on differential dataflow. Strong ANSI SQL compatibility and consistency. BYOC mode added in 2024.
- RisingWave: streaming processing fused with persistent state. PostgreSQL wire-protocol compatible.
- Feldera: incremental compute engine based on the DBSP (Database Stream Processor) paper. Open-sourced in 2024.
- ksqlDB: Confluent's Kafka-native streaming SQL.
The big win: the same SQL works for batch (historical training data) and streaming (current features), which reduces the code-path divergence that creates training-serving skew at the root.
-- Materialize: per-user 10-minute amount sum (streaming).
-- mz_now() has to stand alone on one side of the comparison.
-- RisingWave writes the same filter as: event_ts > NOW() - INTERVAL '10 minutes'
CREATE MATERIALIZED VIEW user_amount_10m AS
SELECT
    user_id,
    SUM(amount) AS amount_10m_sum,
    COUNT(*) AS txn_10m_count
FROM transactions
WHERE mz_now() <= event_ts + INTERVAL '10 minutes'
GROUP BY user_id;
If that view streams into Redis, you have online features.
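What "incrementally up to date" means can be sketched as applying signed deltas to per-key aggregates instead of rescanning the table. The `IncrementalSumCount` class below is a hypothetical toy, not how Materialize, RisingWave, or Feldera are implemented; it only shows the shape of the idea.

```python
from collections import defaultdict


class IncrementalSumCount:
    """Keeps SUM(amount) and COUNT(*) per user up to date from a stream of deltas."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def apply(self, user_id, amount, diff):
        # diff = +1 when a row enters the window, -1 when it expires or is retracted.
        self.sums[user_id] += diff * amount
        self.counts[user_id] += diff
        if self.counts[user_id] == 0:
            # No rows left for this key: drop it, as GROUP BY would.
            del self.sums[user_id]
            del self.counts[user_id]

    def snapshot(self):
        return {u: (self.sums[u], self.counts[u]) for u in self.counts}


view = IncrementalSumCount()
view.apply("u1", 30.0, +1)
view.apply("u1", 12.0, +1)
view.apply("u2", 8.0, +1)
view.apply("u1", 30.0, -1)   # the first u1 row ages out of the 10-minute window
view.apply("u2", 8.0, -1)    # u2 has no rows left
```

Each update costs work proportional to the change, not to the table size, which is what lets the view's result double as an online feature value.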
Fennel and Chalk — transactional and external-API-aware feature stores
These two are the new entrants that took fintech and payments by storm.
Fennel.ai arrived in 2022 with a different pitch: express both batch and streaming features in one Python dataframe API, not SQL. Its differentiator is transactional consistency: feature updates for the same user are applied in the same order as the payment/event sequence. In 2025 Databricks acquired Fennel; external SaaS activity has cooled, but the transactional feature store idea is now becoming the payments-industry default. Fennel's technology is likely to live on inside the Databricks feature engineering stack.
Chalk.ai launched in 2022 as a commercial platform. You define features as Python classes; Chalk derives both the batch and streaming paths automatically. Its strengths: (a) treating external API calls as first-class cacheable features (e.g., a credit-bureau response), and (b) automatically topologically sorting the feature graph at inference time. It is growing fast in fintech and insurance. Downsides: closed source and SaaS pricing.
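Dependency-ordered resolution with a cached external call can be sketched with the standard library's `graphlib`. Everything here is an assumption made for illustration: the feature names, the fake `fetch_credit_score` call, and the cache without expiry. It is not Chalk's API or execution model.

```python
from graphlib import TopologicalSorter

calls = {"credit_bureau": 0}


def fetch_credit_score(user_id):
    # Stand-in for a slow, billable external API call (hypothetical).
    calls["credit_bureau"] += 1
    return 700


# Each feature lists its dependencies and a resolver taking (user_id, resolved values).
FEATURES = {
    "credit_score": ([], lambda uid, v: fetch_credit_score(uid)),
    "txn_count_30d": ([], lambda uid, v: 12),
    "risk_score": (
        ["credit_score", "txn_count_30d"],
        lambda uid, v: round(v["txn_count_30d"] / v["credit_score"], 4),
    ),
}

_cache = {}  # (user_id, feature name) -> value; a real system would add expiry


def resolve(user_id, wanted):
    # For brevity the whole graph is resolved, not just the ancestors of `wanted`.
    graph = {name: set(deps) for name, (deps, _) in FEATURES.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        key = (user_id, name)
        if key not in _cache:
            _cache[key] = FEATURES[name][1](user_id, values)
        values[name] = _cache[key]
    return {name: values[name] for name in wanted}


first = resolve("u1", ["risk_score"])
second = resolve("u1", ["risk_score", "credit_score"])
```

The topological order guarantees `risk_score` runs only after both inputs exist, and the cache means the second request does not pay for the bureau call again.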
When embeddings become features — convergence with vector DBs
Through 2024 embeddings were a separate track. In 2026 embeddings are just another feature type. Tecton, Hopsworks, and Feast all support vector dtypes natively, and you can register Pinecone, Weaviate, Milvus, or PostgreSQL pgvector as the online store.
Two consequences. First, vector DBs are increasingly positioning themselves as "embedding-only feature stores" (Pinecone Serverless, Weaviate Cloud). Second, feature stores are absorbing vector indexes into one SDK (Hopsworks Vector Index is the clearest example). By 2027 the boundary will be largely gone.
Feature monitoring — drift detection is now first-class
The second big responsibility of a 2026 feature store is monitoring the features themselves: distribution drift, null-rate spikes, cardinality blowups, time-based freshness SLOs. Evidently AI, Arize, WhyLabs, and Fiddler integrate directly into feature stores; Tecton and Hopsworks ship their own monitoring inline.
The point of this layer is to separate model degradation from data/feature degradation quickly. If you don't watch feature drift, you waste cycles retraining or debugging a model when the real fault is a shifted or broken upstream feature.
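Two of the checks named above, null rate and distribution drift, fit in a few lines. The sketch below uses the Population Stability Index as one common drift measure; the bin edges and sample data are made up, and the tools listed here may compute drift differently.

```python
import math


def null_rate(values):
    return sum(v is None for v in values) / len(values)


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""
    def shares(sample):
        counts = [0] * (len(edges) + 1)
        for x in sample:
            counts[sum(x >= e for e in edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [10, 20, 30, 40, 50, 60, 70, 80]
same = list(baseline)
shifted = [x + 40 for x in baseline]
edges = [25, 50, 75]

psi_same = psi(baseline, same, edges)
psi_shifted = psi(baseline, shifted, edges)
```

A PSI above roughly 0.25 is often treated as a significant shift, but that threshold is a rule of thumb; alerting limits should be tuned per feature.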
Comparison table — the 2026 feature store landscape at a glance
| Tool | OSS | Streaming | Batch | Online | Offline | Lineage/Governance | Operational model |
|---|---|---|---|---|---|---|---|
| Feast | Apache 2.0 | weak (Push API) | strong | Redis/DynamoDB/Bigtable | BQ/Snowflake/Redshift | basic | self-hosted |
| Tecton | closed | strong (Flink/Spark) | strong | Redis/DynamoDB | S3/Snowflake/BQ | strong | SaaS + customer VPC |
| Hopsworks | AGPL + commercial | medium (Flink) | strong | RonDB | Hudi/external | strong | self-host + SaaS |
| Databricks UC FE | closed | strong (DLT) | strong | Online Tables | Delta on UC | very strong | Databricks native |
| Vertex AI FS | closed | medium | strong | Bigtable/Optimized | BigQuery | strong | GCP managed |
| SageMaker FS | closed | medium | strong | DynamoDB/In-Memory | S3/Athena | medium | AWS managed |
| Featureform | Mozilla Public 2.0 | external | external | external | external | medium | virtual layer |
| Chalk.ai | closed | strong | strong | own | own/external | strong | SaaS |
| Fennel (Databricks) | closed | very strong | strong | own | own | strong | internalized |
Data contracts and the model-feature graph — the governance muscle
Operating a feature store inevitably converges on data contracts. An upstream schema change breaks feature computation, which breaks model inference. A good 2026 feature store (a) detects upstream schema changes automatically, (b) shows the affected feature views and models in a lineage graph, and (c) blocks incompatible changes ahead of time. Databricks UC's column masking/lineage, Tecton's source tracking, and Hopsworks's schema evolution policy are examples.
On top of contracts you need a bidirectional model-feature graph. If Feature X disappears, the list of affected models should be visible immediately; for Model Y, every input Feature should be inspectable. MLflow Model Registry, Databricks UC model lineage, Tecton's Feature Service-to-Model link, and external catalogs (DataHub, OpenMetadata) all play in this space. A feature store without these breaks under its own weight eventually.
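Both ideas reduce to small data structures. The sketch below assumes a contract is a mapping from column name to type and that model-to-feature links are known; all names are hypothetical, and real catalogs track far more than this.

```python
from collections import defaultdict


def breaking_changes(contract, new_schema):
    """Compare an upstream schema against the columns a feature pipeline relies on."""
    problems = []
    for column, dtype in contract.items():
        if column not in new_schema:
            problems.append(f"missing column: {column}")
        elif new_schema[column] != dtype:
            problems.append(f"type change: {column} {dtype} -> {new_schema[column]}")
    return problems  # extra upstream columns are treated as compatible


# Model -> features it consumes (hypothetical names).
MODEL_FEATURES = {
    "fraud_v3": ["amount_7d_sum", "txn_count_1h"],
    "credit_v1": ["amount_7d_sum"],
}

# Invert the mapping so the graph can be walked in both directions.
FEATURE_MODELS = defaultdict(list)
for model, features in MODEL_FEATURES.items():
    for feature in features:
        FEATURE_MODELS[feature].append(model)

contract = {"user_id": "string", "amount": "double", "event_ts": "timestamp"}
proposed = {"user_id": "string", "amount": "string", "note": "string"}
problems = breaking_changes(contract, proposed)
affected = FEATURE_MODELS["amount_7d_sum"]
```

Run in CI against a proposed upstream change, a non-empty `problems` list is the "block ahead of time" step, and `affected` tells you who to notify.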
The lakehouse unification thesis — "the feature store is a layer, not a product"
Since 2025 the strongest trend has been the feature store being absorbed into the lakehouse as a layer rather than living as a separate product. Databricks merging Workspace Feature Store into UC FE was the decisive signal; Snowflake walked the same road with Snowpark + Snowflake Feature Store (2024 GA). The Iceberg + Polaris/Tabular camp is heading the same way.
If this thesis holds, the standalone "feature store" category survives by 2027–2028 only in ultra-low-latency / transactional niches (fintech, ads). Mainstream ML lives inside the lakehouse. The catch is that online-store integration must be fast enough; as of May 2026 Online Tables/Bigtable integrations still don't quite hit fintech latencies.
"Couldn't we just use dbt + Redis?" — the honest answer
This is the part most posts skip. You can probably get away without a dedicated feature store if ALL of the following hold.
- You have five or fewer production models.
- Features are mostly batch (hourly/daily). Sub-minute freshness is not a requirement.
- Few features are reused across models.
- A clear owner can guarantee training-serving consistency and write point-in-time joins by hand.
- Governance / audit pressure is light.
Conversely, if any one of the following holds, a dedicated feature store is probably the right call.
- Model count is growing into the tens or hundreds.
- Streaming features need 1-second to 1-minute freshness.
- The same feature is shared across five or more models.
- GDPR / EU AI Act / national privacy regulations require lineage.
- Fraud, recommendation, or ads workloads sit on a millisecond online lookup path.
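The second checklist can be written down as a function a team can argue about. The numeric thresholds are assumptions chosen to match the wording above (ten models for "tens", sixty seconds for "1-minute freshness"); the `Workload` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    production_models: int
    freshness_seconds: float           # tightest freshness any feature needs
    max_models_sharing_a_feature: int
    lineage_required: bool             # e.g. regulatory audit requirements
    online_lookup_in_hot_path: bool    # millisecond lookups for fraud/recs/ads


def needs_dedicated_store(w: Workload) -> bool:
    """True if any of the 'dedicated feature store' triggers fires."""
    return any([
        w.production_models >= 10,           # assumed cutoff for "tens"
        w.freshness_seconds <= 60,
        w.max_models_sharing_a_feature >= 5,
        w.lineage_required,
        w.online_lookup_in_hot_path,
    ])


batch_team = Workload(4, 3600, 2, False, False)
fraud_team = Workload(4, 5, 2, False, True)
decisions = (needs_dedicated_store(batch_team), needs_dedicated_store(fraud_team))
```

The value is less in the boolean than in forcing each input to be stated as a number or a yes/no.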
Korea adoption — Coupang, Toss, Naver
- Coupang has run an in-house feature platform for recommendation, ranking, and logistics ETA models since around 2021. Public talks on the Coupang Engineering Blog have shown a Spark + Redis + custom metadata layer architecture; by 2025–2026 it is reported to use an Iceberg-based offline tier combined with Aerospike/Redis online.
- Toss has shared Kafka + Flink + Redis streaming-feature pipelines for fraud, recommendation, and credit scoring at the SLASH conference. Rather than a single full-stack store, each domain team runs a short and focused feature pipeline — a distributed strategy in itself.
- Naver operates feature-management components inside internal ML platforms like Hyperion, with HBase/Redis/MySQL as backends. Search, shopping, and the CLOVA lineup each adapt the pattern.
All three chose build-in-house or partial reuse over plain Feast or plain Tecton. The traffic volume and domain specifics sit beyond what off-the-shelf OSS handles comfortably.
Japan adoption — Mercari, ZOZO, CyberAgent
- Mercari published a Feast-based in-house feature platform around 2022–2023 on the Mercari Engineering Blog: BigQuery offline + Redis online + Vertex AI training pipelines.
- ZOZO has repeatedly written on the ZOZO TECH BLOG about recommendation and search features served via BigQuery + Memorystore Redis. Some teams have evaluated Vertex AI Feature Store.
- CyberAgent AI Lab has shared streaming-feature pipelines for ads and game recommendation (Kafka + Flink/Beam + Bigtable). The ad-bidding hot path lives in a separate in-memory KV — the textbook split.
All three lean into Japan's "BigQuery-first + GCP managed" tendency.
Outlook 2026–2027 — consolidation, attrition, specialization
Three trends are running in parallel.
- Lakehouse absorption: generalist ML gets swallowed by Databricks UC FE and Snowflake Feature Store.
- OSS middle-tier attrition: Feast solidifies as the standard, but secondary OSS contenders lose oxygen.
- Specialized SaaS survives: fintech, ads, fraud — anywhere transactional millisecond features are core — keeps Tecton/Hopsworks/Chalk in play.
The interesting wild card is the streaming SQL camp (Materialize, RisingWave, Feldera) positioning itself directly as feature stores. By 2027 the "feature store vs. streaming DB" boundary may genuinely blur.
Closing — choose responsibilities, not tools
The real feature-store choice is who owns the three responsibilities (registry, correctness, serving). If you're already deep in one of Databricks/Snowflake/GCP/AWS, prove the native option insufficient before adopting a separate vendor. If you're multi-cloud or streaming freshness is core, Tecton/Hopsworks/Chalk earn their seat. If you need a lightweight OSS standard, it's Feast.
Five operational anti-patterns that quietly kill feature-store rollouts. (1) Registering one-off SQL transforms as features, polluting the catalog. (2) No explicit online freshness SLO (p99 ≤ X minutes); "must be fresh" doesn't survive contact with reality. (3) No drift monitoring, so you blame the model. (4) Empty owner field, so nobody can touch a feature six months later. (5) No unit/regression tests on feature definitions, so a small change silently introduces training-serving skew.
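An explicit freshness SLO, as in anti-pattern (2), can be checked mechanically. The sketch below assumes you can read a last-written timestamp per entity from the online store; the function names and the nearest-rank percentile are choices made for the example.

```python
from datetime import datetime, timedelta


def p99(values):
    """Nearest-rank 99th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def freshness_slo_met(last_updated, now, slo):
    """last_updated: entity key -> time its online row was last written."""
    staleness = [now - ts for ts in last_updated.values()]
    return p99(staleness) <= slo


now = datetime(2026, 5, 1, 12, 0)
# 99 entities refreshed two minutes ago, one entity stuck for three hours.
last_updated = {f"u{i}": now - timedelta(minutes=2) for i in range(99)}
last_updated["u99"] = now - timedelta(hours=3)

ok_at_p99 = freshness_slo_met(last_updated, now, slo=timedelta(minutes=5))
worst = max(now - ts for ts in last_updated.values())
```

Here the p99 target passes while one entity is three hours stale, so it is worth deciding whether the SLO should also bound the worst case.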
Answer honestly: do you actually need a feature store? If you have only a handful of production models and they're batch-heavy, well-written dbt + Redis caching is already a fine feature store. Tools come after that.
References
- Feast docs: https://docs.feast.dev/
- Tecton docs: https://docs.tecton.ai/
- Hopsworks docs: https://docs.hopsworks.ai/
- Databricks Feature Engineering: https://docs.databricks.com/aws/en/machine-learning/feature-store/
- Vertex AI Feature Store: https://cloud.google.com/vertex-ai/docs/featurestore
- SageMaker Feature Store: https://aws.amazon.com/sagemaker/feature-store/
- Bytewax docs: https://docs.bytewax.io/
- Materialize docs: https://materialize.com/docs/
- RisingWave docs: https://docs.risingwave.com/
- Feldera GitHub: https://github.com/feldera/feldera
- Featureform docs: https://docs.featureform.com/
- Chalk.ai docs: https://docs.chalk.ai/
- Fennel.ai blog (pre-acquisition): https://fennel.ai/blog
- ksqlDB docs: https://docs.ksqldb.io/
- Feast GitHub: https://github.com/feast-dev/feast
- Apache Flink docs: https://flink.apache.org/docs/
- Snowflake Feature Store: https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/overview
- Evidently AI docs: https://docs.evidentlyai.com/
- Arize AI docs: https://docs.arize.com/arize
- Coupang Engineering Blog: https://medium.com/coupang-engineering
- Toss Engineering (SLASH): https://toss.tech/
- Mercari Engineering Blog: https://engineering.mercari.com/en/blog/
- ZOZO TECH BLOG: https://techblog.zozo.com/
- CyberAgent AI Tech Publish: https://ai-tech.cyberagent.co.jp/
- classmethod.jp Feature Store articles: https://dev.classmethod.jp/articles/?s=feature+store