Feature Stores: Feast, Tecton, and the Training/Serving Skew Problem

Here's a failure that has derailed more model launches than any choice of algorithm: a model scores beautifully in the notebook, ships to production, and quietly underperforms. No error, no alert, just predictions that aren't as good as the offline evaluation promised. More often than not the cause is the same: the features the model trained on were computed one way (a Spark job over the warehouse), the features it sees in production were computed another way (a hand-written service hitting a live API), and the two don't agree. That gap is training/serving skew, and eliminating it is the central reason feature stores exist.

A feature store is the system that manages features for machine learning across both training and serving, so the values a model learns from and the values it predicts on are produced by the same definition. The hard parts it solves are an online/offline storage split, point-in-time-correct joins, and training/serving consistency. I'll work through each, look at how Feast and Tecton differ, and be honest about when you shouldn't bother.

The two-store problem

Training and serving want opposite things from storage, and that mismatch is the structural core of a feature store. Training reads huge volumes of historical feature values in bulk — every customer's features across two years — and tolerates latency; it's a batch, scan-heavy job that belongs in a warehouse or lake. Serving reads one entity's features (this customer, right now) at very low latency to make a single prediction; that's a key-value lookup that belongs in a fast online store.

So a feature store keeps two stores, fed from the same feature definitions:

Offline store — historical feature values at scale, for building training datasets and batch scoring. Typically a warehouse or files (BigQuery, Snowflake, Parquet on object storage).
Online store — the latest feature value per entity, for low-latency lookups at inference. Typically a fast KV store (Redis, DynamoDB).

graph TD
    SRC["Raw data sources<br/>(events, tables, streams)"]
    DEF["Feature definition<br/>(written once)"]
    OFF["Offline store<br/>(warehouse / lake: full history)"]
    ON["Online store<br/>(Redis / DynamoDB: latest per entity)"]
    TRAIN["Training: point-in-time<br/>join over history"]
    SERVE["Serving: low-latency<br/>lookup of one entity"]
    SRC --> DEF
    DEF --> OFF
    OFF -->|"materialize latest values"| ON
    OFF --> TRAIN
    ON --> SERVE

The feature store's two-store architecture. One feature definition feeds both an offline store (full history, for training) and an online store (latest value per entity, for serving), with a materialization step pushing fresh values online. Because both paths derive from the same definition, the features a model trains on and predicts on stay consistent — that's the whole point.
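The split and the materialization step between the stores can be sketched in a few lines of plain Python. This is a toy under loud assumptions: `offline_history`, `materialize`, and `online_lookup` are hypothetical names, a list stands in for the warehouse and a dict for Redis or DynamoDB, and real materialization runs as a scheduled job rather than a function call. It only shows the two access patterns and the step that connects them.

```python
from datetime import datetime, timezone

# Offline store (hypothetical stand-in for a warehouse table):
# full history of (entity_id, feature_timestamp, value) rows.
offline_history = [
    ("c1", datetime(2024, 1, 10, tzinfo=timezone.utc), 120.0),
    ("c1", datetime(2024, 2, 10, tzinfo=timezone.utc), 95.0),
    ("c2", datetime(2024, 1, 15, tzinfo=timezone.utc), 40.0),
    ("c2", datetime(2024, 3, 1, tzinfo=timezone.utc), 310.0),
]


def materialize(history, end_date):
    """Build the online view: latest value per entity with timestamp <= end_date."""
    latest = {}
    for entity_id, ts, value in history:
        if ts > end_date:
            continue  # values newer than the materialization cutoff are not pushed online
        current = latest.get(entity_id)
        if current is None or ts > current[0]:
            latest[entity_id] = (ts, value)
    # The online store only answers "what is the value now?", so drop the timestamps.
    return {entity_id: value for entity_id, (_, value) in latest.items()}


def online_lookup(online_store, entity_id):
    """Serving path: a single key-value read for one entity (None if absent)."""
    return online_store.get(entity_id)


online_store = materialize(offline_history, datetime(2024, 2, 28, tzinfo=timezone.utc))
print(online_lookup(online_store, "c1"))  # 95.0
print(online_lookup(online_store, "c2"))  # 40.0 (the March value is past the cutoff)
```

The cutoff is the design point worth noticing: anything newer than the last materialization run isn't online yet, so online freshness depends on how often that step runs.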

Point-in-time joins: the subtle correctness trap

This is the part that separates a feature store from "a table of features," and it's where teams silently sabotage their own models. To build a training set, you take labeled events ("customer churned on March 3") and attach the features that were true at that moment. The trap: if you naively join the latest feature values onto a historical event, you leak the future into the past. The model trains on "customer's total spend" as of today attached to a churn event from March, sees a suspiciously strong signal, and scores brilliantly offline, then falls flat in production, where no value from the future exists at prediction time. That's label leakage via a sloppy join.

A point-in-time correct join (sometimes called a "time-travel join") fixes this: for each training row, it fetches the feature value as of that row's timestamp, meaning the most recent value at or before the event, never after. Doing this correctly across many features with different update cadences is fiddly and easy to get wrong by hand, so the feature store does it for you: you ask for a training dataset for a set of entities and timestamps, and it assembles point-in-time-correct features. This single capability is the strongest argument for adopting one.

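# Feast: build a training set with point-in-time-correct features
import pandas as pd
from feast import FeatureStore

# Assumes a Feast repo in this directory defining a "customer_stats" feature view keyed on customer_id.
store = FeatureStore(repo_path=".")

# entity_df has the labeled events: (customer_id, event_timestamp, churned); rows are illustrative.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-03-03", "2024-03-10"], utc=True),
    "churned": [1, 0],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_stats:total_spend_30d",
        "customer_stats:orders_7d",
        "customer_stats:days_since_signup",
    ],
).to_df()
# For each row, Feast joins the feature value as of that row's event_timestamp,
# never a value from after the event. No leakage, no hand-rolled time-travel SQL.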
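What the store does conceptually can be reproduced with pandas' `merge_asof`, which matches each left row to the last right row whose key is at or before it. This is a sketch of the technique, not Feast's implementation (Feast runs the join in the offline store's engine); the column names and rows are illustrative. It puts a naive "latest value" join next to the point-in-time join so the leak is visible.

```python
import pandas as pd

# Labeled events (illustrative rows): when each label was observed.
events = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "event_timestamp": pd.to_datetime(["2024-02-01", "2024-03-03", "2024-02-15"]),
    "churned": [0, 1, 0],
})

# Feature history (illustrative rows): when each feature value became true.
features = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2"],
    "feature_timestamp": pd.to_datetime(
        ["2024-01-10", "2024-02-20", "2024-04-01", "2024-01-05", "2024-03-01"]
    ),
    "total_spend_30d": [120.0, 80.0, 5.0, 40.0, 300.0],
})

# Naive (leaky) join: attach each customer's *latest* value, ignoring event time.
latest = features.sort_values("feature_timestamp").groupby("customer_id").tail(1)
naive = events.merge(latest[["customer_id", "total_spend_30d"]], on="customer_id")

# Point-in-time join: for each event, the most recent feature value whose timestamp
# is at or before the event. merge_asof requires both frames sorted on the join key.
pit = pd.merge_asof(
    events.sort_values("event_timestamp"),
    features.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="customer_id",       # only match rows for the same entity
    direction="backward",   # never look forward in time
)

print(naive[["customer_id", "event_timestamp", "total_spend_30d"]])
print(pit[["customer_id", "event_timestamp", "total_spend_30d"]])
```

In the naive frame, both of c1's events receive 5.0, a value recorded in April, after either event happened; the point-in-time frame gives 120.0 and 80.0 instead. `merge_asof` also accepts a `tolerance` argument (for example `pd.Timedelta("30D")`) to refuse matches older than a given age, roughly analogous to a feature view's TTL.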

If you remember one thing from this article, make it this: the bug that point-in-time joins prevent is invisible in offline metrics — it makes them look better. A leaky join inflates your validation scores, so it doesn't trip any alarm; you find out only when the production model underdelivers and you can't explain why. A model that's "great offline, mediocre live" with no obvious cause is the classic signature. That asymmetry — the error hides by improving your numbers — is exactly why it's worth letting a system handle the join.

Consistency: one definition, both paths

Bring the pieces together and the core guarantee is consistency. You write a feature's transformation logic once, and the feature store ensures that same logic produces the offline (training) values and the online (serving) values. The model trains and predicts on features computed identically. When skew does creep in, it's usually at a boundary the store doesn't cover — a feature computed in application code at request time that was never registered — which is the first place to look when a model degrades.
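The "write it once" idea is easy to see without any feature store. In this hypothetical sketch, `total_spend_30d` is the single definition, and both the offline path (as of each label's timestamp) and the online path (as of request time) call it, so the two cannot drift apart in how the value is computed. Real feature stores usually serve a precomputed value from the online store rather than recomputing per request; recomputing here just keeps the example short.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Order = Tuple[datetime, float]

# Hypothetical order history, standing in for the raw source data.
ORDERS: Dict[str, List[Order]] = {
    "c1": [
        (datetime(2024, 2, 1), 50.0),
        (datetime(2024, 2, 20), 30.0),
        (datetime(2024, 3, 10), 20.0),
    ],
}


def total_spend_30d(orders: List[Order], as_of: datetime) -> float:
    """The single definition: order total in the 30 days ending at `as_of`
    (window start exclusive, `as_of` inclusive)."""
    window_start = as_of - timedelta(days=30)
    return sum((amount for ts, amount in orders if window_start < ts <= as_of), 0.0)


def build_training_rows(labels: List[Tuple[str, datetime]]):
    """Offline path: the feature as of each historical label timestamp."""
    return [(cid, ts, total_spend_30d(ORDERS.get(cid, []), ts)) for cid, ts in labels]


def serve_features(cid: str, now: datetime) -> Dict[str, float]:
    """Online path: the same feature for one entity, as of request time."""
    return {"total_spend_30d": total_spend_30d(ORDERS.get(cid, []), now)}


print(build_training_rows([("c1", datetime(2024, 3, 1))]))
print(serve_features("c1", datetime(2024, 3, 15)))  # {'total_spend_30d': 50.0}
```

The skew-prone alternative is two implementations of the same window, one that includes `as_of` and one that excludes it; an off-by-one like that passes code review and shows up only as a quiet accuracy drop.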

A second benefit falls out for free: reuse and governance. Features defined in the store are discoverable and shareable across teams and models. The "customer lifetime value" feature one team built becomes a registered, documented, reusable asset rather than something the next team reinvents slightly differently. That's the same data-governance instinct — a registry of trusted, named assets — applied to ML features, and it's why feature stores get framed as MLOps infrastructure rather than just a cache.
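At its core, a registry is a catalog of named definitions with enough metadata to find and trust them. The sketch below is hypothetical: `FeatureDefinition`, `FeatureRegistry`, and their fields are illustrative, not any product's API, though Feast's registry plays this role for its own feature definitions. Rejecting duplicate names is the piece that stops the next team from quietly registering a slightly different "customer lifetime value."

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """Metadata that makes a feature discoverable (hypothetical schema)."""
    name: str
    entity: str
    owner: str
    description: str
    tags: Tuple[str, ...] = ()


class FeatureRegistry:
    """Minimal in-memory registry: unique names, lookup, and text search."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # One name, one definition: a second registration is an error, not a silent fork.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already registered")
        self._features[feature.name] = feature

    def get(self, name: str) -> FeatureDefinition:
        return self._features[name]

    def search(self, text: str) -> List[FeatureDefinition]:
        """Discovery: features whose name or description mentions `text`."""
        needle = text.lower()
        return [
            f for f in self._features.values()
            if needle in f.name.lower() or needle in f.description.lower()
        ]


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="customer_lifetime_value",
    entity="customer",
    owner="growth-team",
    description="Predicted lifetime value of a customer",
    tags=("finance",),
))
print([f.name for f in registry.search("lifetime")])  # ['customer_lifetime_value']
```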

Feast vs Tecton

The two most-cited options sit at different points on the build-vs-buy line, and they share history: Feast started as an open-source project at Gojek, developed with Google Cloud, and Tecton later became one of its core contributors.

	Feast	Tecton
Model	Open-source library / framework	Managed, commercial platform
What it manages	Definitions, registry, materialization, serving over your stores	The above plus the feature transformation/compute and pipelines
Transformations	You compute features; Feast orchestrates storage & retrieval	Defines and runs the feature pipelines for you (batch + streaming)
Operational burden	You run and wire the infrastructure	Largely managed
Best when	You want control, have data infra, want no vendor lock-in	You want feature engineering + serving handled end to end

The honest distinction: apart from lightweight on-demand transformations, Feast deliberately doesn't compute your features. It's the storage, registry, and retrieval layer over stores you provide, which keeps it light and unopinionated but leaves the transformation pipelines to you. Tecton takes on the feature computation too (including streaming features), which is more capable and more managed, at the cost of being a platform you buy into. Pick based on how much of the pipeline you want to own.

Most teams adopt a feature store a year too early. It's real infrastructure with real operational weight (two stores to keep in sync, a materialization pipeline, a registry), and it earns that weight under specific conditions: multiple models or teams sharing features, a genuine need for low-latency online serving, and real-time or frequently-updated features where skew actually bites. If you have one model, batch scoring, and features you compute in the same pipeline for train and predict, a feature store is overhead solving a problem you don't have yet — a shared transformation module and disciplined point-in-time SQL will do. Adopt it when feature reuse and online/offline consistency are concrete pains, not because the architecture diagram looks more mature with one in it.
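For the no-feature-store case, "disciplined point-in-time SQL" can be as simple as a correlated subquery that picks, for each labeled event, the latest feature row at or before the event. The sketch runs it on an in-memory SQLite database from Python's standard library; the table names, column names, and rows are illustrative assumptions. Timestamps are stored as ISO-8601 strings, which compare correctly as text. On a real warehouse the same query shape applies, or a window-function equivalent, or a dedicated ASOF JOIN where the engine provides one (DuckDB does, for example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (customer_id TEXT, event_ts TEXT, churned INTEGER);
CREATE TABLE feature_history (customer_id TEXT, feature_ts TEXT, total_spend_30d REAL);
INSERT INTO events VALUES
  ('c1', '2024-02-01', 0), ('c1', '2024-03-03', 1), ('c2', '2024-02-15', 0);
INSERT INTO feature_history VALUES
  ('c1', '2024-01-10', 120.0), ('c1', '2024-02-20', 80.0), ('c1', '2024-04-01', 5.0),
  ('c2', '2024-01-05', 40.0), ('c2', '2024-03-01', 300.0);
""")

# Point-in-time join in plain SQL: for each event, the latest feature value
# recorded at or before the event's timestamp, never after it.
PIT_QUERY = """
SELECT e.customer_id, e.event_ts, e.churned,
       (SELECT f.total_spend_30d
          FROM feature_history AS f
         WHERE f.customer_id = e.customer_id
           AND f.feature_ts <= e.event_ts
         ORDER BY f.feature_ts DESC
         LIMIT 1) AS total_spend_30d
  FROM events AS e
 ORDER BY e.event_ts
"""

rows = conn.execute(PIT_QUERY).fetchall()
for row in rows:
    print(row)
```

Keeping this query in one shared module, next to the transformation code, gives you most of the point-in-time safety a feature store would, at the cost of maintaining it yourself.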

What to carry away

A feature store exists to kill training/serving skew — the silent accuracy loss when features are computed one way for training and another for serving. It does so with a two-store architecture (an offline store with full history for training, an online store with the latest value per entity for low-latency serving) fed from a single feature definition, so both paths agree. Its sharpest capability is the point-in-time-correct join, which assembles training data with each feature as of the event's moment — preventing label leakage that, dangerously, only ever makes your offline metrics look better.

Feast gives you the storage/registry/retrieval layer over your own infrastructure; Tecton additionally manages the feature computation as a platform. Either way, adopt one when you actually have feature reuse across models and a real online-serving need — not before, because it's genuine infrastructure with genuine upkeep. It slots alongside the rest of the MLOps stack: MLflow for the models that consume these features, and the observability layer that catches skew when it slips through anyway.