Designing a Data API: Serving Analytical Data to Applications

There's a moment in many data projects when the work flips from "analysts query the warehouse" to "an application needs this data, fast, for users." A product wants to show each customer their personalized metrics; a mobile app needs a real-time risk score; a partner wants programmatic access. This is the data API — the serving layer between your analytical data and the applications that consume it — and it's a genuinely different problem from analytics, with a different access pattern, latency budget, and failure mode. The most common and most damaging mistake is to treat it like analytics and point the app straight at the warehouse.

Using the system-design framework, I'll design a data API: why the warehouse is the wrong backend for it, the precompute-and-serve pattern that fixes that, the API style and store choices, and the operational concerns (caching, pagination, rate limiting) that make it production-grade.

Requirements: this is an operational access pattern

The defining shift from analytics: a data API serves many small, low-latency, high-concurrency reads — "give me this customer's dashboard numbers in 50 ms," thousands of times a second — not a few big scans. That inverts the non-functional requirements:

  • Latency — tens of milliseconds, p99, not seconds.
  • Concurrency — potentially thousands of QPS from app traffic, not a handful of analysts.
  • Availability — it's in the application's critical path; if the API is down, the feature is down.
  • Freshness — how current must the served data be? Real-time, or is yesterday's precomputed result fine? (This decides the whole pipeline behind it.)

Do not serve a user-facing API directly from your analytical warehouse. It's the single most common data-API mistake. Warehouses (Snowflake, BigQuery, Redshift) are built for large scans at low concurrency, exactly the opposite of an API's many-small-fast-reads-at-high-QPS pattern. Point app traffic at one and you get multi-second latencies, throttling under concurrency, and a terrifying bill (you're paying scan-priced compute for point lookups). The warehouse is where you compute the data; it is not where you serve it. Confusing the two melts both your latency SLO and your budget.

The pattern: precompute, then serve from a fast store

The core architecture follows directly from the mismatch: compute results in the analytical layer, then move them into a fast operational store that the API reads from. The expensive analytical work (aggregations, joins, model scoring) happens in the warehouse/pipeline on a schedule or stream; the results are synced into a low-latency store keyed for the API's access pattern; the API serves point lookups from there in milliseconds.

graph LR
    WH["Warehouse / pipeline<br/>(heavy compute: aggregates,<br/>joins, scoring)"]
    SYNC["Sync / reverse-ETL / stream<br/>(precomputed results)"]
    STORE["Serving store<br/>(KV / Postgres / Redis),<br/>keyed for API reads"]
    API["Data API<br/>(REST / GraphQL / gRPC)<br/>+ cache, auth, rate limit"]
    APP["Applications<br/>(web, mobile, partners)"]
    WH --> SYNC --> STORE --> API --> APP
    API -. cache hot keys .-> API

Precompute-and-serve. Heavy analytical work runs in the warehouse/pipeline; results are synced (reverse-ETL or streaming) into a fast store keyed for the API's lookups; the API serves low-latency reads with caching, auth, and rate limiting in front. The warehouse computes, the serving store serves — never make the app wait on a scan.

The serving store is chosen for the access pattern, not familiarity: a key-value store or Redis for simple keyed lookups at extreme speed, Postgres for richer queries with indexes, or a real-time OLAP engine (Druid/Pinot/ClickHouse) when the API itself needs fast filtered aggregations over fresh data. This is the same idea as the online store in feature serving. When freshness must be near-real-time, a streaming database can keep the served results continuously current instead of waiting for the next batch sync.
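To make the compute/serve split concrete, here is a minimal sketch of the two halves. Everything in it is an assumption for illustration: the in-memory ServingStore stands in for whatever key-value store you pick, the metric names (orders_30d, revenue_30d) are invented, and sync_from_warehouse assumes the warehouse has already produced one aggregated row per customer.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServedMetrics:
    """Precomputed per-customer results, shaped the way the API returns them."""
    customer_id: str
    orders_30d: int
    revenue_30d: float
    computed_at: float  # epoch seconds when the sync wrote this row


class ServingStore:
    """In-memory stand-in for a key-value serving store (hypothetical)."""

    def __init__(self) -> None:
        self._rows: dict[str, ServedMetrics] = {}

    def upsert_many(self, rows: list[ServedMetrics]) -> None:
        # Sync step: overwrite each key with the latest precomputed result.
        for row in rows:
            self._rows[row.customer_id] = row

    def get(self, customer_id: str) -> Optional[ServedMetrics]:
        # Serve step: one keyed lookup, no scan and no aggregation.
        return self._rows.get(customer_id)


def sync_from_warehouse(warehouse_rows: list[dict], store: ServingStore) -> int:
    """Copy already-aggregated warehouse rows into the serving store."""
    now = time.time()
    batch = [
        ServedMetrics(r["customer_id"], r["orders_30d"], r["revenue_30d"], now)
        for r in warehouse_rows
    ]
    store.upsert_many(batch)
    return len(batch)


def get_customer_metrics(store: ServingStore, customer_id: str) -> Optional[dict]:
    """API handler body: point lookup only; returns None for unknown keys."""
    row = store.get(customer_id)
    if row is None:
        return None
    return {
        "customer_id": row.customer_id,
        "orders_30d": row.orders_30d,
        "revenue_30d": row.revenue_30d,
        # Exposing the age lets clients see how stale the served result is.
        "age_seconds": time.time() - row.computed_at,
    }
```

The request path never touches the warehouse: the only work per request is a dictionary-style lookup, and freshness is bounded by how often the sync runs.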

API style: REST, GraphQL, or gRPC

With the backing store decided, choose how clients talk to it. The three common styles, by fit:

| Style | Strength | Best when |
|---|---|---|
| REST | Simple, universal, cacheable over HTTP | Default: well-defined resources, broad consumers, public APIs |
| GraphQL | Clients fetch exactly the fields they need in one request | Many varied clients / nested data; avoids over- and under-fetching |
| gRPC | Compact binary, fast, typed contracts | Low-latency internal service-to-service calls |

REST is the sensible default — universally understood and HTTP-cacheable. GraphQL earns its complexity when diverse clients need different slices of nested data (and brings its own trap: an unbounded query can fan out into an expensive backend load, so depth/complexity limits matter). gRPC shines for internal, latency-sensitive calls between services. As with everything in the framework, tie the choice to the consumers, not to fashion.
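As a rough illustration of the depth limit mentioned above, the sketch below rejects queries that nest too deeply before they reach the backend. It is deliberately crude and not how a GraphQL server library does it: the function names and the default limit of 5 are assumptions, and it counts curly braces in the raw query text rather than walking a parsed query, so braces in input-object arguments also count toward the depth.

```python
def query_depth(query: str) -> int:
    """Return the deepest curly-brace nesting level in a query string.

    Simplification: braces inside double-quoted strings are skipped, but the
    query is not parsed, so this is an approximation of selection-set depth.
    """
    depth = 0
    max_depth = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


def check_depth(query: str, limit: int = 5) -> None:
    """Raise ValueError when a query nests deeper than the allowed limit."""
    depth = query_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
```

A production setup would more likely apply a depth or complexity rule to the parsed query, but the principle is the same: bound the work a single request can trigger before executing it.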

The operational essentials

A data API lives in the application's critical path, so the cross-cutting concerns aren't optional — they're what make it production-grade:

  • Caching — a cache (CDN for public/static responses, Redis for hot keys) in front of the store absorbs repeated reads, cutting latency and load. The hard part is invalidation when underlying data refreshes — tie cache TTLs to your freshness requirement.
  • Pagination — never return an unbounded result set. Prefer cursor (keyset) pagination over OFFSET for large datasets, because deep offsets get pathologically slow (the same deep-pagination problem search engines hit).
  • Rate limiting — protect the backend from abusive or runaway clients; a single misbehaving consumer shouldn't degrade the API for everyone (a sorted-set-in-Redis sliding window is a classic implementation).
  • Auth & versioning — authenticate/authorize per consumer (API keys, OAuth) and version the API (/v1/) so you can evolve the contract without breaking existing clients — the same schema-evolution discipline that streaming contracts enforce.
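Two of these concerns are easy to get subtly wrong, so here is a small sketch of each. Both are illustrations under stated assumptions: fetch_page assumes a hypothetical metrics(id, score) table and uses SQLite only because it ships with Python, and SlidingWindowLimiter keeps per-client timestamps in an in-process deque as a stand-in for the Redis sorted set mentioned above.

```python
import sqlite3
import time
from collections import deque
from typing import Optional

MAX_PAGE_SIZE = 500  # hypothetical hard cap so no response is unbounded


def fetch_page(conn: sqlite3.Connection, after_id: Optional[int] = None, limit: int = 100):
    """Keyset pagination: seek past the last id seen instead of using OFFSET."""
    limit = max(1, min(limit, MAX_PAGE_SIZE))
    if after_id is None:
        sql, params = "SELECT id, score FROM metrics ORDER BY id LIMIT ?", (limit,)
    else:
        sql, params = (
            "SELECT id, score FROM metrics WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, limit),
        )
    rows = conn.execute(sql, params).fetchall()
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the trailing window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict = {}  # client_id -> deque of request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; the rejected request is not recorded
        hits.append(now)
        return True
```

The cursor query stays an index seek however deep the client pages, which is the property OFFSET lacks. The limiter is per process; sharing limits across API instances is what the Redis-backed version is for.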

The trade-off: precompute vs query-on-demand

The defining design tension is precompute vs on-demand. Precomputing every possible answer and serving it from a key-value store gives the fastest reads, but you must compute and store combinations the user may never request, and freshness is bounded by the sync cadence. Querying on demand against an indexed store (or a fast OLAP engine) is fresher and stores less, but each request does real work, so it's harder to guarantee tight latency at high QPS.

The resolution is usually a blend keyed to the access pattern: precompute the common, expensive, high-traffic results (dashboards, leaderboards) and query on demand for the long tail and the ad-hoc. The decision comes from applying the framework's read-pattern, freshness, and latency requirements at the serving layer; there's no universal answer, only the one your numbers dictate.

A useful default I reach for: serve the known, hot, expensive queries from precomputed results in a fast store (so the p99 stays flat under load), and reserve on-demand querying for the genuinely dynamic or rarely-requested. It bounds your worst-case latency where traffic concentrates while keeping the system fresh and lean everywhere else. Decide what to precompute by looking at the actual request distribution, not by guessing — the traffic is usually far more skewed than teams expect.
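One way to look at the request distribution is to count keys in a sample of the access log and take the smallest set of keys that covers a target share of traffic. The sketch below assumes you can extract a list of requested keys from your logs; the function name and the 80% default are illustrative choices, not a recommendation for any particular system.

```python
from collections import Counter


def keys_to_precompute(request_keys, coverage=0.8):
    """Return the most-requested keys that together cover `coverage` of traffic."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(request_keys)
    total = sum(counts.values())
    selected = []
    covered = 0
    # Walk keys from most to least requested, stopping once the target is met.
    for key, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(key)
        covered += n
    return selected
```

With a skewed log, the returned list is typically a small fraction of all distinct keys. Those are the candidates for precomputation; everything else can fall through to on-demand querying.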

What to carry away

A data API is an operational serving problem, not an analytical one: many small, low-latency, high-concurrency reads. The cardinal rule is don't serve it from the warehouse — compute results in the analytical layer and serve them from a fast store keyed for the access pattern (the precompute-and-serve / online-store pattern). Choose the API style by consumer (REST default, GraphQL for varied nested needs, gRPC for internal speed), and treat caching, pagination, rate limiting, auth, and versioning as required, not optional. Resolve the precompute-vs-on-demand tension by precomputing the hot, expensive paths and querying on demand for the tail.

This completes the system-design set: the framework, a warehouse (where data is computed), a pipeline (how it flows), and the data API (how applications consume it). The constant across all four: clarify requirements, let the access pattern drive the design, and defend every choice as a trade-off.