# Designing a Data API: Serving Analytical Data to Applications

There's a moment in many data projects when the work flips from "analysts query the warehouse" to "an application needs this data, fast, for users." A product wants to show each customer their personalized metrics; a mobile app needs a real-time risk score; a partner wants programmatic access. This is the **data API** — the serving layer between your analytical data and the applications that consume it — and it's a genuinely different problem from analytics, with a different access pattern, latency budget, and failure mode. The most common and most damaging mistake is to treat it like analytics and point the app straight at the warehouse.

Using the [system-design framework](system-design-for-data-engineers), I'll design a data API: why the warehouse is the wrong backend for it, the precompute-and-serve pattern that fixes that, the API style and store choices, and the operational concerns (caching, pagination, rate limiting) that make it production-grade.

## Requirements: this is an operational access pattern

The defining shift from analytics: a data API serves **many small, low-latency, high-concurrency reads** — "give me *this* customer's dashboard numbers in 50 ms," thousands of times a second — not a few big scans. That inverts the non-functional requirements:

- **Latency** — tens of milliseconds, p99, not seconds.

- **Concurrency** — potentially thousands of QPS from app traffic, not a handful of analysts.

- **Availability** — it's in the application's critical path; if the API is down, the feature is down.

- **Freshness** — how current must the served data be? Real-time, or is yesterday's precomputed result fine? (This decides the whole pipeline behind it.)

**Do not serve a user-facing API directly from your analytical warehouse.** It's the single most common data-API mistake. Warehouses (Snowflake, BigQuery, Redshift) are built for large scans at low concurrency, exactly the opposite of an API's many-small-fast-reads-at-high-QPS pattern. Point app traffic at one and you get multi-second latencies, throttling under concurrency, and a terrifying bill (you're paying scan-priced compute for point lookups). The warehouse is where you *compute* the data; it is not where you *serve* it. Confusing the two melts both your latency SLO and your budget.

## The pattern: precompute, then serve from a fast store

The core architecture follows directly from the mismatch: **compute results in the analytical layer, then move them into a fast operational store that the API reads from.** The expensive analytical work (aggregations, joins, model scoring) happens in the warehouse/pipeline on a schedule or stream; the results are synced into a low-latency store keyed for the API's access pattern; the API serves point lookups from there in milliseconds.

```mermaid
graph LR
    WH["Warehouse / pipeline<br/>(heavy compute: aggregates,<br/>joins, scoring)"]
    SYNC["Sync / reverse-ETL / stream<br/>(precomputed results)"]
    STORE["Serving store<br/>(KV / Postgres / Redis),<br/>keyed for API reads"]
    API["Data API<br/>(REST / GraphQL / gRPC)<br/>+ cache, auth, rate limit"]
    APP["Applications<br/>(web, mobile, partners)"]
    WH --> SYNC --> STORE --> API --> APP
    API -. cache hot keys .-> API
```

Precompute-and-serve. Heavy analytical work runs in the warehouse/pipeline; results are synced (reverse-ETL or streaming) into a fast store keyed for the API's lookups; the API serves low-latency reads with caching, auth, and rate limiting in front. The warehouse computes, the serving store serves — never make the app wait on a scan.

The **serving store** is chosen for the access pattern, not familiarity: a key-value store or [Redis](redis-internals) for simple keyed lookups at extreme speed, Postgres for richer queries with indexes, or a real-time OLAP engine (Druid/Pinot/ClickHouse) when the API itself needs fast filtered aggregations over fresh data. This is precisely the [online store](feature-stores-feast-tecton) idea from feature serving. When freshness must be near-real-time, a [streaming database](streaming-databases) can keep the served results continuously current instead of refreshing them on a batch sync.
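
To make the split between computing and serving concrete, here is a minimal sketch of the pattern. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the warehouse, a plain dict stands in for the serving store, and the names `sync_customer_metrics` and `get_customer_metrics` are hypothetical.

```python
import sqlite3
from typing import Dict, Optional

# Stand-in for the warehouse: raw rows that are expensive to aggregate per request.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 20.0), ("c1", 30.0), ("c2", 5.0)],
)

# Stand-in for the serving store: a dict keyed by the API's lookup key.
serving_store: Dict[str, dict] = {}


def sync_customer_metrics() -> int:
    """Batch job: run the heavy aggregate once, then write one entry per key."""
    rows = warehouse.execute(
        "SELECT customer_id, COUNT(*), SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    for customer_id, order_count, total_spend in rows:
        serving_store[customer_id] = {
            "order_count": order_count,
            "total_spend": total_spend,
        }
    return len(rows)


def get_customer_metrics(customer_id: str) -> Optional[dict]:
    """API read path: a single keyed lookup, with no scan and no warehouse call."""
    return serving_store.get(customer_id)
```

The request path never touches `warehouse`; only the scheduled sync does. In a real deployment the dict would be replaced by whichever serving store fits the access pattern.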

## API style: REST, GraphQL, or gRPC

With the backing store decided, choose how clients talk to it. The three common styles, by fit:

| Style | Strength | Best when |
| --- | --- | --- |
| **REST** | Simple, universal, cacheable over HTTP | Default — well-defined resources, broad consumers, public APIs |
| **GraphQL** | Clients fetch exactly the fields they need in one request | Many varied clients / nested data; avoids over- and under-fetching |
| **gRPC** | Compact binary, fast, typed contracts | Low-latency internal service-to-service calls |

REST is the sensible default — universally understood and HTTP-cacheable. GraphQL earns its complexity when diverse clients need different slices of nested data (and brings its own trap: an unbounded query can fan out into an expensive backend load, so depth/complexity limits matter). gRPC shines for internal, latency-sensitive calls between services. As with everything in the framework, tie the choice to the consumers, not to fashion.
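
The depth limit mentioned for GraphQL can be sketched as follows. This is only an illustration of the idea, not any GraphQL library's API: it assumes the query has already been parsed into nested dicts where each key is a field and an empty dict marks a leaf, and the names `selection_depth` and `check_depth` are hypothetical. A real server would apply the same check to its parsed query AST.

```python
def selection_depth(selection: dict) -> int:
    """Nesting depth of a selection set given as nested dicts ({} marks a leaf)."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def check_depth(selection: dict, max_depth: int = 3) -> None:
    """Reject a query before execution if it nests deeper than max_depth."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


# Hypothetical query: customer -> orders -> items -> product (four levels deep).
deep_query = {"customer": {"orders": {"items": {"product": {}}}}}
shallow_query = {"customer": {"name": {}, "orders": {"total": {}}}}
```

Here `check_depth(shallow_query)` passes, while `check_depth(deep_query)` raises before any backend work is done.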

## The operational essentials

A data API lives in the application's critical path, so the cross-cutting concerns aren't optional — they're what make it production-grade:

- **Caching** — a cache (CDN for public/static responses, Redis for hot keys) in front of the store absorbs repeated reads, cutting latency and load. The hard part is invalidation when underlying data refreshes — tie cache TTLs to your freshness requirement.

- **Pagination** — never return an unbounded result set. Prefer **cursor** (keyset) pagination over `OFFSET` for large datasets, because deep offsets get pathologically slow (the same deep-pagination problem [search engines](elasticsearch-internals) hit).

- **Rate limiting** — protect the backend from abusive or runaway clients; a single misbehaving consumer shouldn't degrade the API for everyone (a sorted-set-in-Redis sliding window is a classic implementation).

- **Auth & versioning** — authenticate/authorize per consumer (API keys, OAuth) and version the API (`/v1/`) so you can evolve the contract without breaking existing clients — the same schema-evolution discipline that [streaming contracts](schema-registry-avro-protobuf) enforce.
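
Two of the concerns above, cursor pagination and sliding-window rate limiting, can be sketched in a few lines. Treat this as an illustration under stated assumptions: SQLite stands in for the serving store, the limiter keeps timestamps in process memory rather than in a Redis sorted set, and the `events` table, `fetch_page`, and `SlidingWindowLimiter` are hypothetical names.

```python
import sqlite3
import time
from collections import defaultdict, deque
from typing import Optional

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany(
    "INSERT INTO events VALUES (?, ?)", [(i, f"event-{i}") for i in range(1, 8)]
)


def fetch_page(after_id: int = 0, limit: int = 3):
    """Keyset pagination: seek past the cursor on the primary key, no OFFSET."""
    rows = db.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    # A short page means the end was reached; a full page hands back a cursor.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps still in window

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The client passes the returned cursor back as `after_id` on the next call, so each page costs one index seek regardless of how deep it is. With Redis, the limiter's per-client deque would map to a sorted set scored by timestamp, trimmed with `ZREMRANGEBYSCORE` and counted with `ZCARD`, so the window is shared across API instances.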

## The trade-off: precompute vs query-on-demand

The defining design tension is **precompute vs on-demand**. Precomputing every possible answer and serving it from a key-value store gives the fastest reads, but you must compute and store combinations the user may never request, and freshness is bounded by the sync cadence. Querying on demand against an indexed store (or a fast OLAP engine) is fresher and stores less, but each request does real work, so it's harder to guarantee tight latency at high QPS.

The resolution is usually a blend keyed to the access pattern: precompute the common, expensive, high-traffic results (dashboards, leaderboards) and query on demand for the long tail and the ad-hoc. The decision comes from applying the framework's read-pattern, freshness, and latency requirements at the serving layer; there's no universal answer, only the one your numbers dictate.
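
One way to picture the blend is a resolver that checks the precomputed store first and falls back to on-demand computation for everything else. This is a sketch only; `make_resolver`, the key format, and the fallback function are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def make_resolver(
    precomputed: Dict[str, float],
    compute_on_demand: Callable[[str], float],
) -> Callable[[str], Tuple[float, str]]:
    """Build a lookup that prefers precomputed results and falls back on demand."""

    def resolve(key: str) -> Tuple[float, str]:
        if key in precomputed:
            # Hot path: constant-time lookup, flat latency under load.
            return precomputed[key], "precomputed"
        # Long tail: do the real work per request.
        return compute_on_demand(key), "on_demand"

    return resolve


# Hypothetical data: one hot, precomputed key and a toy on-demand computation.
resolve = make_resolver(
    precomputed={"leaderboard:global": 42.0},
    compute_on_demand=lambda key: float(len(key)),
)
```

Returning the source alongside the value makes it easy to measure how much traffic each path actually serves.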

A useful default I reach for: serve the **known, hot, expensive** queries from precomputed results in a fast store (so the p99 stays flat under load), and reserve on-demand querying for the genuinely dynamic or rarely-requested. It bounds your worst-case latency where traffic concentrates while keeping the system fresh and lean everywhere else. Decide what to precompute by looking at the actual request distribution, not by guessing — the traffic is usually far more skewed than teams expect.
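
Looking at the request distribution can be as simple as counting keys in an access log and picking the smallest set that covers most of the traffic. The sketch below assumes the log is already available as a list of requested keys; `keys_to_precompute` and the 80% coverage target are illustrative choices, not a recommendation.

```python
from collections import Counter
from typing import List, Sequence


def keys_to_precompute(request_log: Sequence[str], coverage: float = 0.8) -> List[str]:
    """Most-requested keys, in order, until they cover `coverage` of all requests."""
    counts = Counter(request_log)
    total = sum(counts.values())
    chosen: List[str] = []
    covered = 0
    for key, hits in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(key)
        covered += hits
    return chosen


# Hypothetical skewed traffic: one key takes 70% of requests.
sample_log = ["a"] * 70 + ["b"] * 15 + ["c"] * 10 + ["d"] * 5
hot_keys = keys_to_precompute(sample_log)
```

On this sample, two of the four keys are enough to cover the target share of traffic; the rest can stay on the on-demand path.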

## What to carry away

A data API is an **operational** serving problem, not an analytical one: many small, low-latency, high-concurrency reads. The cardinal rule is **don't serve it from the warehouse** — compute results in the analytical layer and serve them from a **fast store keyed for the access pattern** (the precompute-and-serve / online-store pattern). Choose the API style by consumer (**REST** default, **GraphQL** for varied nested needs, **gRPC** for internal speed), and treat **caching, pagination, rate limiting, auth, and versioning** as required, not optional. Resolve the **precompute-vs-on-demand** tension by precomputing the hot, expensive paths and querying on demand for the tail.

This completes the system-design set: the [framework](system-design-for-data-engineers), a [warehouse](designing-a-data-warehouse) (where data is computed), a [pipeline](designing-a-data-pipeline) (how it flows), and the data API (how applications consume it). The constant across all four: clarify requirements, let the access pattern drive the design, and defend every choice as a trade-off.