The Metrics Layer: Headless BI and One Definition of Revenue

Every data leader has lived this meeting. Finance says revenue was $4.2M. The product dashboard says $4.4M. Marketing's spreadsheet says $4.1M. Everyone pulled from "the warehouse," everyone is convinced they're right, and the next forty-five minutes are spent not making a decision but litigating whose SQL is correct. The numbers differ because "revenue" was defined three times, in three tools, by three people who each made a slightly different call about refunds, currency, and which date to recognize it on. This is the problem the metrics layer set out to kill, and 2022 was the year it went from a blog-post idea to real products.

A metrics layer — also called a semantic layer or headless BI — is a central place where you define each business metric exactly once, in code, and from which every downstream consumer (BI tools, notebooks, spreadsheets, apps) gets the same answer. Define "active users" once; everyone who asks gets that definition. It sounds almost too obvious to need a product. The reason it does is more interesting than it looks.

Why the same metric comes out different

The root cause is where metric logic lives. In the classic modern data stack, your dbt models produce clean tables — orders, users, sessions — but the actual metric (revenue = sum of order totals, minus refunds, in USD, recognized on ship date) is computed downstream, in whatever tool asks the question. Tableau computes it one way. A Looker explore computes it another. An analyst's ad-hoc query a third. The definition is scattered across every consumer, and they drift apart the instant anyone makes a different reasonable choice.
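To make the drift concrete, here is a minimal sketch in Python. The order rows, field names, and the three "tools" are all invented for illustration; the point is only that three defensible definitions applied to identical data give three different September numbers.

```python
from datetime import date

# Hypothetical order rows; every field name here is illustrative.
ORDERS = [
    {"total": 100.0, "refund": 0.0,  "ordered": date(2022, 9, 29), "shipped": date(2022, 10, 2)},
    {"total": 250.0, "refund": 50.0, "ordered": date(2022, 9, 15), "shipped": date(2022, 9, 18)},
    {"total": 80.0,  "refund": 0.0,  "ordered": date(2022, 8, 30), "shipped": date(2022, 9, 1)},
]

def in_september(d):
    return d.year == 2022 and d.month == 9

# Tool A: gross order totals, recognized on order date.
revenue_a = sum(o["total"] for o in ORDERS if in_september(o["ordered"]))

# Tool B: net of refunds, recognized on order date.
revenue_b = sum(o["total"] - o["refund"] for o in ORDERS if in_september(o["ordered"]))

# Tool C: net of refunds, recognized on ship date.
revenue_c = sum(o["total"] - o["refund"] for o in ORDERS if in_september(o["shipped"]))

print(revenue_a, revenue_b, revenue_c)  # 350.0 300.0 280.0
```

None of the three is a bug. Each is a reasonable choice made in isolation, which is exactly why care alone does not converge them.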

You can't fix this by being more careful, because the logic has nowhere central to live. The fix is structural: pull metric definitions out of the BI tools and into a shared layer that sits between your modeled tables and every consumer. That's the whole move.

graph TD
    WH[("Warehouse tables<br/>(dbt models)")]
    subgraph BEFORE["Without a metrics layer"]
        T1["Tableau<br/>defines revenue"]
        T2["Looker<br/>defines revenue"]
        T3["Notebook<br/>defines revenue"]
    end
    subgraph AFTER["With a metrics layer"]
        ML["Metrics layer<br/>revenue defined ONCE"]
        C1["Tableau"]
        C2["Notebook"]
        C3["Internal app"]
    end
    WH --> T1
    WH --> T2
    WH --> T3
    WH --> ML
    ML --> C1
    ML --> C2
    ML --> C3

Left: each tool re-implements "revenue" against raw tables, so three tools yield three numbers. Right: the metric is defined once in the layer, and every consumer queries the layer instead of writing its own aggregation — so there is exactly one revenue, by construction. "Headless" names the right side: the definitions (the body) are decoupled from any particular BI tool (the head), so you can swap or add front-ends without redefining anything.

What you actually define

A metrics layer asks you to define metrics declaratively rather than as frozen SQL. The key realization is that a metric isn't a single query; it's a recipe the layer compiles into the right SQL for whatever question you ask. You define the ingredients:

  • Measures / aggregations: the core math — sum(order_total), count(distinct user_id).
  • Dimensions: the ways you're allowed to slice it — by day, region, plan, channel.
  • Joins and entities: how the underlying tables relate, so the layer can assemble the right ones on demand.
  • Filters and time grain: the standard exclusions (test accounts, refunds) and the time semantics baked in once.

Then a consumer asks "weekly revenue by region for Q3," and the layer generates the SQL — picking the right tables, applying the canonical filters, aggregating at the asked-for grain. You never hand-write that query, and crucially, neither does the next tool. Here's the shape of a definition in the dbt-metrics style that launched in 2022:

metrics:
  - name: revenue
    label: Revenue (net, USD)
    model: ref('fct_orders')
    calculation_method: sum
    expression: order_total_usd - refund_total_usd
    timestamp: shipped_at          # revenue is recognized on ship date
    time_grains: [day, week, month, quarter]
    dimensions: [region, plan, channel]
    filters:
      - field: is_test_account
        operator: 'is'
        value: 'false'

That block is the company's definition of revenue. There's no second one to disagree with.
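To show what "the layer generates the SQL" means mechanically, here is a toy sketch in Python. It is not how dbt or any real product compiles queries: `compile_metric` is a hypothetical helper, the dict simply mirrors the YAML fields above, the filter is written as a ready-made SQL predicate, and `date_trunc` assumes a warehouse dialect that supports it. Joins are ignored entirely.

```python
# One metric definition in, SQL for any allowed slice out.
# The dict mirrors the YAML above; all names are illustrative.
REVENUE = {
    "name": "revenue",
    "table": "fct_orders",
    "calculation_method": "sum",
    "expression": "order_total_usd - refund_total_usd",
    "timestamp": "shipped_at",
    "time_grains": ["day", "week", "month", "quarter"],
    "dimensions": ["region", "plan", "channel"],
    "filters": ["is_test_account = false"],
}

def compile_metric(metric, grain, dimensions=()):
    """Build a SQL string for one metric at one grain, sliced by dimensions."""
    # Reject any slice the definition does not allow.
    if grain not in metric["time_grains"]:
        raise ValueError(f"unsupported grain: {grain}")
    unknown = [d for d in dimensions if d not in metric["dimensions"]]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    period = f"date_trunc('{grain}', {metric['timestamp']})"
    select_cols = [f"{period} AS period", *dimensions]
    aggregate = f"{metric['calculation_method']}({metric['expression']}) AS {metric['name']}"
    lines = [
        "SELECT " + ", ".join(select_cols + [aggregate]),
        "FROM " + metric["table"],
    ]
    # The canonical filters are applied on every request, not per tool.
    if metric["filters"]:
        lines.append("WHERE " + " AND ".join(metric["filters"]))
    # Group by position: the period column plus each requested dimension.
    lines.append("GROUP BY " + ", ".join(str(i + 1) for i in range(len(select_cols))))
    return "\n".join(lines)

print(compile_metric(REVENUE, "week", ["region"]))
```

Asking for weekly revenue by region yields a query grouped on the week-truncated `shipped_at` and `region`, with the test-account filter applied. Asking for a grain or dimension the definition does not list raises an error instead of quietly inventing a new variant of the metric.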

The 2022 landscape

The idea had been circulating since 2020–2021 (Benn Stancil's "missing piece of the modern data stack" essay and Airbnb's writeups of its internal Minerva metrics platform lit the fuse), and in 2022 it became a real product category with several distinct bets.

| Approach | What it is | Angle in 2022 |
| --- | --- | --- |
| dbt Semantic Layer | Metrics defined alongside dbt models, served via a proxy that compiles metric queries to warehouse SQL | Announced at Coalesce in October 2022, built on the new dbt metrics spec: metrics living next to the transformations that feed them |
| Cube (Cube Dev) | An open-source, standalone semantic layer / API with caching; popularized the term "headless BI" | Tool-agnostic API (SQL, REST, GraphQL) in front of any warehouse; strong for embedding metrics in apps |
| MetricFlow (Transform) | An open-source framework for defining metrics and generating SQL, with sophisticated join/grain handling | Transform's engine, open-sourced in 2022; notable for handling complex multi-hop joins automatically |
| LookML (Looker) | The original, proven semantic model, but coupled to Looker as the consumer | The existence proof that this works; the catch is the definitions only served Looker, not "headless" |

LookML is worth dwelling on, because it had quietly solved the metric-consistency problem years earlier — inside Looker. Every Looker query went through one LookML model, so Looker users never had the three-numbers fight. The 2022 movement's real ambition was to take that proven idea and make it headless — decoupled from any single BI tool — so the same definitions could serve Tableau, a Python notebook, a spreadsheet, and an internal app at once. That decoupling is the new part.

The hard parts nobody puts on the slide

I was excited about the metrics layer in 2022, and I still am — but the launch energy glossed over real difficulty, and pretending otherwise sets teams up to stall.

Dynamic SQL generation is genuinely hard, and that's where the bodies are buried. The seductive promise is "define once, query any slice." But compiling a metric definition plus an arbitrary dimensional request into correct, performant SQL is a hard query-planning problem. The layer has to handle one-to-many joins that fan out and silently double-count, mixed time grains, and semi-additive measures (a bank balance can't be summed across time the way revenue can). The naive cases demo beautifully; the real ones (multi-hop joins, non-additive metrics such as distinct counts and ratios) are where early metrics layers either produced wrong numbers or fell back to "just write SQL." Evaluate any metrics layer on your gnarliest metric, not the revenue demo.
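The fan-out trap is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with two invented tables, `orders` and `order_items`; the schema and values are assumptions made up for the demonstration. Joining an order-grain measure to a table with several rows per order repeats the measure once per item, and the query still runs without complaint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total REAL);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'D');
""")

# Ground truth: sum the measure at its own grain.
true_revenue = con.execute("SELECT SUM(order_total) FROM orders").fetchone()[0]

# Fan-out: order 1 has three items, so its total is counted three times.
fanned_out = con.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
""").fetchone()[0]

# One safe plan: collapse the many side to order grain before joining.
safe = con.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    JOIN (SELECT DISTINCT order_id FROM order_items) i
      ON i.order_id = o.order_id
""").fetchone()[0]

print(true_revenue, fanned_out, safe)  # 150.0 350.0 150.0
```

A SQL generator has to notice the grain mismatch and choose something like the third plan on its own, for every combination of metric and dimension a consumer might request. That is the query-planning work the demos skip.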

Two more frictions decided real adoptions. First, the consumers have to actually use it. A metrics layer only delivers consistency if Tableau and the notebooks query through it instead of hitting tables directly. In 2022 the integrations were young, so a tool that couldn't speak to the layer just kept defining its own revenue, defeating the point. Second, performance and caching. Adding a layer that generates SQL on the fly can add latency, so the mature implementations lean hard on caching and pre-aggregation. (The same instinct, doing the expensive work before the question arrives, is partly why an in-memory engine like Power BI's VertiPaq felt so fast: it answers from compressed, columnar data held in memory rather than going back to the source on every query.) A metrics layer without a caching story is a tax on every query.
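As a rough illustration of the caching and pre-aggregation idea, the Python sketch below answers requests from a small daily rollup instead of raw rows, and memoizes each answer. The rollup contents, the `revenue_by_region` function, and the `SCANS` counter are all hypothetical. Rolling daily totals up to weeks is only valid because this revenue measure is additive; a distinct count or a balance could not be rolled up this way.

```python
from collections import defaultdict
from datetime import date, timedelta
from functools import lru_cache

# Hypothetical pre-aggregated rollup: (day, region) -> revenue.
DAILY_ROLLUP = {
    (date(2022, 9, 5), "EMEA"): 120.0,   # a Monday
    (date(2022, 9, 6), "EMEA"): 80.0,
    (date(2022, 9, 6), "AMER"): 200.0,
    (date(2022, 9, 12), "EMEA"): 50.0,   # the following Monday
}

SCANS = {"count": 0}  # counts how often the rollup is actually scanned

@lru_cache(maxsize=None)
def revenue_by_region(grain):
    """Roll the daily pre-aggregate up to the requested grain."""
    SCANS["count"] += 1
    totals = defaultdict(float)
    for (day, region), value in DAILY_ROLLUP.items():
        if grain == "day":
            period = day
        elif grain == "week":
            period = day - timedelta(days=day.weekday())  # Monday of that week
        else:
            raise ValueError(f"unsupported grain: {grain}")
        totals[(period, region)] += value
    return dict(totals)

first = revenue_by_region("week")
second = revenue_by_region("week")  # served from the cache, no second scan
print(first[(date(2022, 9, 5), "EMEA")], SCANS["count"])  # 200.0 1
```

A real layer also has to decide when a cached answer is stale and which rollups are worth materializing, which this sketch leaves out.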

Why it mattered beyond tidy dashboards

Consistency is the obvious win, but the durable reason to care is governance and trust. When metrics live in one version-controlled place, you get the software-engineering virtues the rest of the stack already had: a metric change goes through review, you can see its history, you can test it, and you can trace exactly which definition produced a number. "Where did this figure come from?" stops being an archaeology project.

There's also a forward-looking reason that reads as prescient now: a machine-readable, central definition of every metric is exactly what you need to let non-SQL interfaces — and, soon, natural-language and AI querying tools — return trustworthy numbers. If an assistant generates raw SQL against your tables, it'll reinvent "revenue" and probably get it wrong; if it queries a governed metrics layer, it inherits the one correct definition. In 2022 that was a footnote. It aged well.

What to carry away

The metrics layer fixes the three-different-revenue-numbers problem at its root: it moves metric definitions out of individual BI tools and into one central, version-controlled, code-defined layer that compiles each request into SQL on demand — so every consumer gets the same answer by construction. "Headless BI" is the same idea named for its key property: definitions decoupled from any one front-end, served to all of them.

In 2022 the contenders — dbt's Semantic Layer, Cube, MetricFlow, and the proven-but-coupled LookML — agreed on the goal and differed on the approach. The promise is real and the governance payoff is large, but be sober about the hard part: generating correct, fast SQL across complex joins and non-additive metrics is a genuine engineering problem, and the layer only helps if your tools actually query through it. Define revenue once — but pressure-test the layer on the metric that's actually hard to compute, not the one that demos well.