# Medallion Architecture on Snowflake with dbt: Bronze, Silver, Gold Done Right

The fastest way to make a Snowflake account unmaintainable is to skip the layers. Raw data lands, an analyst writes a 400-line query that cleans, joins, and aggregates all at once, that query becomes a view, three more views build on it, and within a year nobody can change anything because every transformation is entangled with every other. The **medallion architecture** — bronze, silver, gold — is the discipline that prevents this, and [dbt](analytics-engineering-dbt) on Snowflake is the cleanest way I know to implement it. But "bronze/silver/gold" gets thrown around as a magic incantation, and most teams either skip a layer or build five. This is what each layer is actually *for*, how dbt maps onto it, and the Snowflake-specific choices that make it perform.

The core idea in one line: **refine data in stages, each layer with one job and a clear contract with the next.** Bronze preserves the raw truth; silver makes it trustworthy; gold makes it useful to the business. Every rule below follows from keeping those jobs separate.

## The three layers, and what each one owes the next

```mermaid
graph LR
    SRC["Source systems"] --> ING["Snowpipe / COPY /external tables"]
    ING --> BRONZE["BRONZEraw, append-only,as-ingested + load metadata"]
    BRONZE --> SILVER["SILVERcleaned, typed, deduped,conformed entities"]
    SILVER --> GOLD["GOLDbusiness marts:dims, facts, aggregates"]
    GOLD --> BI["BI / ML / reverse ETL"]
    subgraph DBT["dbt owns silver and gold"]
        STG["sources -> staging(silver)"]
        INT["intermediate(business logic)"]
        MART["marts(gold)"]
    end
          
```

The medallion on Snowflake. Ingestion lands raw data in bronze (append-only, with load metadata so you can always trace and replay). dbt then owns the refinement: staging models clean and conform each source into silver, intermediate models hold reusable business logic, and mart models shape the gold dimensions, facts, and aggregates that BI, ML, and reverse-ETL consume. Each arrow is a contract — gold never reads bronze directly, and bronze is never edited in place.

### Bronze — the raw landing zone

Bronze is the data exactly as it arrived, plus load metadata (when it loaded, from which file/stream, a batch id). It is **append-only and never edited in place**, because its entire job is to be the replayable source of truth: if a downstream transformation has a bug, you fix the code and rebuild from bronze, never re-fetch from the source. On Snowflake, bronze is populated by Snowpipe (continuous), `COPY INTO` (batch), or external tables over a stage. In dbt terms, bronze tables are declared as **sources** — dbt doesn't build them, it reads from them and can test their freshness.

### Silver — cleaned, conformed, trustworthy

Silver is where data becomes *trustworthy*: types are cast, nulls and bad values are handled, duplicates are removed, columns are renamed to a consistent convention, and entities are conformed (one definition of "customer" across sources). In dbt this is the **staging** layer — typically one staging model per source table, a 1:1 clean-up, often materialized as views (cheap, always fresh) — plus an **intermediate** layer for reusable joins and business logic that several marts will share. Silver is the layer most analytics should actually build on; it's the trustworthy core.

### Gold — business-ready marts

Gold is shaped for consumption: dimensional models (facts and dimensions), wide aggregate tables, metrics, the things dashboards and ML features read directly. In dbt this is the **marts** layer, materialized as tables (or incremental tables) so queries are fast and don't recompute the whole lineage on every read. Gold is organized by business domain (finance marts, marketing marts), not by source — because consumers think in business terms, not in source systems.

| Layer | Job | dbt layer | Typical materialization |
| --- | --- | --- | --- |
| **Bronze** | Raw, replayable source of truth | sources (not built by dbt) | Tables loaded by Snowpipe/COPY |
| **Silver** | Clean, typed, deduped, conformed | staging + intermediate | Views (staging), ephemeral/views (intermediate) |
| **Gold** | Business marts for consumption | marts | Tables / incremental tables |

## Wiring it in dbt

The mapping is concrete. Bronze tables are declared as sources with freshness checks; staging models read sources and clean; marts read staging/intermediate and shape. Here's the shape of each piece.

```yaml
# sources.yml — bronze declared to dbt, with a freshness SLA
sources:
  - name: raw_orders
    database: BRONZE
    schema: SHOPIFY
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: orders
        loaded_at_field: _loaded_at
```

```sql
-- models/staging/stg_orders.sql  (SILVER: clean 1:1, view materialization)
{{ config(materialized='view') }}
SELECT
    id::number              AS order_id,
    customer_id::number     AS customer_id,
    lower(trim(status))     AS status,
    amount::number(18,2)    AS amount,
    refunded::number(18,2)  AS refunded_amount,
    created_at::timestamp_ntz AS ordered_at
FROM {{ source('raw_orders', 'orders') }}
WHERE id IS NOT NULL
QUALIFY row_number() OVER (PARTITION BY id ORDER BY _loaded_at DESC) = 1  -- dedup, keep latest
```

```sql
-- models/marts/finance/fct_orders.sql  (GOLD: incremental fact table)
{{ config(materialized='incremental', unique_key='order_id',
          incremental_strategy='merge', cluster_by=['ordered_at']) }}
SELECT o.order_id, o.customer_id, o.amount - o.refunded_amount AS net_amount, o.ordered_at
FROM {{ ref('stg_orders') }} o
{% if is_incremental() %}
  WHERE o.ordered_at >= (SELECT max(ordered_at) FROM {{ this }})  -- only new rows
{% endif %}
```

## Scaling: incremental models and history

Two patterns make the medallion hold up at volume. **Incremental models** (the gold fact above) avoid rebuilding a billion-row table every run — dbt processes only new/changed rows and `MERGE`s them in, which on Snowflake is the difference between a cheap nightly run and a runaway warehouse bill. **Snapshots** handle slowly changing dimensions: dbt snapshots capture how a record looked over time (SCD Type 2), so "what was this customer's tier when they placed the order" is answerable — history that silver's latest-state views deliberately don't keep.

**Test at the silver boundary, where trust is established — not just at the edges.** The highest-leverage place for dbt tests (`unique`, `not_null`, `accepted_values`, relationship tests) is the staging/silver layer, because that's where you certify the data is clean before anything builds on it. A bug caught at silver is caught before it fans out into every gold mart; a bug that slips through silver contaminates everything downstream and surfaces as a wrong dashboard. Pair that with source freshness checks on bronze, and you have the two checkpoints that catch most data incidents early: "did fresh data arrive?" (bronze freshness) and "is it clean?" (silver tests). This is the same idea as [data observability](data-observability-monte-carlo), applied at the layer boundaries where it's cheapest to enforce.

## The Snowflake-specific choices that matter

The medallion is a convention; Snowflake gives you levers that make it perform and cost less. The ones I always tune:

- **Transient tables for silver/intermediate.** Silver is rebuildable from bronze, so it doesn't need Snowflake's Fail-safe (7-day recovery). Make rebuildable layers `transient` to cut storage cost — you can always regenerate them.

- **Clustering on big gold tables.** A large fact table queried by date benefits from a clustering key (`cluster_by` in dbt config) so micro-partition pruning skips most of it; don't cluster small tables (it's wasted cost).

- **Separate warehouses per workload.** Run the dbt transformation on its own warehouse (sized for the build), separate from the BI warehouse serving gold — so a heavy nightly build never fights dashboard queries for compute.

- **Zero-copy clones for dev.** Clone the gold database instantly and for free (metadata-only) to give every developer a full-size environment without copying terabytes — one of Snowflake's genuinely killer features for a medallion workflow.

## Orchestrating the refresh: dbt, Tasks, or Dynamic Tables?

How the layers get rebuilt is a real decision in 2025, because Snowflake now offers three approaches with different trade-offs.

| Approach | How the medallion refreshes | Best when |
| --- | --- | --- |
| **dbt (scheduled)** | An orchestrator runs `dbt build` on a schedule; you own the DAG and tests | The default — full testing, docs, lineage, version control |
| **Streams + Tasks** | Snowflake Streams capture changes, Tasks transform on a schedule/trigger | Native, no external orchestrator; lower-level, less tested |
| **Dynamic Tables** | Declare the target and a target lag; Snowflake incrementally maintains it | Declarative, near-real-time silver/gold without managing the refresh logic |

My default remains **dbt** for the transformation layers, because the testing, documentation, lineage, and version control are exactly the software-engineering rigor the medallion exists to enforce. **Dynamic Tables** are increasingly compelling for parts of silver/gold that need to be near-real-time and whose logic is expressible as a single query — you declare the result and a freshness target and Snowflake maintains it incrementally, no orchestration code. A common modern pattern is dbt for the bulk of the modeling and Dynamic Tables for the few layers that must be fresh to the minute. Streams + Tasks remain the native low-level option when you can't run an external orchestrator.

## The over-engineering trap

**The medallion is a convention to bring order, not a checklist that adds value by its mere existence — and over-applying it is its own failure.** I've seen a four-table dataset wrapped in bronze, silver, gold, plus "platinum" and a staging-of-staging, where the ceremony vastly outweighed the data. Three things go wrong when teams cargo-cult it: building layers a small project doesn't need (a handful of clean source tables may not warrant a full three-tier split yet), skipping the layer that matters (loading straight from bronze into gold marts, re-entangling cleaning with business logic — the exact mess the medallion prevents), and treating layer names as magic rather than as a discipline about *separation of concerns*. The value isn't the gold labels; it's that each transformation has one job and a clear contract with the next. Apply that principle at the right scale — add layers when the entanglement they prevent is real, not before — and skip the platinum tier.

## What to carry away

The medallion architecture is separation of concerns for a data warehouse: bronze preserves the raw, replayable truth (append-only, loaded by Snowpipe/COPY, declared to dbt as sources); silver makes it trustworthy (dbt staging and intermediate models — clean, typed, deduped, conformed, and the right place to put your tests); and gold makes it useful (dbt marts — dimensions, facts, aggregates, materialized as tables or incremental tables for fast consumption). Each layer has one job and a contract with the next, and gold never reaches back to bronze.

On Snowflake, make it perform with the platform's levers: transient tables for the rebuildable silver layer, clustering on large gold facts, a dedicated transformation warehouse, and zero-copy clones for dev. Use incremental models and snapshots to scale and to keep history, and choose your refresh mechanism deliberately — dbt for tested, version-controlled transformation, Dynamic Tables for the few layers that must be near-real-time. Above all, remember the medallion is a discipline, not a magic word: apply it for the separation it buys you, at the scale your data actually warrants, and resist both skipping the silver layer and inventing a platinum one. For the alternative modeling approach on the same stack, see [Data Vault on Snowflake with dbt](data-vault-snowflake-dbt).
