The Iceberg REST Catalog and the Catalog Wars: Polaris, Unity, Lakekeeper

The lakehouse made a promise: your data lives in open formats on cheap object storage, and any engine can read it, so you're never locked in again. It was true, and incomplete. One piece that rarely comes up in discussions of the storage layer quietly holds all the power: the catalog. It's the service that knows which tables exist and where each one's current metadata lives, and, critically, it brokers the atomic commit when you write. Open the file format all you like; if the catalog is proprietary, your lakehouse is still locked to a vendor. That realization is what set off the catalog wars, and the peace treaty everyone is now fighting to control is the Iceberg REST Catalog spec.

Here's the through-line: open table formats decoupled the engine from the storage. The Iceberg REST Catalog aims to decouple the engine from the catalog — the same unlock, one layer up. Whoever owns the catalog owns governance, interop, and the real lock-in. So Snowflake, Databricks, AWS, and a wave of open-source projects all rushed in. This is the map.

What a catalog actually does

It's worth being precise, because "catalog" gets used for several different things. In the lakehouse sense, a catalog is the metadata service that, for each table, tracks its existence in a namespace and holds the pointer to the current metadata file; holding that pointer is the load-bearing job. An Iceberg table is a tree of metadata and data files; "the current state of the table" is whichever root metadata file the catalog says is current. When you commit a write, you're asking the catalog to atomically swap that pointer from the old metadata to the new, and to reject your commit if someone else swapped it first. That atomic compare-and-swap is what gives a lake table its ACID guarantee. No catalog, no safe concurrent writes.
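
The pointer swap described above can be sketched in a few lines of Python. This is a toy in-memory model, not how any real catalog is implemented: the class name `InMemoryCatalog`, the `CommitConflict` exception, and the `s3://bucket/...` paths are all assumptions made for illustration.

```python
import threading


class CommitConflict(Exception):
    """Raised when the table pointer moved since the writer last read it."""


class InMemoryCatalog:
    """Toy catalog: maps a table name to its current metadata file location."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def create_table(self, name, metadata_location):
        with self._lock:
            if name in self._pointers:
                raise ValueError(f"table already exists: {name}")
            self._pointers[name] = metadata_location

    def current(self, name):
        with self._lock:
            return self._pointers[name]

    def commit(self, name, expected, new):
        """Swap the pointer to `new` only if it still equals `expected`."""
        with self._lock:
            found = self._pointers[name]
            if found != expected:
                raise CommitConflict(f"{name}: expected {expected}, found {found}")
            self._pointers[name] = new


catalog = InMemoryCatalog()
catalog.create_table("sales.orders", "s3://bucket/orders/metadata/v1.json")

# Two writers both start from the same base metadata (v1).
base_a = catalog.current("sales.orders")
base_b = catalog.current("sales.orders")

# Writer A commits first and wins the swap.
catalog.commit("sales.orders", base_a, "s3://bucket/orders/metadata/v2-a.json")

# Writer B still expects v1, so the catalog rejects its commit.
try:
    catalog.commit("sales.orders", base_b, "s3://bucket/orders/metadata/v2-b.json")
    writer_b_committed = True
except CommitConflict:
    writer_b_committed = False  # B must re-read the table, rebase, and retry
```

The rejected writer loses nothing durable: its data files are already in object storage, and only the pointer swap failed, so it can rebuild its metadata on top of the new current state and try again.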

So the catalog isn't a directory you browse. It's the transaction coordinator and the source of truth for "what is the table right now." For years that role was filled by the aging Hive Metastore; then every platform built its own — AWS Glue, Snowflake's internal catalog, Databricks' Unity Catalog — and each one tied your tables to that platform. The catalog became the lock-in that the open format was supposed to kill.

The Iceberg REST Catalog: a standard API

The fix is gloriously boring: define a standard REST API that any catalog can implement and any engine can speak. That's the Iceberg REST Catalog specification — a documented HTTP interface for creating namespaces and tables, loading a table's current metadata, and committing updates (the atomic swap). An engine that speaks Iceberg REST can talk to any compliant catalog without a custom plugin, and a catalog that implements it can serve any compliant engine.
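
To make the shape of that interface concrete, here is a minimal sketch of how a client might build the table route and a commit body, based on my reading of the published spec. The host, the `prod_lakehouse` prefix, and the helper names `table_url` and `commit_request` are assumptions for illustration; in practice the prefix is whatever the catalog's `GET /v1/config` response tells the client to use. No network call is made.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the unit separator (0x1F), then URL-encoded.
NAMESPACE_SEPARATOR = "\x1f"


def namespace_path(levels):
    """Encode a namespace such as ["sales", "emea"] as one URL path segment."""
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def table_url(base_uri, prefix, namespace, table):
    """Route used for both load (GET) and commit (POST) of one table."""
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", namespace_path(namespace), "tables", quote(table, safe="")]
    return "/".join(parts)


def commit_request(expected_snapshot_id, updates):
    """Commit body: requirements the catalog must verify, then the updates to apply."""
    return {
        "requirements": [
            {
                "type": "assert-ref-snapshot-id",
                "ref": "main",
                "snapshot-id": expected_snapshot_id,
            }
        ],
        "updates": updates,
    }


url = table_url(
    "https://catalog.example.com/api/catalog",  # hypothetical endpoint
    "prod_lakehouse",                           # hypothetical prefix
    ["sales", "emea"],
    "orders",
)
body = commit_request(
    expected_snapshot_id=123,
    updates=[{"action": "set-properties", "updates": {"owner": "data-eng"}}],
)
```

The `requirements` list is the wire-level form of the compare-and-swap: the catalog applies the updates only if every assertion still holds, and otherwise rejects the commit so the client can retry.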

graph TD
    subgraph ENGINES["Engines (speak Iceberg REST)"]
        E1["Spark"]
        E2["Trino"]
        E3["Snowflake"]
        E4["DuckDB / Flink / ..."]
    end
    REST["Iceberg REST Catalog API<br/>(the standard interface)"]
    subgraph CATALOGS["Any compliant catalog implements it"]
        C1["Apache Polaris"]
        C2["Unity Catalog OSS"]
        C3["Lakekeeper / Nessie / Gravitino"]
        C4["AWS S3 Tables / Glue"]
    end
    FMT["Table metadata (Iceberg / Delta)"]
    STORE[("Object storage — Parquet data files")]
    E1 --> REST
    E2 --> REST
    E3 --> REST
    E4 --> REST
    REST --> C1
    REST --> C2
    REST --> C3
    REST --> C4
    CATALOGS --> FMT --> STORE

The diagram above shows the interoperability the catalog wars are really about. The Iceberg REST API is a narrow waist: engines on top speak one protocol, catalogs underneath implement one protocol, and the two sides mix and match freely. This is the same architectural move that open table formats made for storage: standardize the interface so no single vendor owns the layer. The fight is over which implementation becomes the default, because the catalog is where governance and lock-in now live.

The combatants

By 2026 the field has sorted into a few serious contenders, each with a different origin and agenda.

Catalog	Origin	Angle
Apache Polaris	Created by Snowflake, donated to the ASF (incubating)	Open-source Iceberg REST catalog with role-based access control; Snowflake's bid for a neutral, governed standard
Unity Catalog OSS	Open-sourced by Databricks (2024)	Multi-format (Iceberg and Delta), multi-asset (tables, ML models, functions); governance-first, speaks Iceberg REST
Lakekeeper	Independent open source (Rust)	Lightweight, fast, spec-faithful Iceberg REST catalog — the "just the catalog, no platform" option
Nessie	Project Nessie (Dremio)	Git-like catalog — branches, tags, and commits across tables, for multi-table transactions and experimentation
Apache Gravitino	Apache project	Federated metadata across catalogs and sources — a catalog of catalogs
AWS S3 Tables / Glue	AWS	Managed Iceberg storage and catalog with a REST endpoint — the cloud-native default on AWS

The two that matter most strategically are Polaris and Unity Catalog OSS, because they came from the two companies that defined the modern lakehouse. Snowflake open-sourcing Polaris and Databricks open-sourcing Unity Catalog within months of each other was not a coincidence — it was both giants recognizing that the catalog had become the contested ground, and that being the open standard catalog is more valuable than owning a proprietary one nobody trusts. Unity Catalog's twist is that it governs Delta and Iceberg together and manages more than tables; Polaris is more squarely an Iceberg-native REST catalog. Both speaking Iceberg REST is the détente that makes the whole ecosystem composable.

Why this is the same story as Delta-vs-Iceberg, resolved

For a while the format war (Delta vs Iceberg) looked like the main event. It's largely de-escalated: Delta's UniForm exposes Delta tables with Iceberg-compatible metadata, engines increasingly read both, and the action moved up to the catalog. The catalog is where the two worlds actually meet — a governance layer that can present Delta and Iceberg tables through one interface (as Unity Catalog does) makes the format underneath an implementation detail. The catalog won the right to be the thing that matters because it's where governance lives: identity, permissions, audit, lineage, data sharing. Format is plumbing; governance is policy, and policy is what enterprises actually fight over.

Configuring an engine against a REST catalog is, fittingly, mundane — point it at a URI and authenticate:

# Spark talking to any Iceberg REST catalog: the catalog is just a URI now
spark.sql.catalog.lake                = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lake.type           = rest
spark.sql.catalog.lake.uri            = https://polaris.example.com/api/catalog
spark.sql.catalog.lake.warehouse      = prod_lakehouse
# OAuth2 client credential as client_id:client_secret (placeholder; inject it from your secret store)
spark.sql.catalog.lake.credential     = ${OAUTH_CLIENT_CREDENTIAL}
# swap the uri (and, where the catalog requires it, the auth settings) for Unity Catalog OSS / Lakekeeper / S3 Tables
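
The same settings can be generated programmatically, which makes the "only the URI changes" point easy to see. The sketch below is a plain dictionary builder; the helper name `rest_catalog_conf`, both endpoint URLs, and the credential string are hypothetical, and a real catalog may need extra auth properties beyond these.

```python
def rest_catalog_conf(name, uri, warehouse, credential=None):
    """Build Spark properties for an Iceberg REST catalog registered as `name`."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
    }
    if credential is not None:
        # Expected form is client_id:client_secret; never hard-code real secrets.
        conf[f"{prefix}.credential"] = credential
    return conf


# Hypothetical endpoints for two different catalog implementations.
polaris = rest_catalog_conf(
    "lake", "https://polaris.example.com/api/catalog", "prod_lakehouse", "id:secret"
)
lakekeeper = rest_catalog_conf(
    "lake", "https://lakekeeper.example.com/catalog", "prod_lakehouse", "id:secret"
)

# Which properties differ between the two configurations?
changed_keys = sorted(k for k in polaris if polaris[k] != lakekeeper[k])
```

Each key and value can then be passed to `SparkSession.builder.config(key, value)` when the session is created.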

"Open catalog" is not yet "fully interchangeable" — read support runs ahead of write and governance. The Iceberg REST read path (load a table, scan it) is mature and broadly interoperable. The harder parts lag: write commits, credential vending (how the catalog hands engines short-lived storage credentials), and especially the governance model — row/column masking, fine-grained access, lineage — are where catalogs differ and where the spec is thinnest. So two "Iceberg REST" catalogs can both pass a basic read test and still behave very differently when you need governed writes from multiple engines. Don't assume you can swap catalogs as freely as you swap engines yet; pilot your actual write-and-govern workflow, not just a SELECT. The standard is real and improving fast, but the deep governance features are exactly where vendors still differentiate (and re-introduce lock-in).

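Credential vending is one of the easier differences to probe in a pilot. As a rough sketch: a client can ask for vended credentials with the `X-Iceberg-Access-Delegation` header on the load-table call, then check whether the response's `config` map carries temporary storage credentials. The two responses below are hand-written stand-ins, not output from any real catalog, and the helper names are hypothetical; some catalogs may return credentials in a different part of the response.

```python
ACCESS_DELEGATION_HEADER = "X-Iceberg-Access-Delegation"


def load_table_headers(bearer_token):
    """Headers for a load-table call that asks the catalog to vend storage credentials."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        ACCESS_DELEGATION_HEADER: "vended-credentials",
    }


def vended_s3_credentials(load_table_response):
    """Return the S3 credential properties found in the response's config map, if any."""
    config = load_table_response.get("config") or {}
    wanted = ("s3.access-key-id", "s3.secret-access-key", "s3.session-token")
    return {key: config[key] for key in wanted if key in config}


# Hypothetical, hand-written responses standing in for two catalogs under pilot.
response_with_vending = {
    "metadata-location": "s3://bucket/orders/metadata/v2.json",
    "config": {
        "s3.access-key-id": "ASIAEXAMPLE",
        "s3.secret-access-key": "example-secret",
        "s3.session-token": "example-token",
    },
}
response_without_vending = {
    "metadata-location": "s3://bucket/orders/metadata/v2.json",
    "config": {},
}

pilot_results = {
    "catalog_a": bool(vended_s3_credentials(response_with_vending)),
    "catalog_b": bool(vended_s3_credentials(response_without_vending)),
}
```

A catalog that returns no credentials is not necessarily non-compliant; it may expect engines to hold their own storage access. That is exactly the kind of behavioral difference a read-only test will not surface.
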
How to choose

The one rule that ages well: for any new lakehouse, require an Iceberg REST-compliant catalog. That single constraint preserves your optionality — your tables stay reachable by Spark, Trino, Snowflake, DuckDB, Flink, and whatever comes next, and you can change catalogs later without rewriting data. Beyond that: pick Unity Catalog OSS if you want unified governance across Delta and Iceberg and assets beyond tables; Polaris if you want a neutral, Iceberg-native, ASF-governed catalog; Lakekeeper if you want a lean, fast catalog and nothing else; Nessie if Git-style branching of data is central to how you work; and the cloud-managed option (S3 Tables/Glue) if you're all-in on one cloud and want zero ops. What you should not do in 2026 is adopt a catalog that only speaks a proprietary API — that's signing up for the exact lock-in the open format was meant to end.

What to carry away

The catalog is the metadata service that tracks your lakehouse tables and brokers the atomic commit that gives them ACID — which makes it the real source of truth and, historically, the real lock-in, even when the storage and file format were open. The Iceberg REST Catalog spec fixes that by standardizing the catalog API the way open table formats standardized storage: any compliant engine talks to any compliant catalog. That standard is why the catalog wars exist — Apache Polaris (Snowflake), Unity Catalog OSS (Databricks), Lakekeeper, Nessie, Gravitino, and the cloud catalogs are all fighting to be the default implementation of the layer where governance now lives.

Treat it as settled in one respect and unsettled in another. Settled: require Iceberg REST compliance for any new lakehouse, and the format-war anxiety (Delta vs Iceberg) is mostly behind us because the catalog mediates both. Unsettled: write semantics, credential vending, and fine-grained governance still differ between catalogs, so "open" doesn't yet mean "swap freely" for governed multi-engine writes. Choose the catalog by your governance needs — Unity OSS for multi-format governance, Polaris for a neutral Iceberg-native standard, Lakekeeper for lean simplicity — but make REST compliance non-negotiable, because the catalog is the one layer where lock-in is quietly trying to come back.