# Snowflake Horizon vs Open Catalog vs AWS Glue Data Catalog: Which Governs Your Tables?

A client asked me last quarter, in one breath, whether they needed Horizon, Open Catalog, or Glue, as if picking one would make the other two go away. It's a fair question: Snowflake and AWS have both reused the word "catalog" for services that solve genuinely different problems, and the marketing pages don't make the boundaries obvious. The honest answer for a Snowflake-plus-AWS shop with Iceberg tables is usually "some combination of all three," and the real work is not picking a winner but understanding what each one governs before you wire them together.

This post draws that boundary precisely: what Snowflake Horizon governs, what Snowflake Open Catalog is (a managed Apache Polaris service; this builds on the REST-catalog landscape covered in [the Iceberg catalog wars](iceberg-rest-catalog-wars), so I won't re-derive the spec here), what AWS Glue Data Catalog's Iceberg REST endpoint changed, how the three integrate, and what broke when I wired them together for real.

## What does Snowflake Horizon actually govern?

**Snowflake Horizon** (rebranded and substantially expanded from Snowflake's earlier governance features, with a major round of updates announced at Snowflake Summit 2026) is a governance and discovery layer scoped to *assets Snowflake can see* — tables, views, and increasingly external assets reachable through federation — not a general-purpose enterprise catalog that happens to run inside Snowflake. It provides object tagging, column masking and row access policies, lineage tracking, AI-powered search and discovery, and — the piece Snowflake has pushed hardest in 2026 — a "Trust Center" security posture dashboard plus governance controls specifically built for agentic workflows: policy enforcement at runtime and audit trails for what an AI agent actually did with governed data, not just what a human queried. The defining characteristic is scope: Horizon's governance model applies natively to Snowflake objects, and it extends — via catalog integrations and federation — to Iceberg tables that live outside Snowflake's own storage, including tables cataloged in Polaris or Glue, once those integrations are wired up.

## What is Snowflake Open Catalog, and how is it different from Horizon?

**Snowflake Open Catalog** is a managed service for **Apache Polaris**, the vendor-neutral Iceberg REST catalog that Snowflake open-sourced. Where Horizon is scoped to what Snowflake can see, Open Catalog is explicitly built to be engine-agnostic: it implements the Iceberg REST Catalog API, so Spark, Trino, StarRocks, Flink, and Snowflake itself can all read and write the same governed Iceberg tables through one catalog, with no single engine treated as privileged. This is the "avoid re-locking your lakehouse after leaving proprietary file formats" answer covered in the [catalog wars piece](iceberg-rest-catalog-wars): Open Catalog is Snowflake's entry in that fight, competing directly with Databricks' Unity Catalog OSS, Lakekeeper, Nessie, and Gravitino for the same "neutral, multi-engine metadata plane" role.

The two aren't really substitutes, and Snowflake's own current guidance reflects that: new customers are steered toward Horizon for Iceberg tables and multi-engine interoperability going forward, with Open Catalog's standalone sign-up path closed to net-new accounts — but that doesn't mean Polaris disappeared. Once a Horizon-to-Polaris integration is configured, Horizon's governance primitives — column masking, row access policies, tagging, sharing — apply on top of Polaris-cataloged Iceberg tables as if they were native Snowflake objects, regardless of whether the table was created by Snowflake, Flink, or Spark. In practice this means: think of Polaris/Open Catalog as the neutral, multi-engine metadata plane, and Horizon as the governance and discovery layer Snowflake projects on top of it — not two competing catalogs, but two different layers of the same stack that Snowflake is actively consolidating.

## What changed with the AWS Glue Data Catalog's Iceberg REST endpoint?

**AWS Glue Data Catalog** — AWS's long-standing central metastore, originally built as a Hive-Metastore-compatible service for Athena, EMR, and Redshift Spectrum (see the [Glue deep dive](aws-glue) for the ETL side of the service) — now exposes an **Iceberg REST endpoint** implementing the same Apache Iceberg REST Catalog specification Polaris and Snowflake speak. That single change is what makes Glue a genuine peer in this conversation rather than just "the AWS metastore": any REST-catalog-aware engine, Snowflake included, can now talk to Glue Data Catalog using the same protocol it uses to talk to Polaris. Layered on top, AWS shipped **catalog federation** (general availability, November 2025) — the ability for Glue to federate to *remote* Iceberg REST catalogs (including a Snowflake Horizon-hosted catalog) and let AWS-native engines query those remote tables without copying them, and the reverse direction works too: Snowflake can register Glue as an external catalog and query Glue-cataloged Iceberg tables directly.
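To make "the same protocol" concrete, here is a minimal sketch of how a client would build the load-table URL defined by the Iceberg REST Catalog specification. The path shape (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`) comes from the spec; the two base URIs and the prefix are hypothetical placeholders, not real endpoints, and authentication, which differs per service, is deliberately left out.

```python
from urllib.parse import quote


def load_table_url(base_uri: str, namespace: list[str], table: str, prefix: str = "") -> str:
    """Build the Iceberg REST 'load table' URL for a spec-compliant catalog."""
    # Multi-level namespaces are joined with the unit separator (0x1F),
    # then URL-encoded, as the REST spec describes.
    encoded_namespace = quote("\x1f".join(namespace), safe="")
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        # The optional prefix is normally returned by the catalog's config endpoint.
        parts.append(prefix.strip("/"))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/".join(parts)


# Hypothetical base URIs, for illustration only.
POLARIS_BASE = "https://polaris.example.com/api/catalog"
GLUE_BASE = "https://glue.example.com/iceberg"

polaris_url = load_table_url(POLARIS_BASE, ["sales"], "orders", prefix="lakehouse")
glue_url = load_table_url(GLUE_BASE, ["sales", "emea"], "orders")
```

The only things that change between the two calls are the base URI and the optional prefix, which is why a REST-catalog-aware engine can treat either service as just another catalog.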

```mermaid
graph TD
    subgraph SF["Snowflake"]
        HZ["Horizon<br/>(governance, discovery,<br/>masking, lineage)"]
    end
    subgraph OC["Open Catalog"]
        POL["Apache Polaris<br/>(managed Iceberg REST catalog)"]
    end
    subgraph AWS["AWS"]
        GLUE["Glue Data Catalog<br/>(Iceberg REST endpoint<br/>+ catalog federation)"]
    end
    ENGINES["Spark, Trino, Flink,<br/>StarRocks, Snowflake"]
    HZ -->|"governs tables cataloged in"| POL
    HZ -.->|"federates to / from"| GLUE
    ENGINES -->|"REST catalog protocol"| POL
    ENGINES -->|"REST catalog protocol"| GLUE
```

Three services, three roles: Horizon is the governance and discovery layer Snowflake projects over tables it can see; Open Catalog (Polaris) is a neutral, multi-engine REST catalog any Iceberg-aware engine can read and write; Glue Data Catalog is AWS's own REST-compatible metastore, now able to federate with a remote catalog like Horizon instead of requiring one authoritative catalog for an entire estate.

## How do you actually decide which catalog is authoritative for a given table?

The practical decision isn't "pick one catalog for the company" — it's "pick the authoritative catalog per table or per domain, based on who writes it first and who else needs to read it." A table that's primarily written and consumed inside Snowflake, with occasional Spark or Trino access, is well served by Snowflake as the native Iceberg catalog with Horizon governance directly on top — no federation needed. A table that genuinely needs multi-engine write access with no single engine as the privileged owner — a shared lakehouse layer feeding Snowflake, Databricks, and an open-source Trino cluster all at once — is the textbook case for Open Catalog/Polaris as the authoritative catalog, with each engine, Snowflake included, connecting to it as a client. A table whose lifecycle is fundamentally AWS-native — created by Glue ETL jobs, queried by Athena and EMR day to day, with Snowflake as an occasional analytical consumer — is better served by Glue Data Catalog as authoritative, with Snowflake reading it via a [catalog integration](snowflake-and-datalake-glue-iceberg-integration) rather than forcing a migration.

|  | Snowflake Horizon | Snowflake Open Catalog (Polaris) | AWS Glue Data Catalog |
| --- | --- | --- | --- |
| **What it is** | Governance + discovery layer | Managed, vendor-neutral Iceberg REST catalog | AWS's metastore, now Iceberg-REST-compatible |
| **Scope** | Assets Snowflake can see (native + federated) | Any REST-catalog-aware engine | AWS-native services + federated remote catalogs |
| **Best authoritative fit** | Snowflake-centric tables | Genuinely multi-engine shared tables | AWS-native ETL/Athena/EMR-centric tables |
| **Governance primitives** | Masking, row access, tagging, lineage, AI guardrails | Catalog-level access control; richer policy via Horizon integration | IAM-based; Lake Formation for finer-grained AWS-side policy |
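The per-table rule can be sketched as a small function. This is my own simplification of the three cases described above, not anything Snowflake or AWS ships; the engine names, `TableProfile`, and `authoritative_catalog` are all hypothetical, and a real estate will have edge cases this ignores.

```python
from dataclasses import dataclass
from enum import Enum


class Catalog(Enum):
    SNOWFLAKE_HORIZON = "Snowflake-native Iceberg catalog with Horizon governance"
    OPEN_CATALOG = "Open Catalog (Polaris)"
    GLUE = "AWS Glue Data Catalog"


# Hypothetical labels for engines whose lifecycle is AWS-native.
AWS_NATIVE_WRITERS = frozenset({"glue_etl", "athena", "emr"})


@dataclass(frozen=True)
class TableProfile:
    name: str
    writers: frozenset  # engines that commit to the table


def authoritative_catalog(table: TableProfile) -> Catalog:
    """Pick the authoritative catalog from who writes the table."""
    if not table.writers:
        raise ValueError(f"{table.name}: no writer declared, so no owner can be chosen")
    if table.writers == {"snowflake"}:
        # Snowflake-centric table: other engines read as clients, no federation needed.
        return Catalog.SNOWFLAKE_HORIZON
    if table.writers <= AWS_NATIVE_WRITERS:
        # AWS-native lifecycle: Snowflake reads through a catalog integration.
        return Catalog.GLUE
    # Several engines write and none is privileged: use the neutral catalog.
    return Catalog.OPEN_CATALOG


orders = TableProfile("orders", frozenset({"snowflake"}))
clickstream = TableProfile("clickstream", frozenset({"glue_etl"}))
shared = TableProfile("shared_dim", frozenset({"snowflake", "spark", "trino"}))
```

Readers are intentionally absent from the rule: in all three cases, engines that only read connect as clients of whichever catalog owns the table.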

**The trap isn't picking the "wrong" catalog — it's ending up with the same table registered as authoritative in two places at once, with no single source of truth for who can commit a schema change.** I've seen a table get created in Glue by an ETL job, then separately registered as an unmanaged Iceberg table in Snowflake by a different team who didn't know the Glue registration existed — two catalogs, two independent views of "current" metadata, and a schema evolution that landed in Glue silently broke Snowflake's cached table definition until a manual refresh caught up. Before wiring federation in either direction, agree explicitly on which catalog owns write access for each table, and treat every other catalog touching that table as read-only via federation — never assume federation alone prevents a second writer from showing up.
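One way to catch the double-registration problem before it bites is a plain inventory check. The sketch below assumes you maintain, by hand or by script, a list of `(table, catalog, mode)` registrations; that inventory format and the function name are my own invention, not a feature of any of the three services.

```python
from collections import defaultdict


def find_ownership_conflicts(registrations):
    """Return tables that more than one catalog is allowed to write.

    registrations: iterable of (table, catalog, mode), where mode is
    "write" for the authoritative catalog and "read" for federated access.
    """
    writers = defaultdict(set)
    for table, catalog, mode in registrations:
        if mode == "write":
            writers[table].add(catalog)
    # A table is in conflict when two or more catalogs claim write access.
    return {table: sorted(cats) for table, cats in writers.items() if len(cats) > 1}


# Hypothetical inventory reproducing the incident described above.
inventory = [
    ("sales.orders", "glue", "write"),
    ("sales.orders", "snowflake", "write"),  # second team's separate registration
    ("sales.customers", "glue", "write"),
    ("sales.customers", "snowflake", "read"),
]

conflicts = find_ownership_conflicts(inventory)
```

Here `conflicts` flags `sales.orders` and leaves `sales.customers` alone, because only one catalog holds write access to it.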

## What are the integration lessons from running all three together?

Refresh latency is the operational detail that catches teams by surprise. When Snowflake reads an Iceberg table through an external catalog integration (Glue or Polaris) rather than owning it natively, table metadata isn't instantly consistent: Snowflake has to refresh its view of the external catalog's current snapshot, and depending on refresh configuration and query patterns, a query can see stale metadata for longer than expected after an external write. Snowflake's own guidance is to configure frequent refreshes specifically to avoid this, and it's worth treating as a first-class design decision, not a default to accept. It's the same lesson that showed up in the FHIR real-time pipeline: [freshness has to be an explicit requirement](fhir-streaming-snowpipe-dynamic-tables), not an assumption, whenever data crosses a system boundary.

The second lesson is IAM scoping. Snowflake's documented best practice for the Glue catalog integration is a dedicated IAM policy and role scoped specifically to catalog read access, not reuse of a broader Glue or S3 role that was already lying around. A broad role works in testing and becomes an access-review headache the moment an audit asks "why does Snowflake's Glue integration also have write access to unrelated buckets?"
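Treating freshness as an explicit requirement can be as simple as comparing the snapshot the owning catalog reports with the snapshot the consuming engine currently sees. The sketch below assumes you can obtain both views somehow; how you fetch them depends on your catalog and engine and is left out. `SnapshotView`, `staleness`, and the five-minute budget are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SnapshotView:
    snapshot_id: int
    committed_at: datetime  # commit time of that snapshot in the owning catalog


def staleness(source: SnapshotView, consumer: SnapshotView, now: datetime) -> timedelta:
    """How long the consumer has been behind the source; zero when in sync."""
    if consumer.snapshot_id == source.snapshot_id:
        return timedelta(0)
    # The consumer has been stale since the source's latest commit.
    return now - source.committed_at


def violates_freshness(source: SnapshotView, consumer: SnapshotView,
                       max_lag: timedelta, now: datetime) -> bool:
    """True when the consumer has lagged longer than the agreed budget."""
    return staleness(source, consumer, now) > max_lag


# Hypothetical example: the owning catalog committed snapshot 2 at 12:00,
# and the consuming engine still sees snapshot 1 at 12:10.
noon = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
source_view = SnapshotView(snapshot_id=2, committed_at=noon)
consumer_view = SnapshotView(snapshot_id=1, committed_at=noon - timedelta(hours=1))
lag = staleness(source_view, consumer_view, now=noon + timedelta(minutes=10))
breach = violates_freshness(source_view, consumer_view,
                            max_lag=timedelta(minutes=5),
                            now=noon + timedelta(minutes=10))
```

The point of writing it down is the `max_lag` argument: someone has to choose that number per table, which is exactly the design decision that otherwise gets left to defaults.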

## What to carry away

Horizon, Open Catalog, and Glue Data Catalog aren't three competitors for the same job — they're a governance layer (Horizon), a neutral multi-engine metadata plane (Open Catalog/Polaris), and AWS's own increasingly REST-compatible metastore (Glue), and a real Snowflake-plus-AWS estate typically runs some combination of all three rather than picking a single winner. Decide the authoritative catalog per table based on who writes it and who else needs multi-engine access, not as a one-time company-wide platform decision, and treat every non-authoritative catalog touching that table as read-only via federation.

The operational risks that actually bite are boring and specific: two catalogs each believing they own write access to the same table, and external-catalog refresh latency quietly serving stale metadata after Snowflake stops being the first writer. Both are solvable with explicit ownership agreements and deliberate refresh configuration — neither is solved by picking "the right" catalog, because in a mixed Snowflake-plus-AWS estate, there usually isn't just one.
