Row- and Column-Level Security in the Lakehouse: RBAC, ABAC, and the Performance Cost

The request sounds simple: "sales reps should only see their own region's deals." Then someone asks the follow-up that turns it into an architecture decision — does that rule live in a view, in the platform's native security layer, or in a separate policy engine sitting in front of three different query engines? I've implemented this three different ways across three platforms, and the thing that surprised me most wasn't the syntax differences. It's that the same logical rule can be nearly free or genuinely expensive depending entirely on whether the engine can push the filter down before it scans data, and most teams don't find that out until a security review forces a query plan in front of someone.

This is the architecture of row- and column-level access control: the RBAC-versus-ABAC decision underneath it, how Unity Catalog, Snowflake, and Apache Ranger each implement it, and the performance trap that's specific to this kind of security, not security in general.

RBAC versus ABAC: what's actually different?

Role-based access control (RBAC) grants permissions to roles, and users get permissions by being assigned a role — "the ANALYST role can read sales.orders." It's simple to reason about and it's where every platform starts. The crack appears at scale: row- and column-level rules rarely map cleanly onto roles. "See only your region" isn't a role, it's a property of the user combined with a property of the data, and modeling that in pure RBAC means either a role explosion (a role per region, per department, per combination of the two) or hand-written per-table view logic that doesn't generalize.

Attribute-based access control (ABAC) evaluates a policy against attributes — a user's department, a row's classification tag, a column's sensitivity label — at query time, rather than checking role membership. The same policy ("hide rows where row.region != user.region") applies automatically to every table tagged with a region attribute, present and future, without writing per-table logic. This is the architectural shift the major platforms have been making: RBAC for coarse object-level grants (can this role touch this schema at all), ABAC for the fine-grained, scales-by-itself layer on top.

How does Unity Catalog implement row and column security?

Unity Catalog gives you two related mechanisms, and knowing when to reach for which is the actual skill. Row filters and column masks are per-table SQL user-defined functions: a row filter is a UDF evaluated against each row at query time to decide visibility, a column mask is a UDF that transforms a column's value before it's returned. They're attached directly to a specific table, which makes them precise but means you're writing and maintaining that logic table by table.

The newer, more scalable layer is ABAC policies, attached at the catalog or schema level and applied automatically to every table and column carrying a matching governed tag — tag a column pii_email once, and every table with a column carrying that tag inherits the masking policy without anyone touching that table's definition. This is exactly the tag-driven scaling pattern that classification-based PII protection needs — Databricks' own guidance is to use ABAC governed-tag policies to centralize and scale access control, and to reach for per-table row filters and column masks only when you need table-specific logic ABAC doesn't cover, or haven't migrated to ABAC yet.

-- Unity Catalog: a row filter UDF, attached to a specific table.
-- Precise, but has to be wired to each table individually.
CREATE FUNCTION region_filter(region STRING)
RETURNS BOOLEAN
RETURN region = current_user_region() OR is_account_group_member('admins');

ALTER TABLE sales.orders SET ROW FILTER region_filter ON (region);

-- The ABAC alternative: tag the column once, attach a policy to the tag
-- at the catalog level, and every current and future table inherits it.
CREATE POLICY mask_region_pii
ON CATALOG sales_catalog
COLUMN MASK WHEN governed_tag('classification') = 'pii'
USING redact_pii(column_value);

How does Snowflake's row access and masking policy model work?

Snowflake separates the two concerns into distinct object types with a defined evaluation order. A row access policy is a schema-level object containing an expression that determines which rows are visible in a given query context — it can reference session variables, mapping tables, or any conditional logic, and it attaches to a table or view. A masking policy operates purely at the column level — it's a SQL expression that takes the raw value and returns either the real value or a masked one, evaluated per query based on the requesting role. When both are present on the same object, Snowflake evaluates the row access policy first, then applies any masking policies on the remaining visible rows — and a given column can be governed by a row access policy or a masking policy, not both at once, which is a real constraint worth knowing before you design around it.

Snowflake's scaling answer to the same per-object-wiring problem is tag-based masking policies: attach a masking policy to a tag rather than a column, and any column carrying that tag — including columns added after the policy was created, thanks to tag inheritance — is automatically protected. A masking policy assigned directly to a specific column still takes precedence over a tag-based one if both exist, which is the escape hatch for the rare table that needs different handling than its peers.

graph TD
    Q["Query arrives"]
    RAP["Row access policy
evaluated first
(which rows are visible)"]
    MP["Masking policy
evaluated second
(transform visible column values)"]
    TAG["Tag-based masking
(policy on a tag,
inherited by every tagged column)"]
    OUT["Result set returned
to the requesting role"]
    Q --> RAP --> MP --> OUT
    TAG -.->|"governs columns via inheritance"| MP

Snowflake's evaluation order for a table with both policy types attached. Row access policies run first and narrow the row set; masking policies then transform values in the columns that remain visible. Tag-based masking policies are the scaling mechanism — attach once to a tag, and every column carrying that tag (now or later) inherits the same protection without per-column wiring.

How does Apache Ranger handle this across Trino and Hive?

Ranger takes a different architectural shape entirely: instead of policies living inside each engine, Ranger Admin is a centralized policy store, and a lightweight plugin inside each engine (Trino, Hive, and others) pulls the relevant policies down and caches them locally — typically on disk as JSON in a policy cache — so enforcement happens engine-side without a network round-trip per query. This is the right architecture for the specific problem Ranger solves: a heterogeneous estate where Trino is federating queries across multiple underlying systems, and you want one place to define "who can see what" that's enforced consistently regardless of which engine actually runs the query. Ranger supports the same row-filter and column-masking concepts as Unity Catalog and Snowflake, plus resource-based, role-based, and attribute-based policy types in a single framework, with centralized audit logging across every engine the plugin is deployed to — which is the feature that makes it attractive for compliance-heavy, multi-engine estates even though it's another system to operate on top of the engines themselves.

Why is the performance cost specific to this kind of security, not security generally?

Because a row filter or masking policy is a predicate or transformation injected into the query plan, and the query optimizer's ability to push that predicate down to the storage layer determines whether the policy costs almost nothing or costs a full unfiltered scan. A simple, sargable row filter (region = current_user_region(), where the function is deterministic and resolvable before the scan) can usually be pushed down alongside the query's own WHERE clause, pruning partitions and files exactly like a normal predicate would. A row filter built on a non-deterministic function, a join against a mapping table, or session state the optimizer can't reason about ahead of time often can't be pushed down — the engine ends up scanning everything and filtering afterward, which on a large partitioned table is the difference between milliseconds and minutes.

I've seen a "simple" row-level security rollout 4x the runtime of a dashboard's core query, and the cause was never the rule itself — it was a row filter implemented as a join against a large, ungoverned mapping table that broke partition pruning on every query touching the secured table. Before shipping a row filter or column mask, look at the actual query plan with and without the policy applied, on a representative data volume, not a small test table where the cost is invisible. Keep the predicate as simple and deterministic as the use case allows, materialize and pre-aggregate the mapping table if a join is unavoidable, and budget real performance-testing time for any row-level security rollout on a table that anyone's dashboard depends on — this is the one class of security control where "it works correctly" and "it works acceptably" are genuinely separate questions.

How does this relate to tokenization and masking at the data layer?

PII tokenization and masking protect the value — the actual data is transformed (tokenized, redacted) regardless of who's asking, often as early as ingestion. Row- and column-level security protects the access path — the same underlying value might be fully visible to one role and masked or hidden entirely to another, decided dynamically at query time. They're complementary, not competing: a mature setup tokenizes the worst identifiers at ingestion so they're never stored in the clear at all, and layers query-time row/column security on top for everything else, driven by the same classification tags so both controls scale together instead of being maintained as two separate systems that inevitably drift out of sync.

What to carry away

RBAC handles coarse, object-level grants well; row- and column-level rules are fundamentally attribute-based, and every major platform has converged on ABAC — governed tags in Unity Catalog, tag-based masking in Snowflake, attribute policies in Ranger — as the way to make fine-grained security scale past hand-wiring every table. Snowflake's row access and masking policies run in a defined order (row filtering first, masking second); Unity Catalog separates per-table row filters/column masks from catalog-wide ABAC policies; Ranger centralizes policy administration outside the engines and pushes cached policies down to plugins in Trino and Hive for consistent multi-engine enforcement.

The trap unique to this kind of security is performance, not correctness — a row filter that defeats predicate pushdown turns a cheap, partition-pruned query into a full scan, and it's invisible until someone runs it against real data volume. Test the query plan before you ship the policy, not after a dashboard owner files a ticket. And treat row/column security and value-level tokenization as two halves of one system driven by the same classification tags, not two separate efforts that quietly diverge.