# Databricks LTAP: One Copy of Data for Transactions and Analytics

At Data + AI Summit on June 16, 2026, Databricks announced **LTAP — Lake Transactional/Analytical Processing** — and the claim is bold enough to be worth slowing down for: transactions and analytics on a *single copy* of data in the lake, with no ETL pipeline between them. If that sounds like the holy grail the database industry has chased for two decades, it's because it is. The dream of one system that serves both your operational writes and your analytical scans — without copying data from one store to another — has a long graveyard of attempts behind it. So my reaction to LTAP was equal parts "finally, the lakehouse logic extended to the operational tier" and "okay, but how do you actually beat the physics that killed everyone else?" This is a look at what LTAP is, how it's built on Lakebase, and an honest read on the hard part.

Quick grounding before the architecture: LTAP is built on **Lakebase**, Databricks' serverless Postgres on open object storage (launched in 2025, now reportedly serving thousands of customers and handling on the order of 12 million database launches per day). LTAP is the architecture that fuses that operational Postgres layer with the analytical [Lakehouse](lakehouse-architecture-delta-lake) on one storage layer. As of the announcement it's "coming soon as part of Lakebase" — so this is an architecture to understand now, not yet a GA product to benchmark.

## The problem: the OLTP/OLAP split and the ETL tax

For as long as I've built data platforms, operational and analytical data have lived in separate worlds. Your application writes to an **OLTP** database — Postgres, MySQL — optimized for fast, transactional, row-level reads and writes. Your analytics run on an **OLAP** system — a warehouse or lakehouse — optimized for scanning huge columnar datasets. They're tuned for opposite access patterns, so they're separate systems, and you connect them with a pipeline: nightly ETL, or [CDC](debezium-cdc) streaming changes from the operational store into the analytical one.
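To make "opposite access patterns" concrete, here is a minimal Python sketch contrasting the two layouts. It is a toy: the field names and values are invented for illustration, and real engines add indexes, pages, compression, and much more.

```python
# Toy illustration only: the same three orders in two physical layouts.
# Field names and values are made up for this sketch.

# Row layout: each record is stored together, keyed by id (OLTP-friendly).
rows = {
    1: {"id": 1, "status": "paid", "amount": 40},
    2: {"id": 2, "status": "open", "amount": 25},
    3: {"id": 3, "status": "paid", "amount": 60},
}

# Columnar layout: each column is stored together (OLAP-friendly).
columns = {
    "id": [1, 2, 3],
    "status": ["paid", "open", "paid"],
    "amount": [40, 25, 60],
}

def point_update_row(order_id, status):
    """OLTP pattern: touch exactly one record."""
    rows[order_id]["status"] = status

def point_update_columnar(order_id, status):
    """Same update on columns: locate the position, then patch one array.
    In real columnar files the data is compressed and immutable, so this
    usually means rewriting a file or recording a delete/patch marker."""
    pos = columns["id"].index(order_id)  # linear search in this toy
    columns["status"][pos] = status

def paid_total_columnar():
    """OLAP pattern: scan only the two columns the query needs."""
    return sum(a for s, a in zip(columns["status"], columns["amount"]) if s == "paid")

def paid_total_rows():
    """Same aggregate on rows: every full record is visited."""
    return sum(r["amount"] for r in rows.values() if r["status"] == "paid")
```

Both layouts give the same answers; what differs is how much data each operation has to touch, which is why the two kinds of system ended up physically separate.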

That pipeline is a tax you pay forever. It adds latency (analytics are always behind the operational truth by the pipeline's lag), it's a copy of the data to store and reconcile, it's infrastructure to operate and a thing that breaks at 2am, and it's the seam where the two systems silently drift out of sync. Databricks' framing — and Ali Ghodsi's quote, "the infrastructure that powered the last era of computing is now the bottleneck that no one can afford" — names this directly. The pipeline between operational and analytical data is the thing LTAP sets out to delete.

And there's a 2026 reason this suddenly matters more: **AI agents**. An agent that reads the current state of the business and then acts on it needs operational freshness *and* analytical context in the same place. The pipeline lag that analysts tolerated is a real handicap for an agent making a decision now, on data that's an hour stale.

## How LTAP works

The architecture rests on the same move the lakehouse made, pushed one tier further down. The lakehouse put analytics directly on open files in object storage by adding a transaction log. LTAP extends that so the *operational* data lives there too — and then runs two different compute engines against the one copy.
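The "transaction log over open files" move is easier to reason about with a toy model. The sketch below is in the spirit of Delta-style logs, not a description of Databricks' implementation: `ToyTableLog` and its methods are hypothetical names, and real logs also handle concurrent writers, checkpoints, and schema.

```python
# Toy model: a table is a set of immutable data files plus an ordered log
# of commits. Each commit atomically adds and/or removes files.
# ToyTableLog, commit, and snapshot are hypothetical names for this sketch.

class ToyTableLog:
    def __init__(self):
        self.commits = []  # ordered list of (added_files, removed_files)

    def commit(self, add=(), remove=()):
        """Append one atomic commit and return its version number."""
        self.commits.append((frozenset(add), frozenset(remove)))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` to get the live file set."""
        if version is None:
            version = len(self.commits) - 1
        live = set()
        for added, removed in self.commits[: version + 1]:
            live -= removed
            live |= added
        return live

log = ToyTableLog()
v0 = log.commit(add=["part-000.parquet"])
v1 = log.commit(add=["part-001.parquet"])
# An update rewrites a file: drop the old one and add its replacement
# in a single commit, so readers never see a half-applied change.
v2 = log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])
```

A reader at `v1` sees the two original files and a reader at `v2` sees the rewritten set; neither sees a mix. That all-or-nothing visibility is the property the lakehouse added to plain files in object storage.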

```mermaid
graph TD
    subgraph OLD["The old world: two systems + a pipeline"]
        APP1["App writes"] --> OLTP1[("OLTP database<br/>(row store)")]
        OLTP1 -->|"ETL / CDC pipeline<br/>(lag, copy, breakage)"| OLAP1[("OLAP warehouse<br/>(columnar copy)")]
        OLAP1 --> BI1["Analytics / BI"]
    end
    subgraph NEW["LTAP: one copy, two engines"]
        APP2["App writes"] --> PG["Lakebase (Postgres)<br/>transactional compute, full ACID"]
        PG --> LAKE[("One copy in the lake<br/>open object storage<br/>Delta + Iceberg")]
        LAKE --> ANALYTICS["Lakehouse engine<br/>analytical compute, any concurrency"]
        UC["Unity Catalog<br/>one identity / permissions / audit"] -.-> PG
        UC -.-> ANALYTICS
    end
```

**The shift.** The old world keeps two physical copies of the data, a row-oriented OLTP store and a columnar OLAP copy, joined by an ETL/CDC pipeline that adds lag and breakage. LTAP keeps a single copy in open formats (Delta + Iceberg) in object storage; a Postgres engine serves transactions against it with full ACID, while a separate Lakehouse engine serves analytics at any concurrency. Both read the same bytes, governed once by Unity Catalog, so there is no pipeline, no replica, and nothing to drift.

The load-bearing design decisions, as announced:

- **One copy, open formats.** All operational, analytical, and streaming data sit on open object storage in [Delta and Iceberg](open-table-formats). Postgres-native transactional data is stored in those formats *from the point of write* — not converted later by a pipeline.

- **Separate compute, shared storage.** Transactions run in standard Postgres with full ACID semantics; analytics scale across the full Lakehouse at any concurrency. Each workload scales independently with no data movement between systems — so the OLTP side and the OLAP side don't fight for the same resources (the classic HTAP interference problem).

- **One governance plane.** [Unity Catalog](unity-catalog) provides a single identity, permission, and audit model across both engines, since they read the same data.

Concretely, the promise is that the table your app transacts against is the same table your analysts and agents query. There is no `orders` table in Postgres *and* a copied `orders` table in the warehouse, just one:

```sql
-- the application's transactional write (Lakebase / Postgres, ACID)
BEGIN;
UPDATE orders SET status = 'paid', paid_at = now() WHERE id = 84217;
INSERT INTO order_events (order_id, kind) VALUES (84217, 'payment_captured');
COMMIT;

-- analytics over the SAME copy, seconds later, no ETL in between
SELECT date_trunc('hour', paid_at) AS hr, count(*), sum(amount)
FROM orders                       -- not a replica — the same table
WHERE status = 'paid'
GROUP BY 1 ORDER BY 1;
```

Databricks also announced new Lakebase capabilities alongside LTAP that lean into its lake-native foundation: cross-cloud and cross-region disaster recovery, **Git-style branching and snapshots** of the database (branch production data to experiment safely, the way you branch code), and autonomous database operations — health monitoring, slowdown detection, and index proposals. Branching a live operational database is the kind of thing only a copy-on-write, lake-backed store can do cheaply, and it's a genuinely novel capability.
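Why branching is cheap on a copy-on-write store can be shown with a small sketch. The announcement doesn't describe how Lakebase implements branching, so treat this as the generic technique only; `ToyBranchStore` and its page ids are hypothetical.

```python
# Toy copy-on-write branching: a branch is a mapping from page id to an
# immutable page version. Creating a branch copies pointers, not data.
# All names here are hypothetical.

class ToyBranchStore:
    def __init__(self):
        self.pages = {}               # version id -> immutable page contents
        self.branches = {"main": {}}  # branch name -> {page id: version id}
        self._next = 0

    def write(self, branch, page_id, contents):
        """A write creates a new page version; other branches are untouched."""
        vid = self._next
        self._next += 1
        self.pages[vid] = contents
        self.branches[branch][page_id] = vid

    def read(self, branch, page_id):
        return self.pages[self.branches[branch][page_id]]

    def create_branch(self, new, source="main"):
        """Copies only the pointer map; no page contents are duplicated."""
        self.branches[new] = dict(self.branches[source])

store = ToyBranchStore()
store.write("main", "orders:1", {"status": "open"})
store.create_branch("experiment")
store.write("experiment", "orders:1", {"status": "refunded"})
```

After the experiment's write, `main` still reads the original page and only one extra page version exists. A production system would presumably share the pointer map structurally rather than copy it, but the principle is the same: a branch costs metadata, not a second copy of the data.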

## The hard part: one physical layout, two opposite access patterns

**HTAP is a graveyard, and the reason is physics. Judge LTAP on how it answers this, not on the press release.** OLTP and OLAP don't just want different software; they want opposite *physical* data layouts. Transactions want row-oriented, write-optimized storage with millisecond point lookups and updates. Analytics want columnar, compressed, scan-optimized storage. You cannot make one physical layout optimal for both. That tension is exactly what split the two worlds in the first place, and it's where SAP HANA, SingleStore, TiDB, and others spent enormous engineering effort. So the question that decides whether LTAP is revolutionary or marketing is: *how does storing Postgres-native transactional data in columnar Delta/Iceberg "from the point of write" deliver OLTP write latency and point-read performance?* Columnar files are hostile to single-row updates. The plausible answer is a row-oriented/log write tier that serves transactions and is continuously, transparently organized into columnar lake files for analytics. Until it's GA and independently benchmarked, though, treat the "no tradeoffs" claim as the thing to verify, not assume.
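The general shape of that plausible answer, a row-oriented write tier folded into columnar files in the background, can be sketched in Python. This is my guess at the pattern, not how LTAP is built; `ToyHybridTable` and every method on it are hypothetical.

```python
# Toy hybrid store: recent writes land in a row-oriented delta keyed by id;
# compaction folds them into a columnar base. Reads merge both tiers.
# Hypothetical sketch of the general pattern, not LTAP's actual design.

class ToyHybridTable:
    def __init__(self):
        self.delta = {}  # id -> row dict (write tier, cheap point updates)
        self.base = {"id": [], "status": [], "amount": []}  # columnar tier

    def upsert(self, row):
        """OLTP path: one dict write, no columnar data touched."""
        self.delta[row["id"]] = dict(row)

    def get(self, row_id):
        """Point read: check the write tier first, then the base."""
        if row_id in self.delta:
            return self.delta[row_id]
        if row_id in self.base["id"]:
            i = self.base["id"].index(row_id)
            return {k: col[i] for k, col in self.base.items()}
        return None

    def compact(self):
        """Background step: rewrite the base with delta rows merged in."""
        merged = {i: {k: col[pos] for k, col in self.base.items()}
                  for pos, i in enumerate(self.base["id"])}
        merged.update(self.delta)
        ordered = [merged[i] for i in sorted(merged)]
        self.base = {k: [r[k] for r in ordered] for k in self.base}
        self.delta = {}

    def paid_total(self):
        """Analytical scan: columnar base, overridden by fresher delta rows."""
        total = 0
        for i, s, a in zip(self.base["id"], self.base["status"], self.base["amount"]):
            if i not in self.delta and s == "paid":
                total += a
        total += sum(r["amount"] for r in self.delta.values() if r["status"] == "paid")
        return total
```

The cost shows up in the read path: every analytical scan has to reconcile the base with whatever is still sitting in the write tier, and compaction has to keep up with the write rate. Those are the places where the write-latency and point-read numbers will be won or lost.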

I want to be fair: the lakehouse's own transaction log already solved a smaller version of this, giving ACID on object storage that people doubted was possible. LTAP extending that to genuine OLTP latencies is a harder problem, but not an obviously impossible one — and "separate compute engines over one copy" is a smarter framing than the old HTAP approach of one engine trying to be good at both. The architecture is sound in principle. The proof is in the write-latency and point-read numbers, which weren't part of an announcement of something "coming soon."

## How it compares

|  | Classic OLTP + OLAP + ETL | Snowflake Unistore (Hybrid Tables) | Databricks LTAP |
| --- | --- | --- | --- |
| Copies of data | Two (operational + analytical) | Managed within Snowflake | One, in the open lake |
| Pipeline between them | ETL / CDC (lag, breakage) | None (in-platform) | None (by design) |
| Storage format | Row store + columnar copy | Proprietary | Open (Delta + Iceberg) |
| Transactional engine | Postgres / MySQL etc. | Snowflake hybrid tables | Postgres (Lakebase) |
| Governance | Per-system, fragmented | Snowflake | Unity Catalog (unified) |
| Status (mid-2026) | The status quo | Generally available | Announced, "coming soon" |

The competitive context matters: Databricks isn't first to pitch unifying OLTP and OLAP — Snowflake's Unistore made a similar promise. LTAP's distinctive bet is doing it on **one copy in open formats** (Delta/Iceberg) rather than inside a proprietary store. If the open-format claim holds up under OLTP workloads, that's a real differentiator, because it means your operational data isn't locked in — it's queryable by anything that reads Delta or Iceberg.

**Why this is really an AI-agent story.** The most interesting use case isn't faster dashboards — it's agents. An autonomous agent that reads the current operational state and then takes a transactional action needs both halves in one consistent place: fresh writes *and* analytical context, no pipeline lag between "what's true now" and "what I can analyze." LTAP's pitch lands hardest there — a single governed store an agent can both query for context and transact against, without the staleness gap that a CDC pipeline bakes in. Watch this less as a warehouse feature and more as operational substrate for agentic systems.

## What to carry away

LTAP — Lake Transactional/Analytical Processing — is Databricks' move to collapse the decades-old split between operational and analytical systems onto a single copy of data in the open lake, eliminating the ETL pipeline that has always connected them. It's built on Lakebase (serverless Postgres) for the transactional side and the Lakehouse for the analytical side, running as separate compute engines over one copy of data stored in Delta and Iceberg from the point of write, governed once by Unity Catalog. The payoff, if it delivers: no replica, no pipeline lag, no drift — and a single fresh store that AI agents can both analyze and transact against.

Stay genuinely interested but appropriately skeptical. The architecture, separate engines over one open copy, is a smarter framing of HTAP than the attempts that came before, and the lakehouse already proved Databricks can do things on object storage that people said were impossible. But the hard problem hasn't changed: one physical layout can't natively love both single-row transactions and columnar scans, and "no performance tradeoffs" is exactly the promise that decades of HTAP attempts couldn't keep. It's announced and "coming soon," not GA, so file LTAP as the most ambitious data-architecture bet of 2026, and judge it on the write-latency numbers when they finally ship.
