StarRocks & Apache Doris Internals: MPP Real-Time OLAP, FE/BE, and the Data Models

If you've watched the real-time analytics space lately, you've seen two names come up constantly: StarRocks and Apache Doris. They're often mentioned in the same breath, and for a good reason that surprises people — StarRocks began life as a fork of Doris. They share an architecture, a vocabulary, and most of their core ideas, then diverged on the details. So rather than write the same article twice, I'll treat them together: where they're the same (which is most of the structure) I'll say "these engines," and where they differ I'll call it out.

Both exist to do one thing the previous generation did poorly: serve low-latency analytical SQL — including multi-table joins — at high concurrency, on fresh data. Where a classic warehouse is batch and a single-purpose columnar engine struggles with joins and updates, these aim to be the fast, joinable, updatable layer behind dashboards and user-facing analytics. Let's open them up: the FE/BE architecture, the execution engine and optimizer, tablets and storage, the data models, joins, materialized views, and real-time ingestion.

The FE/BE architecture

Both engines split the cluster into two roles, and getting this split clear explains nearly everything else. The Frontend (FE) is the brain: written in Java, it holds metadata, parses SQL, plans and optimizes queries, and coordinates execution. The Backend (BE) is the muscle: written in C++, it stores the data (as columnar files on local disk) and executes query fragments. You scale storage and compute together by adding BE nodes; you run a few FEs (one leader, followers/observers) for metadata high availability.

graph TD
    C["SQL client / BI tool"]
    subgraph FEs["Frontends (FE) — Java"]
        FE1["FE leader
metadata, parse, plan, CBO, coordinate"]
        FE2["FE follower / observer
(metadata HA)"]
    end
    subgraph BEs["Backends (BE) — C++"]
        BE1["BE 1
tablets + vectorized execution"]
        BE2["BE 2
tablets + vectorized execution"]
        BE3["BE 3
tablets + vectorized execution"]
    end
    C --> FE1
    FE1 -->|"distributed plan fragments"| BE1
    FE1 --> BE2
    FE1 --> BE3
    BE1 <-->|"data exchange (shuffle)"| BE2
    BE2 <-->|"exchange"| BE3
    FE1 --- FE2

The FE/BE split. The FE plans and optimizes; BEs store columnar data and run query fragments in parallel, exchanging intermediate data with each other (the MPP shuffle). The FE is not in the data path once a plan is dispatched. This shared-nothing MPP shape — plan centrally, execute massively in parallel, exchange between nodes — is the same family as BigQuery's Dremel or a classic MPP warehouse, but packaged to deploy yourself.

A note on lineage worth keeping straight: both store data column-oriented and execute it with a vectorized engine — processing batches of column values in tight C++ loops with SIMD, the same principle behind ClickHouse's speed. StarRocks rebuilt its engine and optimizer around full vectorization and pipeline parallelism early; Doris caught up over its 1.x/2.x line with its own vectorized engine and the Nereids cost-based optimizer. By now both are vectorized, MPP, and CBO-driven.

The optimizer and execution engine

The thing that distinguishes these engines from a denormalize-everything columnar store is a real cost-based optimizer (CBO). It collects table statistics and uses them to reorder joins, choose join strategies and distribution methods, push down predicates and aggregations, and rewrite queries to use materialized views. This is why they handle complex multi-table SQL — star and snowflake schemas with several joins — where a join-averse engine would force you to flatten everything first.

Underneath, a pipeline execution engine runs fragments with vectorized operators, parallelized across the cores of each BE and across BEs. The combination — columnar storage, vectorized operators, MPP distribution, and a CBO that plans joins well — is the whole pitch: warehouse-grade SQL at interactive latency.

Tablets: how data is sharded and stored

A table is divided two ways. First by partition (usually by date or another range) for data lifecycle and partition pruning. Then each partition is split into buckets by a hash of the bucketing key, and each bucket is a tablet — the physical unit of storage, replication, and parallelism. Tablets are distributed across BE nodes and replicated (typically 3×) for durability and availability.

CREATE TABLE events (
    event_date   DATE,
    user_id      BIGINT,
    event_type   VARCHAR(32),
    amount       DECIMAL(18,2)
)
DUPLICATE KEY(event_date, user_id)
PARTITION BY RANGE(event_date) ( ... )      -- lifecycle + pruning
DISTRIBUTED BY HASH(user_id) BUCKETS 16     -- each bucket = a tablet
PROPERTIES ("replication_num" = "3");

Tablets are why these engines parallelize and rebalance well: the FE schedules a query's fragments across the tablets that hold relevant data, and the cluster can move tablets between BEs to balance load or recover from a node loss. The bucketing key matters for joins — colocating tablets on the same key lets two tables join locally without a shuffle, which we'll see in a moment.

The data models: the schema decision that defines behavior

This is where you make the most consequential choice, and it's distinctive to these engines. When you create a table you pick a data model that determines how rows with the same key are treated — especially under updates:

Model	Behavior	Use for
Duplicate Key	Keeps every row as-is (no dedup/merge); fastest ingest	Append-only logs/events — the default
Aggregate Key	Pre-aggregates rows with the same key on load (SUM/MAX/REPLACE…)	Pre-rolled metrics by dimension
Unique Key	Keeps the latest row per key (upsert semantics)	Dimension tables, CDC mirrors
Primary Key (StarRocks; Doris merge-on-write Unique)	Upsert with fast point reads and real-time partial updates	Mutable real-time tables, CDC at scale

The Primary Key model is the headline real-time feature. Older upsert (Unique) models resolved duplicates at read time (merge-on-read) — correct, but it slows queries. The Primary Key model (and Doris's merge-on-write Unique) resolves them at write time using a primary-key index, so reads stay fast and you can do partial-column updates and high-frequency upserts from a CDC stream. If you're mirroring an OLTP table or a Kafka changelog and querying it live, this is the model you want.

Joins: the thing a columnar store usually can't do well

The clearest reason to choose these engines over a flatten-everything columnar store is that they do distributed joins properly, with the CBO choosing among real strategies:

Broadcast join — ship a small table to every BE; no shuffle of the big side. The CBO picks this when one side is small.
Shuffle (hash) join — repartition both sides by the join key across BEs so matching rows co-locate. The general case for two large tables.
Colocate join — if both tables are bucketed on the join key into a colocation group, matching tablets already sit on the same BE, so the join happens locally with no shuffle at all. The fastest option, and a deliberate data-modeling choice.
Bucket-shuffle join — shuffle only one side using the other's existing bucketing, cutting network cost.

Strong joins plus a CBO mean you can keep a star schema instead of pre-joining everything into one wide table. That's a real operational difference: you maintain clean dimension tables and join at query time, rather than rebuilding giant denormalized tables on every change.

Materialized views: precompute and auto-rewrite

Both engines offer materialized views in two flavors. Synchronous MVs are single-table rollups maintained on write — the optimizer transparently rewrites a query to hit the rollup when it helps. Asynchronous MVs (refreshed on a schedule or trigger) can span multiple tables and joins, and crucially the CBO can automatically rewrite an incoming query to use a matching MV even if the query didn't name it. That last part is powerful: you define an MV for an expensive join-and-aggregate, and existing dashboard queries silently get faster without being rewritten.

Real-time ingestion

"Real-time OLAP" needs real-time loading, and both provide several paths:

Method	What it does
Stream Load	Push a batch over HTTP synchronously — the backbone of micro-batch pipelines
Routine Load	The cluster consumes a Kafka topic continuously into a table — built-in streaming ingest
Flink connector	Exactly-once sink from Flink jobs (often into the Primary Key model)
Broker / external catalog	Bulk load from object storage; query Hive/Iceberg/Hudi tables in place

Routine Load deserves the spotlight: point it at a Kafka topic and the cluster manages the consumer offsets and micro-batches the data in, no external stream processor required for the common case — the same "the database pulls from Kafka itself" pattern that makes these engines easy to wire into a streaming stack. Pair Routine Load (or the Flink connector) with the Primary Key model and you have a live, updatable, instantly-queryable mirror of an event stream.

Lakehouse reach: query data in place

Both have grown external catalogs that let them query open table formats — Hive, Iceberg, Hudi, Delta — directly on object storage without ingesting first. So the same engine can serve a fast internal table for hot data and federate out to the lake for cold or archival data in one SQL dialect. StarRocks in particular has leaned hard into being a query engine over Iceberg lakehouses, not just its own storage; Doris offers comparable multi-catalog federation. It's the same lakehouse-vs-warehouse convergence happening across the industry, reached from the real-time-OLAP side.

StarRocks vs Doris: where they actually differ

Since they share a backbone, the differences are at the margins — but they matter when choosing:

Governance. Apache Doris is an Apache Software Foundation top-level project (vendor-neutral, broad contributor base). StarRocks is open-source under the Linux Foundation, with CelerData as the primary commercial backer.
Emphasis. StarRocks has pushed hardest on the cost-based optimizer, on-the-fly joins so you don't denormalize, the Primary Key model, and being a fast query engine over Iceberg. Doris has emphasized ease of operations, a broad connector ecosystem, and its merge-on-write Unique model.
The convergence caveat. Because they share ancestry and chase the same workloads, features cross-pollinate fast — by the time you read a comparison, both may have closed a gap. Benchmark on your queries and data, not on a feature list.

What to carry away

StarRocks and Apache Doris are shared-nothing MPP analytics engines with an FE that plans (CBO included) and BEs that store columnar data in tablets and execute it vectorized — fast SQL at high concurrency on fresh data. Their distinctive levers are the data models (Duplicate / Aggregate / Unique / Primary Key), where the Primary Key / merge-on-write model unlocks real-time updates with fast reads; proper distributed joins (broadcast, shuffle, colocate) so you keep a star schema instead of flattening; auto-rewriting materialized views; and built-in streaming ingestion from Kafka and Flink.

Choose the data model for how your data mutates, bucket on your join keys to unlock colocate joins, and lean on Routine Load + Primary Key for live CDC. Where these engines fit against a single-purpose columnar store like ClickHouse — and where ClickHouse still wins — is its own comparison, which I take up in StarRocks vs ClickHouse vs Doris.