# StarRocks vs ClickHouse vs Doris: Which Real-Time OLAP Engine, and When

The short answer everyone wants: **ClickHouse wins on raw single-table scan speed and ingest throughput; StarRocks and Doris win on multi-table joins, real-time updates, and high-concurrency serving.** If you remember nothing else, that sentence routes most decisions correctly. But the three engines are close enough — and converging fast enough — that the real work is matching one to *your* workload, not crowning a universal winner. That's what this comparison is for.

I've covered each engine's internals separately — [ClickHouse](clickhouse-architecture-internals) and the shared architecture of [StarRocks and Doris](starrocks-doris-architecture). Here I put them head to head on the dimensions that actually decide a project: architecture and ops, joins, real-time updates, concurrency, ingestion, lakehouse reach, and maturity. I'll keep saying "StarRocks and Doris" together where they behave alike (they share a lineage — StarRocks forked from Doris) and split them where they don't.

## The architectural split that explains everything

Start here, because almost every difference downstream traces back to it. **ClickHouse** is a shared-nothing fleet of identical nodes running one self-contained binary; its design center is to store data column-by-column and brute-force scan it with a vectorized engine at staggering speed. **StarRocks and Doris** use a two-tier **FE/BE** design — a Java frontend that plans and cost-optimizes, C++ backends that store and execute — built from the start to plan and run distributed multi-table joins well.

```mermaid
graph TD
    subgraph CH["ClickHouse — shared-nothing, single binary"]
        N1["Node (storage + execution)"]
        N2["Node (storage + execution)"]
        N3["Node (storage + execution)"]
        N1 --- N2 --- N3
    end
    subgraph SR["StarRocks / Doris — FE/BE MPP"]
        FE["Frontend (FE)<br/>plan + cost-based optimizer"]
        B1["Backend (BE)"]
        B2["Backend (BE)"]
        B3["Backend (BE)"]
        FE --> B1
        FE --> B2
        FE --> B3
    end
```

The structural difference. ClickHouse keeps it simple — uniform nodes, one binary, optimized to scan a wide denormalized table blazingly fast. StarRocks and Doris separate planning (FE, with a cost-based optimizer) from execution (BEs), which is what lets them plan good distributed join strategies across a star schema. "Scan champion" vs "join-capable MPP" is the lens for the whole comparison.

## Joins — the sharpest difference

This is the dimension people feel first. **ClickHouse** grew up as a denormalize-everything engine: its classic guidance is to flatten your data into one wide table and scan it, because its join support has historically been the weak spot — the right-hand table of a hash join is loaded into memory, and there was no mature cost-based optimizer to reorder multi-table joins. It has improved a lot (parallel hash join, grace-hash join that spills, better planning), but joining several large tables in a star schema is still not where it shines.

**StarRocks and Doris** were built for exactly this. A real cost-based optimizer reorders joins from table statistics and picks among broadcast, shuffle, colocate, and bucket-shuffle strategies, so you can keep a normalized star schema and join at query time. If your workload is dashboards over a fact table plus many dimensions, this is the difference between "just query it" and "rebuild a giant flat table on every change."
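
To make "picks among broadcast, shuffle, colocate" concrete, here is a minimal sketch of the kind of comparison a cost-based optimizer makes. It is a toy under stated assumptions, not how either engine's optimizer is implemented: the names `TableStats`, `network_cost`, and `pick_strategy` are hypothetical, the only cost considered is bytes moved over the network, and bucket-shuffle is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics (illustration only)."""
    name: str
    size_bytes: int
    bucketed_on_join_key: bool = False


def network_cost(left: TableStats, right: TableStats, nodes: int) -> dict:
    """Rough bytes moved over the network for each join distribution strategy."""
    costs = {
        # Broadcast: every node receives a full copy of the right-hand table.
        "broadcast": float(right.size_bytes * nodes),
        # Shuffle: both sides are repartitioned by the join key.
        "shuffle": float(left.size_bytes + right.size_bytes),
    }
    if left.bucketed_on_join_key and right.bucketed_on_join_key:
        # Colocate: matching buckets already live on the same node.
        costs["colocate"] = 0.0
    return costs


def pick_strategy(left: TableStats, right: TableStats, nodes: int) -> str:
    """Return the strategy with the lowest estimated network cost."""
    costs = network_cost(left, right, nodes)
    return min(costs, key=costs.get)


# Assumed sizes: a 500 GB fact table, a 50 MB dimension, a 200 GB second fact table.
orders = TableStats("orders", 500 * 10**9)
region = TableStats("region", 50 * 10**6)
events = TableStats("events", 200 * 10**9)

print(pick_strategy(orders, region, nodes=10))  # small dimension
print(pick_strategy(orders, events, nodes=10))  # two large tables
```

Under this model a small dimension is cheap to copy everywhere, while two large tables are cheaper to repartition. The point is only that the choice depends on table statistics, which is why an engine without them tends to push you toward denormalizing.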

**The practical heuristic:** if you can (or already do) denormalize into one big table, ClickHouse will scan it faster than anything. If you need to keep dimensions separate and join them — especially several joins, or joins that change — StarRocks/Doris will save you the denormalization pipeline and run those joins better. Pick the engine that matches how you want to model, not the other way around.

## Real-time updates

All three are append-friendly; they differ on *mutating* existing rows. **ClickHouse** handles updates through `ReplacingMergeTree` (dedup at merge time, so you query with `FINAL` or `argMax` until merges catch up) and heavyweight `ALTER ... UPDATE` mutations — workable, but not designed for high-frequency upserts. **StarRocks' Primary Key model** and **Doris's merge-on-write Unique model** resolve upserts at write time via a primary-key index, giving fast point reads, partial-column updates, and comfortable high-rate upserts from a CDC stream. For mirroring an OLTP table or a changelog and querying it live, StarRocks/Doris have the cleaner story.
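
The difference between resolving duplicates at merge time and resolving them at write time can be sketched with two toy in-memory tables. This is an illustration of the two ideas only, not either engine's storage format: the class names are hypothetical, and the write-time version overwrites rows in place for simplicity, which real engines do not do.

```python
def _latest_by_key(rows):
    """Keep the highest-version row per key (later rows win ties)."""
    latest = {}
    for key, version, value in rows:
        if key not in latest or version >= latest[key][0]:
            latest[key] = (version, value)
    return latest


class MergeOnReadTable:
    """Toy ReplacingMergeTree-style table: writes append, reads deduplicate."""

    def __init__(self):
        self.rows = []  # (key, version, value); duplicates allowed

    def upsert(self, key, version, value):
        self.rows.append((key, version, value))  # cheap append, no lookup

    def read_latest(self):
        # Similar in spirit to FINAL / argMax: dedup work happens per query.
        return {k: v for k, (_, v) in _latest_by_key(self.rows).items()}

    def merge(self):
        # A background merge physically collapses the duplicates.
        self.rows = [(k, ver, val) for k, (ver, val) in _latest_by_key(self.rows).items()]


class MergeOnWriteTable:
    """Toy primary-key table: the write path resolves the upsert."""

    def __init__(self):
        self.index = {}  # primary-key index: key -> position in rows
        self.rows = []   # exactly one live row per key

    def upsert(self, key, value):
        pos = self.index.get(key)
        if pos is None:
            self.index[key] = len(self.rows)
            self.rows.append((key, value))
        else:
            self.rows[pos] = (key, value)  # simplification: overwrite in place

    def read_latest(self):
        return dict(self.rows)  # no dedup needed at read time


mor, mow = MergeOnReadTable(), MergeOnWriteTable()
for key, version, value in [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")]:
    mor.upsert(key, version, value)
    mow.upsert(key, value)
```

Both tables return the same answer, but the merge-on-read table holds three physical rows until `merge()` runs and pays for deduplication on every read, while the merge-on-write table pays an index lookup on every write and reads stay cheap.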

## Concurrency vs single-query throughput

A subtle but decisive split. **ClickHouse** is optimized to throw an entire machine (or cluster) at one query — superb for a few heavy analytical queries, but high concurrency (hundreds/thousands of simultaneous small queries) has historically been a weaker area, since each query wants lots of resources. **StarRocks and Doris** were designed with concurrent serving in mind — many users hitting dashboards at once — and generally sustain higher QPS for that pattern. So: many concurrent dashboard users → lean StarRocks/Doris; a smaller number of giant scans → ClickHouse is hard to beat.

## Ingestion

All three pull from Kafka and integrate with stream processors, with slightly different idioms:

| Ingestion path | ClickHouse | StarRocks / Doris |
| --- | --- | --- |
| Batch / micro-batch | Big batched `INSERT`s; async inserts for many small clients | Stream Load over HTTP |
| Native Kafka | Kafka engine table + materialized view pump | Routine Load (cluster consumes the topic) |
| Flink | Flink connector | Flink connector (often into Primary Key model) |
| The "too many parts" tax | Real; you must batch (see the ClickHouse ingestion deep-dive) | Present but milder: very frequent tiny loads still create many small versions to compact, eased by Primary Key / load mechanisms |

For the ClickHouse side of this in detail (batching, async inserts, and the Kafka-engine pattern), see [ClickHouse at Scale: Insert Performance & Real-Time Streaming](clickhouse-ingestion-streaming). The shapes rhyme across all three: the database can consume from Kafka itself, and in each one you have to feed the write path in batches rather than row by row.
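
What batching looks like on the client side can be sketched in a few lines. This is a minimal, engine-agnostic buffer, assuming a hypothetical `flush` callback that would perform the real batched `INSERT` or Stream Load; `MicroBatcher` and its parameters are illustrative names, not part of any client library.

```python
import time
from typing import Callable


class MicroBatcher:
    """Buffer rows and hand them to `flush` in batches, by size or by age."""

    def __init__(self, flush: Callable[[list], None], max_rows: int = 10_000,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._flush = flush          # hypothetical sink, e.g. one batched INSERT
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self._clock = clock
        self._buffer = []
        self._opened_at = 0.0

    def add(self, row) -> None:
        if not self._buffer:
            self._opened_at = self._clock()  # the batch "opens" on its first row
        self._buffer.append(row)
        too_big = len(self._buffer) >= self.max_rows
        too_old = self._clock() - self._opened_at >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; safe to call on an empty buffer."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


batches = []
batcher = MicroBatcher(batches.append, max_rows=3, max_age_s=3600)
for i in range(7):
    batcher.add(i)
batcher.flush()  # drain the tail on shutdown
```

Note the simplification: age is only checked when a row arrives, so a production version would also need a timer to flush an idle buffer. The trade-off is the usual one, larger batches mean fewer parts or versions to merge but more latency before data is queryable.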

## Lakehouse federation

All three have grown the ability to query open table formats (Iceberg, Hudi, Delta, Hive) in place via external catalogs, so the "fast internal table for hot data, federate to the lake for cold data" pattern works in each. StarRocks has pushed this hardest, positioning itself as a query engine over an Iceberg lakehouse, not only over its own storage; Doris offers broad multi-catalog federation; ClickHouse can read lake formats too, but its center of gravity remains its own MergeTree storage. If "be the fast SQL layer over our Iceberg lake" is the goal, StarRocks is the one aimed most squarely at it.

## Operability and maturity

| Dimension | ClickHouse | StarRocks | Apache Doris |
| --- | --- | --- | --- |
| Deploy model | Single binary, uniform nodes — simplest to run | Two tiers (FE + BE) | Two tiers (FE + BE) |
| Governance | Open source; ClickHouse Inc. + large community | Linux Foundation; CelerData-backed | Apache top-level project |
| Maturity / adoption | Oldest, largest community, most battle-tested at extreme scale | Newer, fast-moving, join + Iceberg focus | Mature Apache project, broad connectors |
| Sweet spot | Massive single-table scans, logs/observability, event analytics | Joins on a star schema, real-time updates, concurrency, Iceberg | Same as StarRocks; ease-of-ops + Apache governance |

## A decision guide

- **Choose ClickHouse** when your data is (or can be) one wide denormalized table, you need maximum scan and ingest throughput, the workload is logs/metrics/events, and concurrency is modest. It is the simplest to operate and the brute-force speed king.

- **Choose StarRocks** when you need fast multi-table joins on a star schema, real-time upserts (Primary Key), high concurrent QPS for user-facing analytics, or a fast SQL engine over an Iceberg lakehouse.

- **Choose Apache Doris** for the same join/update/concurrency profile as StarRocks when you specifically want Apache governance, a broad built-in connector ecosystem, and a reputation for operational ease.
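
The guide above can be written down as a small function, which is a handy way to check that you have actually answered the questions it asks. This is only my encoding of the three bullets, with hypothetical names (`Workload`, `recommend`) and an arbitrary tie-break when a workload wants both Iceberg and Apache governance; it is a starting point for a proof-of-concept, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the questions the decision guide asks (illustrative fields)."""
    denormalized: bool                    # one wide table, or can become one
    frequent_upserts: bool = False        # CDC / changelog mirroring
    high_concurrency: bool = False        # many simultaneous dashboard users
    iceberg_first: bool = False           # engine is mainly a SQL layer over a lake
    wants_apache_governance: bool = False


def recommend(w: Workload) -> str:
    """Map a workload shape to the engine(s) the guide points at."""
    needs_join_update_serving = (
        not w.denormalized
        or w.frequent_upserts
        or w.high_concurrency
        or w.iceberg_first
    )
    if not needs_join_update_serving:
        return "ClickHouse"
    if w.iceberg_first:
        # Assumption: Iceberg focus outranks governance preference here.
        return "StarRocks"
    if w.wants_apache_governance:
        return "Apache Doris"
    return "StarRocks or Apache Doris"


logs = Workload(denormalized=True)
dashboards = Workload(denormalized=False, high_concurrency=True)
print(recommend(logs), "|", recommend(dashboards))
```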

**The benchmark caveat — take it seriously.** All three publish benchmarks where they win, because each is tuned for a different shape of workload and you can design a benchmark to favor any of them. They're also converging: ClickHouse keeps improving joins; StarRocks and Doris keep improving scan and ingest. Any feature gap I name here may be narrower by the time you read it. Decide by running a proof-of-concept on your real queries, your real data volumes, and your real concurrency — not on someone's marketing chart.

## What to carry away

Map the engine to the workload shape. **ClickHouse** is the denormalized-scan-and-ingest champion — simplest to run, unbeatable on big single-table queries, weaker on joins and high concurrency. **StarRocks and Doris** are FE/BE MPP engines built for **joins, real-time updates, and concurrent serving**, letting you keep a star schema and mirror a CDC stream live; between them, StarRocks leans into joins and Iceberg, Doris into Apache governance and operability.

Decide by how you model data (flat vs star), how it mutates (append vs upsert), and how many people query at once — then prove it on your own workload. The deep mechanics behind these verdicts are in the [ClickHouse internals](clickhouse-architecture-internals) and [StarRocks/Doris architecture](starrocks-doris-architecture) pieces.
