# Airflow vs Prefect vs Dagster: Choosing a Data Orchestrator

Every data platform needs something to answer "run this, then that, retry if it fails, and tell me when it breaks." For a decade the answer was Apache Airflow, full stop. Now there are three serious choices — Airflow, Prefect, and Dagster — and they aren't three flavors of the same thing. They embody genuinely different philosophies about what you're even orchestrating: **tasks**, or the **data assets** those tasks produce. Get the philosophy right and the tool nearly picks itself. Get it wrong and you'll fight your orchestrator's worldview for years. This is a workload-first comparison, organized around the two axes that actually distinguish them.

The two axes: **task-centric vs asset-centric** (do you declare steps, or the data you want to exist?), and **static vs dynamic** (is the pipeline graph fixed at parse time, or built at runtime?). Airflow is task-centric and historically static; Prefect is task-centric and dynamic; Dagster is asset-centric, with a declaratively defined asset graph. Those positions explain almost every difference that follows.

## Apache Airflow: the incumbent, task-centric

Airflow is a workflow scheduler built around the DAG: a directed acyclic graph of tasks with dependencies, defined in Python and run by a scheduler against a metadata database. I covered how it works in [Airflow Internals](airflow-internals); the relevant fact here is its worldview: **you orchestrate tasks**. "Run extract, then transform, then load." Airflow's enormous advantage is gravity: it's everywhere, managed versions are widely available (Amazon MWAA, Google Cloud Composer, Astronomer), there's an operator or provider package for virtually every system, and an army of engineers already knows it. **Airflow 3.0** (2025) modernized it considerably, with DAG versioning, a revamped React UI, a Task Execution API that decouples workers from the metadata database, and stronger data-aware scheduling via assets, but the center of gravity is still scheduled task graphs.

Its honest weaknesses are the flip side of its history: it was built scheduler-first, not data-first, so "did this task succeed?" is native but "is this dataset fresh and correct?" was bolted on later. Dynamic, runtime-shaped pipelines fight the static DAG model, and top-level DAG-file code that runs on every scheduler parse is a classic foot-gun.
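The parse-time foot-gun is easiest to see in a toy model. The sketch below is not Airflow code: `parse_dag_file`, `expensive_lookup`, `CALLS`, and the embedded DAG-file source are hypothetical stand-ins. It assumes only that the file's top level is re-executed on each parse loop, as described above.

```python
# Toy model (not Airflow's API): a "scheduler" that re-executes a DAG file's
# top-level code on every parse loop. All names here are hypothetical.
CALLS = []  # records every expensive call and where it came from

DAG_FILE_SOURCE = '''
expensive_lookup("top-level")        # foot-gun: runs on EVERY parse

def extract():
    expensive_lookup("task body")    # better: runs only when the task executes
'''

def expensive_lookup(origin):
    """Stand-in for a database query or API call."""
    CALLS.append(origin)

def parse_dag_file(source):
    """Execute the file's top level, as importing a module would."""
    namespace = {"expensive_lookup": expensive_lookup}
    exec(source, namespace)
    return namespace

for _ in range(3):                   # three parse loops, zero task runs so far
    dag_namespace = parse_dag_file(DAG_FILE_SOURCE)

dag_namespace["extract"]()           # one actual task execution

print(CALLS.count("top-level"), CALLS.count("task body"))
```

Three parses produce three top-level calls before any task has run, while the call inside the task body happens once, when the task actually executes. Moving expensive work into the task function is the usual remedy.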

## Prefect: Python-native and dynamic, task-centric

Prefect keeps the task-centric worldview but throws out the static graph. Pipelines are plain Python functions decorated as `@flow` and `@task`; the DAG is discovered by *running* the code, not by parsing a static definition. That makes **dynamic workflows** — loops, conditionals, fan-out whose width depends on runtime data — natural rather than awkward.

```python
from prefect import flow, task

@task(retries=3)
def score(batch): ...

@flow
def nightly(batches):
    # The graph's shape depends on runtime data, which is fine here:
    # dynamic fan-out, with no static DAG to predeclare.
    futures = [score.submit(b) for b in batches]
    return [f.result() for f in futures]   # wait for every submitted task
```
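To make "discovered by running the code" concrete, here is a standard-library sketch of the idea. It is a toy model, not Prefect's implementation: the `traced_task` decorator and the `OBSERVED_CALLS` list are invented for illustration.

```python
import functools

OBSERVED_CALLS = []  # the "graph" is whatever task calls actually happen

def traced_task(fn):
    """Hypothetical stand-in for a task decorator: record each call, then run it."""
    @functools.wraps(fn)
    def wrapper(*args):
        OBSERVED_CALLS.append((fn.__name__, args))
        return fn(*args)
    return wrapper

@traced_task
def score(batch):
    return sum(batch)

def nightly(batches):
    # Ordinary control flow decides how many task runs exist.
    return [score(b) for b in batches if b]   # skip empty batches

results = nightly([[1, 2], [], [3]])
```

With this input, two `score` calls are recorded and the empty batch produces none. A different input yields a different set of task runs, which a graph fixed at parse time has to work harder to express.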

Prefect's other defining choice is the **hybrid execution model**: Prefect Cloud (or a self-hosted server) handles orchestration and observability, but your code runs on *your* infrastructure — the control plane never needs to see your data. It feels the most like "just Python," which makes it a favorite for ML and data-science workflows where the pipeline is dynamic and code-heavy. The trade-offs: a smaller ecosystem of pre-built integrations than Airflow's, and — like Airflow — it orchestrates tasks, so data lineage and asset freshness aren't the native unit.
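One way to picture the hybrid split is as a boundary on what crosses the network. The sketch below is a toy model under that assumption, not Prefect's API: `ControlPlane`, `run_on_worker`, and the reported fields are all hypothetical.

```python
import time

class ControlPlane:
    """Hypothetical orchestration service: stores run metadata only."""
    def __init__(self):
        self.runs = []

    def report(self, name, state, duration_s):
        self.runs.append({"name": name, "state": state, "duration_s": duration_s})

def run_on_worker(control_plane, fn, data):
    """Runs on your infrastructure; only state and timing are reported."""
    started = time.monotonic()
    try:
        result = fn(data)
        state = "COMPLETED"
    except Exception:
        # Sketch-level handling: a failure is reported and None is returned.
        result, state = None, "FAILED"
    control_plane.report(fn.__name__, state, time.monotonic() - started)
    return result          # the data itself never goes to the control plane

def total_revenue(rows):
    return sum(r["amount"] for r in rows)

plane = ControlPlane()
local_result = run_on_worker(plane, total_revenue, [{"amount": 40}, {"amount": 2}])
```

The control plane ends up with a name, a state, and a duration for the run. The rows and the computed total stay in the worker process.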

## Dagster: asset-centric and data-aware

Dagster makes the genuinely different move. Instead of declaring tasks, you declare **software-defined assets** — the tables, files, and models you want to *exist* — and their dependencies on other assets. You don't say "run the transform task"; you say "this `customers` table is produced from these raw tables," and Dagster figures out execution. The orchestrator becomes data-aware by construction: it knows your assets, their lineage, their freshness, and their dependencies, because that's the unit you program in.

```python
from dagster import asset

def compute_ltv(orders): ...    # placeholder for your own transformation logic

@asset
def raw_orders(): ...

@asset
def customer_ltv(raw_orders):   # dependency is the ASSET, not a task ordering
    return compute_ltv(raw_orders)
# Dagster knows the lineage raw_orders -> customer_ltv, its freshness, and how to rebuild it
```
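How can a decorator alone yield a dependency graph? A rough standard-library sketch of the idea follows. It is a toy model and not how Dagster is implemented: `toy_asset`, `REGISTRY`, and `materialize_all` are hypothetical names, and the only assumption carried over from the example above is that a parameter name refers to an upstream asset.

```python
import inspect
from graphlib import TopologicalSorter

REGISTRY = {}  # asset name -> function that produces it

def toy_asset(fn):
    """Hypothetical decorator: register a function as the producer of an asset."""
    REGISTRY[fn.__name__] = fn
    return fn

@toy_asset
def raw_orders():
    return [30, 12]                    # stand-in for rows read from a source

@toy_asset
def customer_ltv(raw_orders):          # parameter name = upstream asset name
    return sum(raw_orders)

def materialize_all():
    # Derive the graph from function signatures, then build in dependency order.
    deps = {name: set(inspect.signature(fn).parameters)
            for name, fn in REGISTRY.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        values[name] = REGISTRY[name](**{d: values[d] for d in deps[name]})
    return values

assets = materialize_all()
```

Nothing here says "run `raw_orders` first". The ordering falls out of the declared inputs, which is the sense in which the asset, not the step, is the unit you program in.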

That shift pays off in a built-in data catalog and lineage, freshness policies ("this asset should be no more than 2 hours stale"), a strong typing/testing story, and a development experience built around materializing and inspecting assets locally. The costs are real too: it's the most opinionated of the three, the asset mental model is a genuine learning curve for a team steeped in task DAGs, and its ecosystem, while growing fast (and strong on the dbt integration), is younger than Airflow's.
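A freshness policy like the one quoted above reduces to comparing the last materialization time against an allowed lag. A minimal sketch, assuming a hypothetical `is_stale` helper rather than Dagster's own freshness API:

```python
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(hours=2)   # "no more than 2 hours stale"

def is_stale(last_materialized, now, max_lag=MAX_LAG):
    """Return True when the asset is older than the allowed lag."""
    return now - last_materialized > max_lag

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
recent_is_stale = is_stale(datetime(2025, 1, 1, 10, 30, tzinfo=timezone.utc), now)  # 1.5 h old
old_is_stale = is_stale(datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc), now)       # 3 h old
```

The check only works if the orchestrator records when each asset was last materialized, which an asset-centric tool does as a matter of course.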

## The core distinction, in one picture

```mermaid
graph TD
    subgraph TASK["Task-centric (Airflow, Prefect)"]
        T1["task: extract"] --> T2["task: transform"] --> T3["task: load"]
        Q1["You declare: the STEPS to run.<br/>Orchestrator tracks: did each task succeed?"]
    end
    subgraph ASSET["Asset-centric (Dagster)"]
        A1["asset: raw_orders"] --> A2["asset: customer_ltv"] --> A3["asset: exec_dashboard"]
        Q2["You declare: the DATA that should exist.<br/>Orchestrator tracks: is each asset fresh & correct?"]
    end
```

This is the worldview that drives everything else. Task-centric tools orchestrate *steps* and answer "did the job run?"; asset-centric Dagster orchestrates *data products* and answers "is this table fresh and correct, and what produced it?" Neither is wrong, but a team that thinks in datasets and lineage will fight a task scheduler, and a team that just needs jobs to run on time may find the asset model more ceremony than they need.

## Side by side

|  | Airflow | Prefect | Dagster |
| --- | --- | --- | --- |
| Core abstraction | Task DAG | Flow / task (Python) | Software-defined asset |
| Graph shape | Mostly static (3.0 improves) | Dynamic (runtime) | Asset graph (declarative) |
| Data awareness | Added later (assets) | Task-level | Native — lineage & freshness |
| Ecosystem / maturity | Largest, most battle-tested | Growing, Python-first | Growing fast, strong dbt story |
| Local dev / testing | Weakest of the three | Good (just Python) | Strongest — built for it |
| Managed options | MWAA, Cloud Composer, Astronomer | Prefect Cloud | Dagster+ |
| Sweet spot | Broad scheduled task graphs, existing skills | Dynamic, code-heavy / ML flows | Data-product platforms wanting lineage |

## A decision guide

- **Choose Airflow** if you need the broadest ecosystem and battle-tested ubiquity, your team already knows it, or you want a managed offering on every cloud. It's the safe default for general scheduled orchestration, and 3.0 closed much of the modernization gap.

- **Choose Prefect** if your pipelines are dynamic and code-heavy — especially ML and data-science workflows — and you want orchestration that feels like plain Python, with a hybrid model that keeps your data on your infrastructure.

- **Choose Dagster** if you think in data assets and want lineage, freshness, and a strong testing/local-dev experience as first-class features — particularly if you're building a data-product platform and lean on dbt.

**Don't rip out a working Airflow to chase a nicer model.** Migrating orchestrators is a deceptively large project: it's not just rewriting DAGs, it's re-establishing every integration, every alert, every on-call runbook, and the team's hard-won operational intuition. The newer tools are genuinely better at the things they're better at — but "our Airflow works and everyone knows it" is a real, valuable asset that a slicker asset model rarely outweighs on its own. Switch when you have a concrete pain the new tool solves (you truly need asset lineage; your workflows are fundamentally dynamic), not because the demo looked cleaner. The orchestrator is load-bearing infrastructure; change it for a reason, not a vibe.

**The choice is more reversible than it looks if you keep transformation logic out of the orchestrator.** The teams that get badly locked in are the ones who put business logic *inside* operators and tasks. Keep your actual transformations in [dbt](analytics-engineering-dbt), in well-factored Python packages, or in SQL the orchestrator merely invokes — and the orchestrator becomes a thin scheduling layer you could swap with far less pain. Whichever tool you pick, treat it as the conductor, not the orchestra.
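As a sketch of what "thin" can look like in practice: the business logic lives in a plain function with no orchestrator imports, and the orchestrator-facing wrapper only fetches inputs and delegates. The names here (`compute_ltv`, `ltv_step`, and the in-memory `load_orders`) are hypothetical.

```python
# --- your package: no orchestrator imports, unit-testable anywhere ---
def compute_ltv(orders):
    """Sum order amounts per customer."""
    totals = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]
    return totals

# --- orchestrator layer: the only part you would rewrite on a migration ---
def load_orders():
    """Stand-in for a warehouse read."""
    return [
        {"customer": "a", "amount": 30},
        {"customer": "b", "amount": 5},
        {"customer": "a", "amount": 12},
    ]

def ltv_step():
    # In a real project this wrapper would carry the tool's decorator or
    # operator; it should stay a few lines and contain no business rules.
    return compute_ltv(load_orders())

ltv = ltv_step()
```

Because `compute_ltv` knows nothing about scheduling, it can be tested with a plain function call and reused unchanged whichever orchestrator wraps it.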

## What to carry away

Airflow, Prefect, and Dagster differ less in features than in worldview. Airflow and Prefect are **task-centric** — you orchestrate steps and the tool tracks whether they ran — with Airflow betting on ubiquity and ecosystem (and a real modernization in 3.0) and Prefect betting on dynamic, Python-native flows and a hybrid execution model. Dagster is **asset-centric**: you declare the data that should exist, and the orchestrator becomes data-aware, with lineage, freshness, and testing as native concepts.

So choose by how your team thinks. Need broad, proven scheduled orchestration with skills you already have? Airflow. Dynamic, code-heavy, ML-shaped workflows? Prefect. A data-product platform where lineage and freshness are the point? Dagster. And whatever you pick, keep your transformation logic out of the orchestrator so it stays a thin, swappable conductor. Orchestration is one of the six [undercurrents](fundamentals-data-engineering-lifecycle) that run beneath the whole lifecycle, not the place your business logic should live.
