I've read a lot of documents titled "Data Strategy," and most of them were a shopping list: migrate to Snowflake, adopt dbt, stand up a lakehouse, buy a catalog. Those are platform decisions, and useful ones, but they're not a strategy any more than "buy a hammer" is a plan to build a house. A real data strategy answers a harder question: what is the business trying to do, and how does data make that measurably more likely? Get that wrong and you can spend two years and a fortune on a beautiful platform that moves no needle anyone cares about.
This is the advisory piece, not an internals deep-dive. After enough engagements you see that good data strategies share a shape: they connect business outcomes to a small number of bets, choose an operating model deliberately, treat data as a product, build trust through governance, and, above all, sequence the work so value shows up early. I'll walk through that shape, and the failure modes that sink the rest.
Strategy starts with outcomes, not infrastructure
The first and most violated rule: a data strategy is downstream of business strategy, not parallel to it. If the company is competing on customer experience, the data strategy is about the 360-degree customer view and real-time personalization. If it's competing on operational efficiency, it's about supply-chain visibility and cost analytics. If it's in a regulated industry fighting fines, it's about lineage and reporting accuracy. The infrastructure is whatever serves those, and you genuinely cannot choose it well until you know which.
So the strategy document should open not with a tool but with a short list of business outcomes and the decisions or products data will drive for each. Everything else (platform, team, governance) is justified by reference to that list. If a proposed initiative can't be traced to a business outcome, that's not a strategy gap; it's a sign the initiative shouldn't be on the roadmap.
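The traceability rule is mechanical enough to check in a few lines. Here is a minimal sketch, assuming a roadmap is kept as a list of initiatives that each name the outcomes they serve; the `Initiative` type, the `untraceable` helper, and the sample outcomes are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    """Hypothetical roadmap entry; the fields are illustrative assumptions."""
    name: str
    outcomes: tuple[str, ...] = ()  # business outcomes this initiative claims to serve


def untraceable(initiatives, declared_outcomes):
    """Return the names of initiatives that map to no declared business outcome."""
    declared = set(declared_outcomes)
    return [i.name for i in initiatives if not declared.intersection(i.outcomes)]


# Made-up example data, not taken from any real engagement.
OUTCOMES = ["reduce churn", "cut logistics cost"]
ROADMAP = [
    Initiative("customer 360 view", ("reduce churn",)),
    Initiative("supply-chain cost dashboard", ("cut logistics cost",)),
    Initiative("migrate to new warehouse"),  # names no outcome at all
]

print(untraceable(ROADMAP, OUTCOMES))  # ['migrate to new warehouse']
```

The flagged migration may still be worth doing, but only once someone can name the use case it unblocks.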
graph TD
BIZ["Business outcomes<br/>(grow revenue, cut cost, reduce risk)"]
USE["Use cases & data products<br/>(the decisions data drives)"]
OPS["Operating model<br/>(who owns data, how teams work)"]
PLAT["Platform & architecture<br/>(lakehouse, warehouse, pipelines)"]
GOV["Governance & trust<br/>(quality, lineage, access, privacy)"]
BIZ --> USE --> PLAT
USE --> OPS
OPS --> PLAT
GOV --> PLAT
GOV --> USE
The data-strategy stack. Business outcomes define the use cases and data products; those drive the operating model and the platform; governance underpins both the products (so people trust them) and the platform (so it's safe). Strategies fail when they start in the middle, at the platform, and never connect up to outcomes or down to trust.
Defense and offense: know which you're playing
A useful lens (from DalleMule and Davenport's work) splits data activity into defense (control, compliance, security, accuracy, "one version of the truth") and offense (growth, insight, experimentation, speed to a new question). They pull in opposite directions: defense wants central control and standardization; offense wants distributed freedom and flexibility. No organization can max both at once, and the right balance depends on the industry.
A bank or hospital sits toward defense (the cost of a compliance failure dwarfs the upside of a faster dashboard); a consumer-tech startup sits toward offense (speed of insight is survival, and the regulatory downside is small). The strategic mistake is being unconscious about it: applying startup-style data anarchy in a regulated bank, or locking a growth company in governance so tight that no team can ship an experiment. Naming your position on this spectrum is one of the highest-leverage decisions in the whole strategy.
The operating model: who owns data
The organizational design (who builds and owns data, and how teams interact) matters more than the tooling, and it's the part executives most often skip. Four common shapes, each a real trade-off:
| Model | How it works | Strength / weakness |
|---|---|---|
| Centralized | One data team serves all domains | Consistency & control; becomes a bottleneck and loses domain context |
| Decentralized | Each business unit owns its data & team | Fast, domain-aware; duplicated effort and inconsistent definitions |
| Hub-and-spoke | Central platform/standards team + embedded domain analysts | The pragmatic default: shared platform, local ownership |
| Data mesh | Domains own data as products on a self-serve platform, federated governance | Scales org-wide; demands real platform & org maturity to attempt |
Most organizations land, correctly, on some hub-and-spoke variant: a central team owns the platform, standards, and shared assets, while domains own their own products and analytics. Data mesh is the fashionable answer, and it's powerful, but it's an organizational and platform commitment that punishes the unprepared. Choosing the operating model is choosing where the bottlenecks and the inconsistencies will be; you don't get to avoid both.
Data as a product
The single idea that has improved data outcomes most in the last few years is treating data as a product rather than as exhaust from applications. A data product has an owner, a documented interface (schema and semantics), a quality and freshness SLA, discoverability, and consumers it's accountable to, like a microservice, but for data. The shift is from "we dumped some tables in the warehouse" to "this curated, governed dataset is owned, supported, and trustworthy."
This is what makes everything else work: a use case can rely on a data product instead of re-deriving the truth; governance attaches to a product with an owner instead of an orphaned table; and a mesh, if you go there, is literally a network of these products. You don't need a mesh to adopt the product mindset, and adopting it is most of the value people think they need a mesh for.
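To make the product idea concrete, here is a minimal sketch of a descriptor carrying the attributes listed above: owner, documented interface, freshness SLA, and consumers. The `DataProduct` class, its field names, and the sample values are assumptions made for illustration, not a standard or any catalog's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product; field names are illustrative."""
    name: str
    owner: str                   # accountable team or person
    schema: dict[str, str]       # column name -> documented meaning
    freshness_sla: timedelta     # maximum acceptable age of the data
    consumers: tuple[str, ...]   # teams the owner is accountable to

    def is_fresh(self, last_updated: datetime, now: datetime) -> bool:
        """True when the data is no older than the freshness SLA."""
        return now - last_updated <= self.freshness_sla


# Made-up example product.
orders = DataProduct(
    name="curated_orders",
    owner="commerce-data-team",
    schema={
        "order_id": "unique order identifier",
        "net_revenue": "revenue after refunds, in USD",
    },
    freshness_sla=timedelta(hours=6),
    consumers=("finance", "growth-analytics"),
)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = orders.is_fresh(datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc), now)  # 4 hours old
stale = orders.is_fresh(datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc), now)  # 10 hours old
print(fresh, stale)  # True False
```

Once the SLA is written down next to a named owner, a stale dataset becomes someone's problem rather than everyone's surprise.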
Sequencing: value early, or lose the mandate
A strategy that's correct but takes two years to show anything will be cancelled in eighteen months. The sequencing principle: prioritize by value × feasibility, and deliver a visible win early to earn the political capital for the harder, slower foundational work. Pick a first use case that's painful enough to matter and tractable enough to ship in a quarter; use its success to fund the platform and governance investments that don't demo well but compound.
The most useful artifact I bring to a data-strategy engagement isn't an architecture diagram; it's a prioritized portfolio: a short list of use cases scored on business value and on feasibility (data readiness, technical lift, org buy-in), with an explicit sequence. It forces the conversation away from "which platform" and onto "which outcome first," which is the conversation that actually determines success. The platform falls out of the portfolio, not the other way around.
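A portfolio like this can be sketched in a few lines. The version below assumes 1-to-5 scores, feasibility taken as the plain average of its three components, and priority as value × feasibility. The scale, the equal weighting, and the sample use cases are illustrative assumptions; a real engagement would argue about all three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical portfolio entry; all scores are on an assumed 1-5 scale."""
    name: str
    value: int            # business value
    data_readiness: int   # is the needed data available and usable?
    technical_lift: int   # higher means easier to build
    org_buy_in: int       # is there a sponsor and a willing team?

    @property
    def feasibility(self) -> float:
        # Equal weighting of the three components is an assumption.
        return (self.data_readiness + self.technical_lift + self.org_buy_in) / 3

    @property
    def score(self) -> float:
        return self.value * self.feasibility


def sequence(use_cases):
    """Order use cases by value x feasibility, highest first."""
    return sorted(use_cases, key=lambda u: u.score, reverse=True)


# Made-up portfolio for illustration.
PORTFOLIO = [
    UseCase("real-time personalization", value=5, data_readiness=2, technical_lift=2, org_buy_in=3),
    UseCase("churn early warning", value=4, data_readiness=4, technical_lift=4, org_buy_in=4),
    UseCase("regulatory report automation", value=3, data_readiness=3, technical_lift=3, org_buy_in=3),
]

for rank, uc in enumerate(sequence(PORTFOLIO), start=1):
    print(rank, uc.name, round(uc.score, 2))
```

In this made-up portfolio the highest-value use case is not the first to ship: churn early warning scores 16.0 against roughly 11.67 for personalization, because it is tractable enough to deliver the early win.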
The four ways data strategies die. (1) It's a platform migration in disguise: all infrastructure, no business outcomes, so nobody can say what it was for. (2) Boiling the ocean: a three-year foundational program with no early win, cancelled before payoff. (3) Governance theater: a thick policy document and a council that meets monthly while data quality stays terrible, because governance was treated as paperwork instead of an operating capability. (4) No business sponsor: data leadership writes the strategy alone, the business never owns it, and it dies as "an IT thing." Every one of these is an alignment failure, not a technology failure, which is exactly why the technology can't fix them.
Governance as trust, not paperwork
Governance is the part everyone agrees is important and nobody wants to do, usually because it's misframed as compliance bureaucracy. Reframe it as trust: the strategy's job is to make data people will actually rely on, which means quality they can verify, lineage they can trace, definitions everyone shares, and access that's both safe and not so locked-down that work stops. That's an operating capability (owned, automated, measured), not a binder. Done as paperwork it's theater; done as capability it's what lets the business trust the numbers enough to act on them, and a strategy whose outputs aren't trusted has delivered nothing. The mechanics live in pieces like catalog and lineage and, in regulated settings, lineage as regulatory proof.
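"Automated and measured" can start very small. The sketch below assumes rows arrive as plain dictionaries and that the owner has agreed a minimum completeness per column; `completeness`, `run_checks`, the thresholds, and the sample rows are hypothetical, and a real setup would more likely use a dedicated data-quality tool than hand-rolled checks.

```python
def completeness(rows, column):
    """Fraction of rows where the column is present and not None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def run_checks(rows, thresholds):
    """Measure each column against its agreed minimum completeness."""
    report = {}
    for column, minimum in thresholds.items():
        measured = completeness(rows, column)
        report[column] = {
            "measured": measured,
            "minimum": minimum,
            "passed": measured >= minimum,
        }
    return report


# Made-up sample data and thresholds.
ROWS = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 3, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]
THRESHOLDS = {"customer_id": 1.0, "email": 0.9}

REPORT = run_checks(ROWS, THRESHOLDS)
for column, result in REPORT.items():
    print(column, result)
```

Here `customer_id` passes and `email` fails at 0.75 against a 0.9 minimum. That result is a number an owner can be held to, which is the difference between a capability and a binder.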
What to carry away
A data strategy is not a tool list; it's a sequenced set of business-aligned bets with an operating model and a trust layer. Start from business outcomes and the decisions/products data will drive; decide consciously where you sit on the defense-offense spectrum; choose an operating model (usually hub-and-spoke, mesh only with maturity) knowing it's a choice about where bottlenecks live; treat data as a product with owners and SLAs; sequence for an early visible win; and run governance as a trust capability, not paperwork.
The recurring theme is that the hard parts are alignment and organization, not technology, which is humbling for those of us who love the technology. The platform should fall out of the strategy, never stand in for it. And because so much of the modern roadmap is AI, the natural next question is how an AI strategy sits on top of this foundation. The answer, it turns out, is mostly a good data strategy plus a disciplined use-case portfolio.