AI Strategy: Use-Case Portfolios, Build vs Buy, and the Demo-to-Production Gap

Every leadership team I talk to wants an "AI strategy," and most of them already have a graveyard of pilots to prove they tried. A chatbot that demoed beautifully and quietly died; a copilot nobody adopted; a proof-of-concept stuck in "evaluation" for nine months. The technology works; that's not the problem anymore. The problem is that an AI demo got mistaken for an AI strategy, and the unglamorous machinery that turns a demo into a dependable production system was never planned for. This piece is about that machinery.

The thesis I'll argue: an AI strategy is mostly a good data strategy plus a disciplined use-case portfolio plus an honest plan for the demo-to-production gap. The model is rarely the hard part now — you can rent a frontier model by the token. The hard parts are picking the right problems, getting the data ready, evaluating rigorously, governing the risk, and getting people to actually use the thing. I'll take those in turn.

AI strategy is mostly data strategy

The uncomfortable truth for anyone hoping AI lets them skip the data work: it doesn't; it raises the stakes. Every useful enterprise AI application is grounded in the company's own data: retrieval over documents, an agent querying a warehouse, a model fine-tuned on internal examples. If that data is fragmented, ungoverned, and untrustworthy, the AI built on it is fragmented, ungoverned, and untrustworthy too, just faster and more confident about it. I've made this point about semantic layers feeding agents: the AI is only as good as the data and definitions under it.

So step zero of an AI strategy is an honest read on data readiness for each candidate use case — and the foundation is the same data strategy work (governance, data products, trust). Where the data is ready, AI can move fast; where it isn't, "do AI" is really "fix the data first." Pretending otherwise is how pilots stall: the model is fine, the data underneath it isn't.

The use-case portfolio: value × feasibility

The core artifact of an AI strategy is not an architecture; it's a portfolio of use cases scored on value and feasibility, then sequenced. Value is the business impact if it works; feasibility folds in data readiness, technical risk, and how tolerant the use case is of being wrong. Plot them and the strategy almost writes itself:

graph TD
    HIVF["High value, high feasibility<br/>→ DO NOW (flagship wins)"]
    HiLo["High value, low feasibility<br/>→ INVEST in data/foundations first"]
    LoHi["Low value, high feasibility<br/>→ quick wins / skill-building"]
    LoLo["Low value, low feasibility<br/>→ AVOID (the demo trap)"]
    PORT["Scored use-case portfolio"]
    PORT --> HIVF
    PORT --> HiLo
    PORT --> LoHi
    PORT --> LoLo

The value × feasibility portfolio. Start with high-value, high-feasibility use cases for visible wins; treat high-value/low-feasibility as a reason to invest in data foundations; use low-value/high-feasibility for skill-building; and refuse the low-value/low-feasibility work no matter how impressive the demo. Most failed AI programs are a pile of low-value, low-feasibility projects.
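As a rough illustration of the scoring described above, here is a minimal Python sketch. The 1-5 scales, the cut-off of 3, the averaging formula for feasibility, and the example use cases are all assumptions made for the illustration; a real rubric would be agreed with the business owners.

```python
from dataclasses import dataclass

# Assumption: 1-5 scales with 3 as the high/low cut-off. Both are illustrative.
CUTOFF = 3

DO_NOW = "DO NOW"
INVEST = "INVEST in data/foundations first"
QUICK_WIN = "quick win / skill-building"
AVOID = "AVOID"


@dataclass(frozen=True)
class UseCase:
    name: str
    value: int             # business impact if it works, 1 (low) to 5 (high)
    data_readiness: int    # 1 (not ready) to 5 (ready)
    technical_risk: int    # 1 (safe) to 5 (risky)
    error_tolerance: int   # 1 (must be right) to 5 (mistakes are cheap)

    @property
    def feasibility(self) -> float:
        # Invert risk so that higher always means "more feasible", then average.
        return (self.data_readiness + (6 - self.technical_risk) + self.error_tolerance) / 3


def quadrant(uc: UseCase) -> str:
    high_value = uc.value >= CUTOFF
    high_feasibility = uc.feasibility >= CUTOFF
    if high_value and high_feasibility:
        return DO_NOW
    if high_value:
        return INVEST
    if high_feasibility:
        return QUICK_WIN
    return AVOID


def sequence(portfolio: list[UseCase]) -> list[UseCase]:
    # Drop the AVOID quadrant, then order the rest with flagship wins first.
    # Ranking INVEST ahead of QUICK_WIN is an assumption, not a rule.
    rank = {DO_NOW: 0, INVEST: 1, QUICK_WIN: 2}
    kept = [uc for uc in portfolio if quadrant(uc) != AVOID]
    return sorted(kept, key=lambda uc: (rank[quadrant(uc)], -uc.value))


# Hypothetical portfolio, scored by hand for the example.
portfolio = [
    UseCase("avatar demo", value=1, data_readiness=2, technical_risk=4, error_tolerance=2),
    UseCase("meeting summaries", value=2, data_readiness=4, technical_risk=1, error_tolerance=5),
    UseCase("pricing agent", value=5, data_readiness=1, technical_risk=5, error_tolerance=1),
    UseCase("invoice extraction", value=4, data_readiness=4, technical_risk=2, error_tolerance=3),
]

for uc in sequence(portfolio):
    print(f"{uc.name}: {quadrant(uc)}")
```

The exact numbers matter less than the habit: every candidate gets scored on the same rubric, and anything landing in the AVOID quadrant never reaches the roadmap.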

The discipline this enforces is saying no to the shiny-but-pointless. A flashy demo with no path to value is worse than no project, because it consumes the credibility you need for the real ones. The best first AI use cases are often boring and high-value — document processing, support deflection, internal search, code assistance — not the moonshot the board read about. Boring and shipped beats visionary and stuck.

Build vs buy vs fine-tune: the decision ladder

For each use case, there's a recurring "how do we build this?" decision, and the right answer is almost always lower on the effort ladder than engineers instinctively reach for. Climb only as far as the problem forces you:

Rung	Approach	When
1	Buy / use an API + prompt	Default. A frontier model with good prompting solves a surprising amount. Start here.
2	RAG (retrieval-augmented)	The task needs your private/current knowledge. The most common enterprise pattern.
3	Fine-tune	You need a consistent style/format/behavior, or want to specialize a smaller, cheaper model for the task, and you have good labeled examples.
4	Train from scratch	Almost never. Only with unique data, scale, and a moat that justifies the cost.

The strategic error is starting at rung 3 or 4: fine-tuning or training when prompting plus RAG would have shipped in a fraction of the time. Most enterprise value lives on rungs 1 and 2. Fine-tuning is for specific behavioral or cost reasons, not a default, and training a foundation model is a decision only a handful of companies should make. Buy the model; spend your scarce engineering on the data, retrieval, evaluation, and integration around it. That's where your differentiation actually is.
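The ladder can be read as a small decision function. The sketch below is one possible encoding, not a definitive rule set: the `Needs` fields are hypothetical yes/no questions distilled from the "When" column, and the function reports only the highest rung those answers force (in practice a fine-tuned model may still sit behind retrieval).

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    API_PLUS_PROMPT = 1
    RAG = 2
    FINE_TUNE = 3
    TRAIN_FROM_SCRATCH = 4


@dataclass(frozen=True)
class Needs:
    # Hypothetical questions; the names are illustrative, not a standard.
    private_or_current_knowledge: bool = False
    consistent_style_or_behavior: bool = False
    smaller_cheaper_model: bool = False
    good_labeled_examples: bool = False
    unique_data_scale_and_moat: bool = False


def highest_forced_rung(needs: Needs) -> Rung:
    rung = Rung.API_PLUS_PROMPT  # the default: start here
    if needs.private_or_current_knowledge:
        rung = Rung.RAG
    wants_fine_tune = needs.consistent_style_or_behavior or needs.smaller_cheaper_model
    # Fine-tuning only counts as forced when labeled examples actually exist.
    if wants_fine_tune and needs.good_labeled_examples:
        rung = Rung.FINE_TUNE
    if needs.unique_data_scale_and_moat:
        rung = Rung.TRAIN_FROM_SCRATCH
    return rung


print(highest_forced_rung(Needs(private_or_current_knowledge=True)).name)
```

Note what the function does when someone wants fine-tuning but has no labeled examples: it stays on the lower rung, which mirrors the advice to climb only as far as the problem forces you.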

Evaluation: the line between demo and production

If I could enforce one practice on every AI program, it would be this: build the evaluation harness before you scale the system. A demo proves the system can succeed once; production requires knowing how often it succeeds, on what inputs it fails, and whether a change made it better or worse. Without systematic evaluation (a representative test set, metrics for the task, ideally automated scoring) you're flying blind, "improving" prompts on vibes and unable to defend the system when it's challenged.

This is the discipline that most separates teams that ship from teams stuck in pilots. The ones in pilot purgatory usually can't answer "is it good enough?" because they never defined good enough or measured against it. Evaluation is also where observability meets strategy: you instrument quality, cost, and drift, and you treat regressions as release-blocking signals, exactly as you would for any other software. An AI feature without an eval harness isn't a product; it's a perpetual demo.
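To make the idea concrete, here is a minimal sketch of an evaluation harness with a release gate. Everything in it is an assumption for illustration: the two "systems" are plain functions standing in for model-backed features, the metric is a crude substring match, and the 0.9 quality floor is arbitrary. A real harness would use a representative test set and a task-appropriate metric.

```python
from typing import Callable

Case = tuple[str, str]  # (input, text the answer must contain)


def evaluate(system: Callable[[str], str], test_set: list[Case]) -> dict:
    if not test_set:
        raise ValueError("test set is empty")
    # Record which inputs fail, not just how many.
    failures = [
        prompt for prompt, expected in test_set
        if expected.lower() not in system(prompt).lower()
    ]
    return {
        "pass_rate": (len(test_set) - len(failures)) / len(test_set),
        "failures": failures,
    }


def release_ok(candidate: float, baseline: float, floor: float = 0.9) -> bool:
    # Block the release if quality is below the floor or worse than the baseline.
    return candidate >= floor and candidate >= baseline


# Stand-ins for two versions of the same feature.
def baseline_system(prompt: str) -> str:
    answers = {"refund window?": "30 days", "support hours?": "9 to 5"}
    return answers.get(prompt, "I don't know")


def candidate_system(prompt: str) -> str:
    answers = {
        "refund window?": "Refunds are accepted within 30 days.",
        "support hours?": "We are open 9 to 5.",
        "warranty length?": "The warranty lasts 2 years.",
    }
    return answers.get(prompt, "I don't know")


test_set = [
    ("refund window?", "30 days"),
    ("support hours?", "9 to 5"),
    ("warranty length?", "2 years"),
    ("ships to Canada?", "yes"),
]

baseline = evaluate(baseline_system, test_set)
candidate = evaluate(candidate_system, test_set)
print(candidate["failures"])
print(release_ok(candidate["pass_rate"], baseline["pass_rate"]))
```

Here the candidate beats the baseline but still misses the floor, so the gate blocks it. That is the "is it good enough?" question answered with a number instead of a feeling.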

The demo-to-production gap is 80% of the work and 0% of the demo. The model spitting out a good answer on stage is the easy 20%. The other 80% — wiring it to real, governed data; building the eval harness; adding guardrails for hallucination, prompt injection, and PII; handling the unhappy paths; integrating into the actual workflow; and the change management to get humans to trust and adopt it — is invisible in a demo and is where programs die. Budget the strategy for the 80%. If a plan only accounts for "pick a model and prompt it," it's a plan to build another demo.

Governance and risk: build it in, don't bolt it on

AI adds risks ordinary software doesn't: it's probabilistic (it can be confidently wrong), it can leak training or context data, it can be manipulated by inputs, and — increasingly — it acts autonomously as an agent. A credible AI strategy treats governance as a design input, not an afterthought: human oversight where stakes are high, traceability of what the system did and why, guardrails on inputs and outputs, and clear accountability. In regulated settings this is non-negotiable and increasingly codified — the EU AI Act imposes risk-tiered obligations (transparency, human oversight, logging) that you must design for, not retrofit.
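Two of those controls, human oversight for high-stakes actions and traceability, can be sketched in a few lines. This is an illustration under stated assumptions, not a compliance recipe: `AuditLog`, `governed_call`, and the refund example are hypothetical names, the log lives in memory, and the "approver" is a plain callback standing in for a real review step.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def record(self, **event) -> None:
        # Timestamp every event so the trail can be reconstructed later.
        event["at"] = datetime.now(timezone.utc).isoformat()
        self.records.append(event)


def governed_call(
    action: str,
    payload: str,
    execute: Callable[[str], str],
    log: AuditLog,
    high_stakes: bool,
    approver: Optional[Callable[[str, str], bool]] = None,
) -> Optional[str]:
    if high_stakes:
        # No approver means no approval: high-stakes actions fail closed.
        approved = approver is not None and approver(action, payload)
        log.record(action=action, payload=payload, step="human_review", approved=approved)
        if not approved:
            return None
    result = execute(payload)
    log.record(action=action, payload=payload, step="executed", result=result)
    return result


def issue_refund(order: str) -> str:
    return f"refunded {order}"


log = AuditLog()
blocked = governed_call("send_refund", "order 123", issue_refund, log, high_stakes=True)
allowed = governed_call(
    "send_refund", "order 123", issue_refund, log,
    high_stakes=True, approver=lambda action, payload: True,
)
print(blocked, allowed, len(log.records))
```

The design choice worth copying is that the high-stakes path fails closed and that both the review decision and the execution are logged, so "what did the system do and why" has an answer after the fact.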

The agentic turn raises the stakes further: an agent that can take actions needs the controls to match, which is the subject of designing auditable multi-agent systems. The strategic point is simple — bake the risk controls into the use-case design from day one, because bolting them on after a public failure is far more expensive than building them in.

People: adoption is the last mile

The final, most-skipped piece: an AI system delivers value only when people use it and trust it, and that's an organizational problem, not a technical one. Workflow integration (meet people where they work, don't make them visit a separate tool), training, and honest framing (assistant, not oracle) decide adoption. A technically excellent AI feature that nobody trusts, or that doesn't fit into anyone's day, is a failed project, full stop. "Change management" is the unglamorous work that turns a capable system into a used one.

What to carry away

An AI strategy that ships is a good data strategy plus a scored, sequenced use-case portfolio plus an honest plan for the demo-to-production gap. Start from data readiness; pick use cases by value × feasibility and refuse the shiny-but-pointless; climb the build ladder only as far as forced (buy/prompt → RAG → fine-tune → almost never train); build the evaluation harness before scaling, because it's the line between a demo and a product; design governance and risk controls in from day one (the EU AI Act assumes you did); and treat adoption as the real last mile.

The through-line with its sibling data strategy: the model is the easy part, and the hard parts are alignment, data, evaluation, and people. Buy the intelligence; invest your effort in the boring 80% that makes it dependable. Do that, and you escape pilot purgatory; skip it, and you add another demo to the graveyard.