The CEO's AI agenda

Three years of enterprise AI evidence — and what separates the deployments that pay back from the deployments that do not.

The most productive AI use inside many large enterprises today is happening outside the official program. MIT's 2025 research on enterprise AI found that more than ninety percent of employees regularly use personal AI tools for work — without their employer's authorization, infrastructure, or governance. The shadow AI economy works. The corporate AI program, by the same study's measure, mostly does not.

That contrast is worth sitting with. The same technology, in the same hands, is producing real productivity in unsanctioned use and almost no measurable return inside the funded enterprise initiative. Three independent research streams, working with different methods and different samples, have arrived at versions of this finding through 2025. MIT's Project NANDA, examining more than three hundred enterprise AI initiatives, reported that ninety-five percent had delivered zero measurable financial return on collective investment estimated at thirty to forty billion dollars. McKinsey's annual State of AI survey of nearly two thousand companies found that only about six percent could attribute more than five percent of EBIT to AI; most reported no enterprise-wide impact at all. RAND's 2025 analysis of AI project outcomes put the failure rate around eighty percent.

This is no longer a debate. The proposition that enterprise AI programs are broadly underperforming their investment is, at this point, the consensus.

The interesting question is why — and specifically, why the corporate version fails when the shadow version works. The answer, in our view, is that the shadow economy is doing the one thing the corporate program is not: restructuring the operating model around the capability. An employee using ChatGPT to draft, summarize, analyze, code, and structure their day is not running a pilot. They are rebuilding their own workflow around what the tool makes possible. They have moved decisions, collapsed steps, and changed the cadence of their work. The corporate AI program, by contrast, has typically been designed to leave the operating model alone. It produces tools, dashboards, and trained users. It does not produce restructured operations. And in the absence of restructured operations, no measurable return appears in the P&L.

This piece sets out the evidence for that view, the operational markers that distinguish the small minority of programs that are paying back, and what the diagnosis implies for the chief executive's AI agenda.

What is wrong is not what most CEOs are being told

Four explanations for the AI value gap circulate in earnings calls, board reviews, and consulting decks. Each is partial. None is sufficient.

The technology is not yet ready. This is the most easily dismissed. The same models reportedly failing inside enterprises are productively used by hundreds of millions of consumers, by software engineering teams across the industry, and by the employees of those same enterprises in their unsanctioned shadow use. The technology works. The problem is not in the model.

The data is not ready. Closer to the mark, but partial. Many AI initiatives founder on data quality, accessibility, or governance, and Gartner has predicted that sixty percent of AI projects without AI-ready data will be abandoned through 2026. Yet data readiness was a leading explanation for failed analytics and machine-learning programs in the 2010s as well — and AI today is more capable on imperfect data than its predecessors were. Data is a real constraint. It is not the principal one.

There is a talent shortage. True at the margin, not at the core. The deployments that succeed are not, in general, run by larger or more credentialed teams. Many of the strongest operating-model rebuilds we have seen are led by an experienced line operator and a small group of engineers, not by a hundred-person AI organization.

Change management has been underinvested. Closer to the truth, but too vague to be useful. "Change management" can mean training, communication, incentives, governance, leadership commitment, or any combination thereof. It is the diagnosis given when no more specific diagnosis is available.

The reason these explanations persist is that each is fundable within the existing operating model. Buying more technology, hiring more talent, cleaning more data, running more communication programs — none of these requires the chief executive to authorize the structural changes the deeper diagnosis would demand. They are easier conversations. They produce activity. They do not produce return.

What the evidence actually shows

The single most important finding in McKinsey's 2025 research has, in our view, been underweighted in the broader discussion. Of twenty-five organizational attributes the survey tested for correlation with measurable EBIT impact from AI, the attribute that mattered most was the redesign of workflows. Yet only twenty-one percent of organizations using generative AI reported having redesigned even some of their workflows. Roughly four out of five enterprises were layering AI on top of existing processes without rethinking how the work itself was structured.

That ratio is the story. It explains, more than any other single factor, why the consensus failure rate looks the way it does.

The MIT research arrives at the same conclusion through a different route. The five percent of organizations on the productive side of what its authors call the "GenAI divide" are not those with the largest budgets, the most pilots, or the most senior dedicated AI executives. They are organizations that have rebuilt specific operational processes around the capability. They have moved decisions to different levels. They have restructured workflows. They have embedded AI into the actual sequence of work rather than alongside it. They have placed accountability with the line operator running the process, not with a center of excellence delivering capability into operations it does not control.

There is a related finding worth naming. MIT's data shows that vendor-led deployments outperformed internally built ones at roughly two to one. The conventional explanation for in-house builds — that AI is too strategic to outsource — has the question backwards. The strategic moat is not the AI capability, which is increasingly commoditized. The strategic moat is the operating model rebuild, which is highly proprietary, and which most enterprises have neither the experience nor the institutional bandwidth to lead alone.

The four markers of a working deployment

Across the deployments we have seen produce measurable return — and across the case material in the 2025 research — four operational markers recur.

Decisions move. Identifiable decisions migrate to different levels, different roles, or different cadences. A pricing decision once made by a regional manager every quarter is now made by an algorithm hourly, with human oversight only at thresholds. A maintenance decision once driven by scheduled intervals is now driven by a model evaluating asset condition continuously. A capital reallocation decision once made annually is revisited monthly with new information. The work changes. If the same person is making the same decision in the same way after a deployment, no real deployment has occurred — only a tool change.
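
To make the threshold point concrete, the sketch below shows in simplified Python what "human oversight only at thresholds" can mean in practice. The function name, the five percent guardrail, and the escalation step are illustrative assumptions, not a description of any deployment in the research cited here.

    # Hypothetical sketch: an hourly pricing decision that applies automatically
    # within a guardrail and escalates to a person only when the proposed change
    # crosses a threshold. All names and numbers are illustrative assumptions.

    MAX_AUTO_CHANGE = 0.05  # auto-apply price moves of up to 5%; escalate beyond

    def decide_price(current_price: float, proposed_price: float) -> dict:
        """Resolve one hourly pricing cycle."""
        change = abs(proposed_price - current_price) / current_price
        if change <= MAX_AUTO_CHANGE:
            return {"action": "apply", "price": proposed_price}
        # Outside the guardrail the decision moves back to a person,
        # but only for this exception, not for every hourly cycle.
        return {"action": "escalate_for_review",
                "current": current_price, "proposed": proposed_price}

    # A 3% move is applied automatically; a 12% move is queued for the manager.
    print(decide_price(100.0, 103.0))
    print(decide_price(100.0, 112.0))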

The workflow is rebuilt, not augmented. The sequence of work is restructured: who does what, in what order, with what handoffs. The classic anti-pattern is the AI assistant that drafts a document a human then rewrites entirely; the work has been duplicated, not redesigned. The deployments that work tend to remove steps, collapse handoffs, and eliminate review points that existed only to catch human error at rates the AI no longer produces.

Accountability sits with the operator. The line leader running the operation in question — the plant manager, the sales VP, the head of finance, the customer service director — owns the outcome. The center of excellence, where one exists, provides infrastructure and capability. It does not own results. The reverse arrangement — a center of excellence asked to deliver business outcomes inside operations it does not run — is the most common organizational form of the failed deployment, and it is structurally unworkable. The capability owner has no operating authority. The operating leader has no capability accountability. Eighteen months in, both can produce credible explanations for why the lack of return is the other party's responsibility.

The metric is the P&L. A deployment is judged by what changes in the business — margin, throughput, working capital, time to market, customer outcomes at scale — not by what changes in adoption metrics. The fact that ninety percent of employees use the AI tool weekly is not, in itself, evidence of value capture. It is, in many cases, evidence only of a deployment that has not yet failed publicly. Adoption metrics produce success theater that delays the real reckoning by years.

The mirror images of these four markers — decisions that don't move, workflows preserved, accountability misplaced, metric set to adoption — characterize the typical failed deployment. The pattern is consistent enough across industries, geographies, and program structures that it should be treated as the central organizational finding of enterprise AI to date.

The deeper problem: the time horizon mismatch

Even when the diagnosis is understood, a second problem usually defeats the rebuild. Most CEOs are giving their AI programs eighteen to twenty-four months to demonstrate return. Most CFOs are evaluating AI investment using traditional NPV models calibrated to that horizon. The deployments that actually work — the ones that produce the operating model changes — typically take three to five years.
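
A simple worked calculation, again in Python, illustrates the mismatch. Every figure below is an invented assumption, not a number from the research; the point is only that a rebuild whose returns arrive in years three through five looks like a losing investment when the net present value is cut off at roughly the two-year mark.

    # Illustrative only: hypothetical cash flows for an operating model rebuild,
    # in millions per year starting today. Heavy investment up front, returns later.

    def npv(rate, cash_flows):
        """Net present value of yearly cash flows, with year 0 as today."""
        return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

    rate = 0.10                                    # assumed discount rate
    rebuild = [-10.0, -6.0, 2.0, 9.0, 12.0, 14.0]  # years 0 through 5

    print(round(npv(rate, rebuild[:3]), 1))  # cut off after ~2 years: well below zero
    print(round(npv(rate, rebuild), 1))      # full five-year horizon: positive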

The result is predictable. Programs that could have worked are killed at the eighteen-month mark, mid-rebuild, in favor of fresh initiatives that promise faster return and deliver none. The pilot count grows. The strategy frays. The shadow economy continues to outperform the official one.

This is partly a governance problem and partly a translation problem. Operating model rebuilds have no clean comparable benchmarks for the CFO to anchor on, so CFOs default to demanding faster ROI than the program structure can produce. The program in turn defaults to staging quick wins that reinforce the existing operating model rather than restructure it. By its own design, the program becomes incapable of producing the return it was funded to produce.

The chief executive is the only person in the organization who can hold the time horizon open. If the CEO does not, no one else will.

What it means for the CEO's agenda

The implication is uncomfortable. The AI agenda has, in most enterprises, been miscast as a technology agenda — a set of decisions about platforms, vendors, talent, and infrastructure that can reasonably be delegated to the CIO, a Chief AI Officer, or a center of excellence. The evidence above suggests something different. The AI agenda is, first and last, an operating model agenda. It is a question about how the enterprise is organized to make decisions, structure work, and hold line leaders accountable. That question cannot be delegated.

The shift in altitude changes the questions worth asking, the metrics worth tracking, the time horizon worth defending, and the people who should be in the room.

The questions that matter at the chief executive's level are not "which model are we using?" or "how many pilots are running?" They are these.

Which decisions are now being made differently? If the answer is none, no real deployment has occurred — regardless of pilot count or adoption rate. The first signal of a working program is identifiable, consequential decisions being made by different people, at different levels, on different cadences, than before.

Where has accountability for AI outcomes been placed? If accountability sits with a center of excellence, a Chief AI Officer, or a transformation program rather than with the line leader running the operation, the design is wrong, and additional investment will not correct it.

What workflow has been rebuilt — specifically? A satisfactory answer names the workflow, names the change, and names what is no longer being done. A vague answer about "AI-enabling our operations" is a sign that no rebuild has occurred.

What does the P&L look like, by capability, two and three years out? If the answer is expressed in adoption metrics, the program is being measured at the wrong altitude. If a credible P&L picture by capability cannot be produced, the program does not yet have a strategy.

Where am I, as chief executive, prepared to commit to operating model change? This is the hardest question, and the one most CEOs have not yet answered. An AI strategy that is not connected to operating model change is, on the available evidence, an AI strategy that does not pay back. Most chief executives have authorized investment without authorizing rebuild. That is the gap, and it is closable only at the chief executive's level.

A final word

Three observations to close.

The argument here is not that AI investment should be reduced, that pilots should be abandoned, or that center-of-excellence structures are wrong in themselves. It is that the typical enterprise AI program, as currently structured, is unlikely to produce the financial return it is targeting — because it has been designed around technology adoption rather than operating model change.

The minority of programs producing returns — five percent in MIT's data, six percent of high performers in McKinsey's, the lighthouse facilities in the World Economic Forum's most recent industrial survey — are not technology stories. They are operating model stories with technology inside them.

The companies that close the gap will not be the ones that spent the most. They will be the ones that asked different questions of their executive teams, accepted different answers, and rebuilt what needed to be rebuilt. The shadow economy is already showing them what is possible. Whether they let the official program catch up is now a question of governance, not of technology.

Sources: MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025" (July 2025); McKinsey & Company, "The State of AI in 2025: Agents, Innovation, and Transformation" (November 2025); RAND Corporation, AI project failure analysis (2024–2025); Gartner forecasts on enterprise AI project abandonment (2024–2025); World Economic Forum Global Lighthouse Network annual review (2025).
