The AI productivity paradox is more interesting than either side admits
Task-level AI gains are measurable. Firm-level gains are scattered. Economy-wide gains are barely visible. The gap between them is the story.
In the first half of 2025, AI-related capital spending (data centres, chips, software) accounted for 92 per cent of US GDP growth. In February 2026, a National Bureau of Economic Research survey of roughly 6,000 CEOs and CFOs across four countries found that 90 per cent of firms had seen no measurable productivity improvement from AI. The AI productivity paradox is real. Reconciling the two statistics turns out to be more useful than choosing sides.
The AI productivity paradox, quantified
The GDP contribution of AI investment is real. Goldman Sachs tracked it carefully through 2025: AI capital spending increased sharply, and this spending mechanically adds to GDP just as any investment does — it doesn't require the investment to actually 'work'. Buying servers adds to GDP whether the servers produce anything useful or not.
The net contribution of GenAI to GDP through actual productivity gains is a different number. After accounting for trade flows in intellectual property and computing hardware, GenAI contributed roughly 0.4 percentage points to US GDP in the first half of 2025. That's not nothing, but it's a long way from the headline 92 per cent figure, which measures investment rather than returns.
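The gap between the investment headline and the net figure comes from national-accounts arithmetic: investment adds to GDP, but imported hardware is subtracted again through the trade balance. The sketch below is a minimal illustration of that mechanic only. The `CapexItem` type, the spending amounts, and the import shares are all hypothetical and are not taken from the Goldman Sachs analysis.

```python
from dataclasses import dataclass


@dataclass
class CapexItem:
    name: str
    spend_bn: float       # purchase price in $bn (hypothetical figure)
    import_share: float   # fraction of that price paid to foreign producers


def gross_investment(items):
    """Spending as it appears in the investment line of GDP."""
    return sum(item.spend_bn for item in items)


def net_gdp_effect(items):
    """Investment minus the matching imports, since GDP = C + I + G + (X - M)."""
    return sum(item.spend_bn * (1 - item.import_share) for item in items)


# Hypothetical numbers, chosen only to show the mechanics.
items = [
    CapexItem("servers and chips", 100.0, 0.7),
    CapexItem("data-centre construction", 60.0, 0.1),
    CapexItem("software", 40.0, 0.0),
]

print(f"gross investment: {gross_investment(items):.1f}")
print(f"net of imports:   {net_gdp_effect(items):.1f}")
```

Neither function asks whether the purchased equipment produces anything useful, which is the point: both numbers measure spending, not returns.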
The NBER survey found 90 per cent of firms saw no measurable improvement. Goldman Sachs analysts, examining the relationship between AI adoption rates and output across the economy, found no meaningful correlation. This is the Solow Paradox updated for a new decade. In 1987, Robert Solow wrote: 'You can see the computer age everywhere except in the productivity statistics.' The same observation applies to AI in 2026, at least at the aggregate level.
Where AI productivity gains are measurable
The national-level picture obscures significant variation at the task level. Specific, well-scoped tasks show productivity gains that are neither small nor contested.
Coding is the clearest case. Controlled studies of GitHub Copilot have found developers completing tasks 55 per cent faster. That's a controlled-experiment number — real, but measured under conditions designed to isolate the tool's effect. The real-world picture from Faros AI is more complicated: a June 2025 study of over 10,000 developers across 1,255 engineering teams found that developers using AI tools took 19 per cent longer to complete individual tasks than those working without. The same study found that high-AI-adoption teams merged 98 per cent more pull requests and completed 21 per cent more total tasks. These findings aren't contradictory: AI assistance appears to shift work toward higher volume at the cost of per-task focus time.
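One way to see why slower individual tasks and higher total output can coexist is Little's law, which relates throughput, time per task, and the number of tasks in flight. The sketch below is a back-of-envelope reading, not a result from the study: it assumes the 19 per cent and 21 per cent figures describe the same population in steady state, which may not hold. The function name `implied_wip_ratio` is illustrative.

```python
def implied_wip_ratio(throughput_ratio, cycle_time_ratio):
    """Little's law: work in progress = throughput x time per task.

    Both arguments are ratios of 'with AI' to 'without AI'.
    """
    return throughput_ratio * cycle_time_ratio


# Figures from the text: 21% more tasks completed, 19% longer per task.
wip_ratio = implied_wip_ratio(throughput_ratio=1.21, cycle_time_ratio=1.19)
print(f"implied tasks in flight: {wip_ratio:.2f}x baseline")
```

Under those assumptions, the two numbers are consistent only if developers keep noticeably more tasks open at once, which fits the reading that AI assistance trades per-task focus for volume.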
Customer service shows a cleaner signal. A synthesis of seven separate field experiments conducted between late 2023 and mid-2024 found AI chatbots increased sales by 16 per cent. The setting matters: well-defined queries, fast quality verification, minimal workflow reorganisation required.
Radiology is another domain with measurable results. A multi-reader study found AI assistance reduced chest X-ray interpretation time by roughly 36 per cent, with specificity increasing by 11 percentage points. Medical imaging has properties that make it a strong case for AI: the input is structured, the output is verifiable, and reader expertise varies enough that assistance can add genuine value, even in experienced hands.
The pattern across these three domains: productivity gains are largest when the task is well-defined, quality verification is fast, and the tool fits into an existing workflow without requiring reorganisation. When those conditions change, the gains shrink.
| Domain | Controlled gain | Real-world result | Conditions that matter |
|---|---|---|---|
| Software development | 55% faster coding (Copilot) | 19% slower per task; 98% more PRs merged (Faros AI, n=10k+) | Fast iteration loops; gains show in volume, not speed |
| Customer service | No controlled study | +16% sales across 7 field experiments | Well-defined query types; fast feedback loop |
| Radiology | 36% faster X-ray reads | No large-scale field study | Structured input; verifiable output; expertise distribution |
The bottleneck problem
Making a non-bottleneck step faster doesn't improve the throughput of the system. A developer who codes 55 per cent faster still faces code review, testing, deployment pipelines, product decisions, and stakeholder sign-off. If any of those steps is the actual constraint on shipping software, the coding speedup doesn't translate into shipping faster — it means the developer waits longer at the constraint.
This isn't a failure of AI; it's a structural feature of how productivity in complex organisations works. The same dynamic played out with electrification. By 1910, most US manufacturing plants had converted to electric power. Manufacturing productivity didn't surge until the 1920s and 1930s — because the gains required physically reorganising factories around electric motors rather than simply swapping a steam engine for an electric one. Factories designed around centralised power transmission couldn't capture the distributed flexibility that electricity made possible until someone redesigned the factory floor.
AI adoption is roughly at the 1910 stage in most organisations. The tools are in place. The reorganisation mostly hasn't happened.
What reorganisation looks like in practice: redesigning workflows so AI assistance is applied at the actual bottleneck, not around it. Reducing verification overhead for AI-assisted outputs. Shifting human attention from generating content to reviewing and directing it. This takes months, not days, and carries short-term productivity costs before the gains arrive. The 90 per cent of firms reporting no improvement are likely still paying those reorganisation costs.
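The bottleneck argument can be sketched as a serial pipeline whose output is capped by its slowest stage. This is a simplified model under stated assumptions: the stage names follow the text, but the weekly capacities are hypothetical, and "55 per cent faster" is read here as 1.55 times the baseline rate.

```python
def throughput(rates):
    """A serial pipeline ships at the rate of its slowest stage."""
    return min(rates.values())


def bottleneck(rates):
    """Name of the stage that currently limits throughput."""
    return min(rates, key=rates.get)


def speed_up(rates, stage, factor):
    """Return a copy of the pipeline with one stage made `factor` times faster."""
    faster = dict(rates)
    faster[stage] = rates[stage] * factor
    return faster


# Hypothetical capacities in work items per week.
baseline = {"coding": 10.0, "review": 6.0, "testing": 8.0, "deploy": 12.0}

faster_coding = speed_up(baseline, "coding", 1.55)
faster_review = speed_up(baseline, "review", 1.55)

for label, rates in [("baseline", baseline),
                     ("faster coding", faster_coding),
                     ("faster review", faster_review)]:
    print(f"{label:14s} ships {throughput(rates):.1f}/week, limited by {bottleneck(rates)}")
```

With these made-up capacities, speeding up coding leaves shipped output unchanged because review is still the constraint. Applying the same speedup to review raises output, but only until testing becomes the new constraint, which is why reorganisation tends to be iterative.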
What the 2025 US productivity data actually shows
US nonfarm business productivity grew 4.9 per cent in the third quarter of 2025, according to the Bureau of Labor Statistics. The second quarter was revised to 4.1 per cent. Unit labour costs declined in both quarters, a pattern not seen since 2019. US aggregate productivity grew roughly 2.7 per cent across 2025, nearly double the prior decade's average.
Stanford economist Erik Brynjolfsson, who developed the productivity J-curve framework, interprets this as the beginning of a harvest phase. The J-curve argument: general-purpose technologies generate a period of apparent stagnation while organisations learn to reorganise around them, followed by a step-change in productivity once the reorganisation is complete. He argues the 2025 data shows the curve inflecting.
MIT economist Daron Acemoglu is more cautious. His work points out that the tasks where AI shows large productivity effects in experiments cover a relatively narrow slice of the total task distribution — domains with fast quality verification and well-specified outputs. Many occupations involve tasks with high verification costs or ambiguous success criteria, where AI assistance doesn't generate comparable gains.
One year of strong productivity data doesn't settle the debate. The 2025 numbers are consistent with both the J-curve story and the alternative: that 2025 was partly cyclical, partly a lagged effect of post-pandemic labour market normalisation, and that the AI contribution was smaller than Brynjolfsson estimates. Two or three more years of comparable growth would be more informative.
One data point worth keeping in mind: the share of US firms using AI to produce goods rose from 3.7 per cent in late 2023 to 10 per cent by September 2025. Any gains from that newer cohort will show up in aggregate data with a lag. If the experience of early movers is representative, 2026 and 2027 productivity data should say more than 2025 data about AI's actual structural contribution.
Three things worth watching
Given the ambiguity in the current data, three indicators are worth tracking more carefully than the headline AI adoption numbers.
- Adoption quality, not adoption rate. The share of US firms using AI to produce goods rose from 3.7 to 10 per cent in two years. What that number doesn't capture is whether firms are using AI to assist individuals at the margin, or whether they have genuinely redesigned workflows around the tool. The former generates good Copilot testimonials; the latter generates firm-level productivity gains. Survey data that distinguishes between these will matter more in 2026 than the headline adoption percentage.
- Unit labour costs across several more quarters. The Q2 and Q3 2025 decline is suggestive but not yet a trend. If unit labour costs continue declining in 2026 while AI adoption continues rising, that's the clearest macro signal that something structural is happening — and not merely that 2025 had a favourable labour market.
- Sector-level data rather than economy-wide averages. AI's effects are concentrated, not uniform. Software, customer service, and medical imaging have already produced measurable results. Legal document review and financial analysis are plausible next high-concentration sectors, given the task structure and verification characteristics. Economy-wide averages will show a muted signal for years because most sectors are still at the 'individual Copilot subscriptions' stage.
The national productivity statistics are a lagging, averaged signal. Well before AI's contribution is unambiguous in the BLS data, more granular sources will already show which firms and sectors are capturing gains, and why.
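For readers tracking the unit labour cost indicator, it helps to recall how the measure is built: unit labour cost is hourly compensation divided by output per hour, so it falls whenever productivity grows faster than pay. The sketch below shows that relationship. The 4.9 per cent productivity figure is the one quoted earlier; the 3.0 per cent compensation growth is a hypothetical input, not a reported number.

```python
def unit_labour_cost_growth(compensation_growth_pct, productivity_growth_pct):
    """Percentage change in unit labour costs.

    Unit labour cost = hourly compensation / output per hour, so its growth
    is the ratio of the two growth factors minus one.
    """
    comp = 1 + compensation_growth_pct / 100
    prod = 1 + productivity_growth_pct / 100
    return (comp / prod - 1) * 100


# 4.9 is the Q3 2025 productivity figure quoted earlier in the article;
# 3.0 is a hypothetical compensation growth rate, not a reported number.
ulc = unit_labour_cost_growth(compensation_growth_pct=3.0, productivity_growth_pct=4.9)
print(f"unit labour cost growth: {ulc:.2f}%")
```

A falling unit labour cost therefore says productivity outpaced compensation in that period. It does not by itself say why, which is why several more quarters are needed before attributing the decline to AI.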
What this means for practitioners
The productivity paradox of AI is not that AI doesn't work. It's that task-level performance improvements, real and in some domains substantial, take longer to translate into firm-level productivity gains than most organisations expect, because those gains require workflow reorganisation rather than just tool adoption.
The interesting question to ask inside an organisation isn't 'are we using AI?' — it's 'have we changed any workflows around AI, or have we just given individuals a faster typewriter?' The answer to that second question predicts which side of the 90/10 split your organisation ends up on.