Why do controlled AI productivity studies show bigger gains than real-world studies?

Controlled experiments isolate a single task in ideal conditions: the tool is working, the task is well-defined, and there's no reorganisation cost. Real-world deployments involve existing workflows, team coordination overhead, review processes, and the fact that making one step faster only helps if that step was the actual constraint on output. The gap between the two numbers is a rough measure of how much reorganisation a given organisation still has to do around the tool.

What is the productivity J-curve and why does it apply to AI?

Erik Brynjolfsson's J-curve framework describes how general-purpose technologies create a period of apparent productivity stagnation while organisations invest in and learn to reorganise around them, before producing a step-change in output. The argument is that the reorganisation costs temporarily suppress measured productivity gains even as real capability is being built. Electrification and the internet each followed versions of this pattern, and the framework's proponents argue that AI is now doing the same.

Is the 2025 US productivity growth actually due to AI?

That's contested. US nonfarm business productivity grew 4.9% in Q3 2025 and 4.1% in Q2 — well above the prior decade average. Erik Brynjolfsson attributes this partly to AI entering its harvest phase. Daron Acemoglu and others point to labour market dynamics and the narrow task distribution where AI shows measurable gains. One year of strong data is consistent with multiple explanations. The picture will be clearer after 2026-2027 data is available.

What should organisations watch to know if AI is actually lifting their productivity?

Three things: whether workflows have been reorganised around AI or whether AI was just added on top of existing processes; whether unit labour costs are declining over multiple quarters; and what sector-specific data shows, since economy-wide averages lag significantly and mask high variance between early adopters and the rest.

Industry Analysis | May 30, 2026 | 7 min read | Reviewed May 30, 2026

The AI productivity paradox is more interesting than either side admits

Task-level AI gains are measurable. Firm-level gains are scattered. Economy-wide gains are barely visible. The gap between them is the story.

By FlowVerify Editorial Team

Key takeaways

92% of US H1 2025 GDP growth came from AI investment spend, not productivity gains; the net GenAI contribution to output was 0.4 percentage points.
Task-level gains are real: coding 55% faster in controlled studies, X-ray reading 36% faster, customer service sales up 16% across 7 field experiments.
90% of firms in a 2026 NBER survey of ~6,000 CEOs/CFOs reported no measurable productivity improvement from AI.
The bottleneck problem: AI speeds up individual tasks, but if those tasks aren't the constraint on output, the firm sees no system-level gain.
US factories went electric by 1910 but didn't see productivity gains until the 1920s, after redesigning the factory floor. AI follows the same pattern.
Three things worth tracking: adoption quality vs rate, unit labour costs over multiple quarters, and sector-level rather than economy-wide data.

In the first half of 2025, AI-related capital spending (data centres, chips, software) accounted for 92 per cent of US GDP growth. In February 2026, a National Bureau of Economic Research survey of roughly 6,000 CEOs and CFOs across four countries found that 90 per cent of firms had seen no measurable productivity improvement from AI. The AI productivity paradox is real. Reconciling the two statistics turns out to be more useful than choosing sides.

The AI productivity paradox, quantified

The GDP contribution of AI investment is real. Goldman Sachs tracked it carefully through 2025: AI capital spending increased sharply, and this spending mechanically adds to GDP just as any investment does — it doesn't require the investment to actually 'work'. Buying servers adds to GDP whether the servers produce anything useful or not.

The net contribution of GenAI to GDP growth through actual productivity gains is a different number. After accounting for trade flows in intellectual property and computing hardware, GenAI contributed roughly 0.4 percentage points to US GDP growth in the first half of 2025. That's not nothing, but it's a long way from the headline 92 per cent figure, which measures investment rather than returns.

The NBER survey found 90 per cent of firms saw no measurable improvement. Goldman Sachs analysts, examining the relationship between AI adoption rates and output across the economy, found no meaningful correlation. This is the Solow Paradox updated for a new decade. In 1987, Robert Solow wrote: 'You can see the computer age everywhere but in the productivity statistics.' The same observation applies to AI in 2026, at least at the aggregate level.

Where AI productivity gains are measurable

The national-level picture obscures significant variation at the task level. Specific, well-scoped tasks show productivity gains that are neither small nor contested.

Coding is the clearest case. Controlled studies of GitHub Copilot have found developers completing tasks 55 per cent faster. That's a controlled-experiment number — real, but measured under conditions designed to isolate the tool's effect. The real-world picture from Faros AI is more complicated: a June 2025 study of over 10,000 developers across 1,255 engineering teams found that developers using AI tools took 19 per cent longer to complete individual tasks than those working without. The same study found that high-AI-adoption teams merged 98 per cent more pull requests and completed 21 per cent more total tasks. These findings aren't contradictory: AI assistance appears to shift work toward higher volume at the cost of per-task focus time.

Customer service shows a cleaner signal. A synthesis of seven separate field experiments conducted between late 2023 and mid-2024 found AI chatbots increased sales by 16 per cent. The setting matters: well-defined queries, fast quality verification, minimal workflow reorganisation required.

Radiology is another domain with measurable results. A multi-reader study found AI assistance reduced chest X-ray interpretation time by roughly 36 per cent, with specificity increasing by 11 percentage points. Medical imaging has properties that make it a strong case for AI: the input is structured, the output is verifiable, and the expertise distribution is uneven enough that AI assistance can add genuine value even in experienced hands.

The pattern across these three domains: productivity gains are largest when the task is well-defined, quality verification is fast, and the tool fits into an existing workflow without requiring reorganisation. When those conditions change, the gains shrink.

Domain	Controlled gain	Real-world result	Conditions that matter
Software development	55% faster coding (Copilot)	19% slower per task; 98% more PRs merged (Faros AI, n=10k+)	Fast iteration loops; gains show in volume, not speed
Customer service	No controlled study	+16% sales across 7 field experiments	Well-defined query types; fast feedback loop
Radiology	36% faster X-ray reads	No large-scale field study	Structured input; verifiable output; expertise distribution

AI productivity gains: controlled vs real-world

The bottleneck problem

Making a non-bottleneck step faster doesn't improve the throughput of the system. A developer who codes 55 per cent faster still faces code review, testing, deployment pipelines, product decisions, and stakeholder sign-off. If any of those steps is the actual constraint on shipping software, the coding speedup doesn't translate into shipping faster — it means the developer waits longer at the constraint.
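
The bottleneck argument can be sketched in a few lines. This is a minimal illustration, not a model of any real team: the stage names and weekly capacities are hypothetical, and reading "55 per cent faster" as a 1.55x increase in coding capacity is an assumption made for the example.

```python
# Illustrative only: stage names and weekly capacities are hypothetical,
# not figures from any study cited in this article.

def throughput(capacities):
    """A serial pipeline ships no faster than its slowest stage."""
    return min(capacities.values())

def bottleneck(capacities):
    """Name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)

# Capacity of each stage in changes per week (assumed numbers).
before = {"coding": 20.0, "review": 12.0, "testing": 15.0, "deploy": 30.0}

# Assumption: "55 per cent faster" is read as 1.55x coding capacity.
after = dict(before, coding=before["coding"] * 1.55)

# Alternative: apply the same 1.55x at the actual constraint instead.
at_constraint = dict(before, review=before["review"] * 1.55)

print(bottleneck(before), throughput(before))                # review 12.0
print(bottleneck(after), throughput(after))                  # review 12.0
print(bottleneck(at_constraint), throughput(at_constraint))  # testing 15.0
```

With these assumed numbers, speeding up coding leaves throughput unchanged at 12 changes per week, because review is still the constraint. Applying the same speedup at review lifts throughput to 15, at which point testing becomes the new constraint.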

This isn't a failure of AI; it's a structural feature of how productivity in complex organisations works. The same dynamic played out with electrification. By 1910, most US manufacturing plants had converted to electric power. Manufacturing productivity didn't surge until the 1920s and 1930s — because the gains required physically reorganising factories around electric motors rather than simply swapping a steam engine for an electric one. Factories designed around centralised power transmission couldn't capture the distributed flexibility that electricity made possible until someone redesigned the factory floor.

AI adoption is roughly at the 1910 stage in most organisations. The tools are in place. The reorganisation mostly hasn't happened.

What reorganisation looks like in practice: redesigning workflows so AI assistance is applied at the actual bottleneck, not around it. Reducing verification overhead for AI-assisted outputs. Shifting human attention from generating content to reviewing and directing it. This takes months, not days, and carries short-term productivity costs before the gains arrive. The 90 per cent of firms reporting no improvement are likely still paying those reorganisation costs.

What the 2025 US productivity data actually shows

US nonfarm business productivity grew 4.9 per cent in the third quarter of 2025, according to the Bureau of Labor Statistics. The second quarter was revised to 4.1 per cent. Unit labour costs declined in both quarters, a pattern not seen since 2019. US aggregate productivity grew roughly 2.7 per cent across 2025, nearly double the prior decade's average.

Stanford economist Erik Brynjolfsson, who developed the productivity J-curve framework, interprets this as the beginning of a harvest phase. The J-curve argument: general-purpose technologies generate a period of apparent stagnation while organisations learn to reorganise around them, followed by a step-change in productivity once the reorganisation is complete. He argues the 2025 data shows the curve inflecting.
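
The shape of the J-curve argument can be shown with a toy calculation. This is not Brynjolfsson's model. It is a sketch under invented assumptions: a hypothetical share of hours is diverted to reorganisation each period, those hours produce no measured output at the time, and they add to a stock of know-how that raises output per hour from the following period (the `payoff` parameter is made up).

```python
# Toy illustration of the J-curve argument; NOT Brynjolfsson's actual model.
# All parameters are invented for illustration.

def measured_productivity(diverted_share, payoff=0.5):
    """Measured output per hour for each period.

    diverted_share[t] is the fraction of hours spent on reorganisation in
    period t. Those hours produce no measured output now, but add to a stock
    of organisational know-how that raises output in later periods.
    """
    know_how = 0.0
    series = []
    for share in diverted_share:
        output_per_hour = (1.0 - share) * (1.0 + payoff * know_how)
        series.append(round(output_per_hour, 3))
        know_how += share  # this period's effort pays off from next period
    return series

# Hypothetical path: heavy reorganisation early, tapering to none.
path = measured_productivity([0.0, 0.1, 0.1, 0.1, 0.05, 0.0, 0.0])
print(path)
```

Under these assumptions the series starts at 1.0, dips when reorganisation begins, and ends above where it started once the diverted hours stop. The dip and recovery are the "J"; how deep and how long they are in practice is exactly what the debate is about.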

MIT economist Daron Acemoglu is more cautious. His work points out that the tasks where AI shows large productivity effects in experiments cover a relatively narrow slice of the total task distribution — domains with fast quality verification and well-specified outputs. Many occupations involve tasks with high verification costs or ambiguous success criteria, where AI assistance doesn't generate comparable gains.

One year of strong productivity data doesn't settle the debate. The 2025 numbers are consistent with both the J-curve story and the alternative: that 2025 was partly cyclical, partly a lagged effect of post-pandemic labour market normalisation, and that the AI contribution was smaller than Brynjolfsson estimates. Two or three more years of comparable growth would be more informative.

One data point worth keeping in mind: the share of US firms using AI to produce goods rose from 3.7 per cent in late 2023 to 10 per cent by September 2025. The gains from that cohort will show up in aggregate data on a lag. If the adoption-to-gain ratio from early movers is representative, 2026 and 2027 productivity data should be more informative than 2025 data about what AI's actual structural contribution looks like.

Three things worth watching

Given the ambiguity in the current data, three indicators are worth tracking more carefully than the headline AI adoption numbers.

Adoption quality, not adoption rate. The share of US firms using AI to produce goods rose from 3.7 to 10 per cent in two years. What that number doesn't capture is whether firms are using AI to assist individuals at the margin, or whether they have genuinely redesigned workflows around the tool. The former generates good Copilot testimonials; the latter generates firm-level productivity gains. Survey data that distinguishes between these will matter more in 2026 than the headline adoption percentage.
Unit labour costs across several more quarters. The Q2 and Q3 2025 decline is suggestive but not yet a trend. If unit labour costs continue declining in 2026 while AI adoption continues rising, that's the clearest macro signal that something structural is happening — and not merely that 2025 had a favourable labour market.
Sector-level data rather than economy-wide averages. AI's effects are concentrated, not uniform. Software, customer service, and medical imaging have already produced measurable results. Legal document review and financial analysis are plausible next high-concentration sectors, given the task structure and verification characteristics. Economy-wide averages will show a muted signal for years because most sectors are still at the 'individual Copilot subscriptions' stage.
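
For the unit labour cost indicator above, the arithmetic is simple enough to track by hand. Unit labour cost is hourly compensation divided by output per hour, so it falls whenever productivity grows faster than compensation. The sketch below uses placeholder quarterly figures, not BLS data, and the helper names are hypothetical.

```python
# Sketch only: the quarterly figures below are placeholders, not BLS data.

def ulc_growth(comp_growth_pct, productivity_growth_pct):
    """Unit labour cost growth, in per cent.

    Unit labour cost = hourly compensation / output per hour, so its growth
    is compensation growth net of productivity growth.
    """
    ratio = (1 + comp_growth_pct / 100) / (1 + productivity_growth_pct / 100)
    return (ratio - 1) * 100

def consecutive_declines(ulc_series):
    """Length of the run of negative readings at the end of the series."""
    run = 0
    for value in reversed(ulc_series):
        if value >= 0:
            break
        run += 1
    return run

# Hypothetical (compensation growth, productivity growth) pairs per quarter.
quarters = [(4.0, 1.5), (3.5, 2.0), (3.0, 4.0), (3.0, 5.0)]
ulc = [ulc_growth(c, p) for c, p in quarters]
print([round(x, 2) for x in ulc], consecutive_declines(ulc))
```

In this made-up series the last two quarters show falling unit labour costs. Two quarters is the "suggestive but not yet a trend" situation described above; the run length is the number to watch as more quarters arrive.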

The national productivity statistics are a lagging, averaged signal. By the time AI's contribution is unambiguous in the BLS data, more granular sources will already show which firms and sectors are capturing gains, and why.

What this means for practitioners

The productivity paradox of AI is not that AI doesn't work. It's that task-level performance improvements, real and in some domains substantial, take longer to translate into firm-level productivity gains than most organisations expect, because those gains require workflow reorganisation rather than just tool adoption.

The interesting question to ask inside an organisation isn't 'are we using AI?' — it's 'have we changed any workflows around AI, or have we just given individuals a faster typewriter?' The answer to that second question predicts which side of the 90/10 split your organisation ends up on.
