AI made your developers faster. Why hasn't software delivery caught up?
The data on individual throughput, system throughput, and the three structural changes that close the gap
Two different answers to the same question
Ask a developer if AI coding tools have made them more productive, and you will almost certainly get a yes. Ask their engineering director if sprint velocity has improved, and the answer gets murkier.
This is not a morale problem, and it is not a measurement problem. It is a systems problem. The data from the past 12 months is specific enough to explain exactly where the gap comes from.
What the AI developer productivity data actually shows
A Faros.ai study tracking 5,000+ developers across enterprise engineering teams found that high-adoption cohorts (teams where over 70% of developers had active AI assistant sessions daily) completed 21% more tasks per sprint and merged pull requests at nearly twice the rate of control groups. These numbers are controlled for team size and project type. They replicate across organisations.
Developer satisfaction data lines up with this. Engineers using AI coding assistants consistently report finishing features faster, spending less time on boilerplate, and feeling less blocked on routine implementation. The individual experience is genuinely better.
So the tools are working. The question is where the output goes.
Where the time went: Amdahl's Law meets the review queue
The same Faros.ai dataset showed that pull request review time went up by 91% on high-AI-adoption teams.
When code is produced faster, it arrives in the review queue faster. The review queue did not scale with the new throughput. Code review, a manageable constraint in the pre-AI pipeline, became the obvious bottleneck once the generation phase sped up while review itself stayed exactly as it was.
Gene Amdahl's 1967 formulation for parallel computing says that the speedup of a system is limited by the fraction that cannot be parallelised. The software delivery version: if coding accounts for 20-30% of end-to-end cycle time, making coding twice as fast only shortens total cycle time by 10-15%. Review, QA, staging, deployment, and stakeholder sign-off make up the remaining 70-80%. Speeding up the first fraction does not shrink the second; it just makes the second the dominant share of cycle time.
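The arithmetic is easy to sanity-check. A minimal sketch, plugging the fractions from the paragraph above into Amdahl's formula (the values are illustrative, not from the Faros.ai dataset):

```python
# Amdahl's Law applied to a delivery pipeline: if coding is a fraction p of
# end-to-end cycle time and becomes s times faster, the overall speedup is
#   1 / ((1 - p) + p / s)

def pipeline_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of cycle time gets s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

for coding_fraction in (0.20, 0.30):
    speedup = pipeline_speedup(coding_fraction, s=2.0)  # coding twice as fast
    reduction = 1.0 - 1.0 / speedup                     # cut in total cycle time
    print(f"coding = {coding_fraction:.0%} of cycle time -> "
          f"total cycle time falls by {reduction:.0%}")

# coding = 20% of cycle time -> total cycle time falls by 10%
# coding = 30% of cycle time -> total cycle time falls by 15%
```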
Sprint velocity has been sticky not because AI tools are underperforming, but because the constraint moved from the generation phase to the review phase. The individual gains and the flat delivery numbers are both real; they are simultaneously true.
| Metric | Observed change | What it points to |
|---|---|---|
| Individual task completion rate | +21% on high-adoption teams | AI genuinely accelerates individual coding |
| PR merge rate | Nearly doubled (+98%) | More code produced per developer per day |
| PR review time | +91% on the same teams | Downstream bottleneck exposed, not eliminated |
| Verification and rework overhead | ~40% of raw AI gains consumed | Checking AI output is real cognitive work |
| Org-level delivery cycle time | Mixed; often flat | Coding was not the bottleneck on delivery |
| CEO-reported business impact (PwC, 2026) | 56% report no gains | Faster code did not reach faster outcomes |
The verification tax
There is a second phenomenon layered on top of the review bottleneck. A January 2026 Workday analysis found that nearly 40% of AI-generated productivity gains were being consumed by verification and rework — the time developers spent checking, correcting, and second-guessing AI output before submitting it for review.
AI coding assistants produce plausible code reliably. They produce correct code probabilistically. The cognitive cost of reviewing your own AI-generated output — deciding what to keep, what to rewrite, where the model got something subtly wrong — is real and does not show up in task-completion counts.
A developer who merges twice as many PRs in a week may have spent a similar number of focused hours in front of code. Merged output is up; verification overhead is a new category of work that did not exist before. The gross productivity gain is real. The net gain, after accounting for that overhead, is smaller and harder to attribute cleanly to the tooling.
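To put rough numbers on gross versus net, here is the back-of-the-envelope arithmetic combining the two figures cited in this piece. The combination is illustrative; the Faros.ai and Workday studies measured different populations, so treat the result as an order-of-magnitude estimate, not a finding:

```python
# Gross vs net productivity gain, combining two figures cited above.
# Illustrative arithmetic only; the studies measured different populations.

gross_gain = 0.21        # Faros.ai: +21% tasks completed per sprint
verification_tax = 0.40  # Workday: ~40% of AI gains consumed by checking/rework

net_gain = gross_gain * (1 - verification_tax)
print(f"gross: +{gross_gain:.0%}, net after verification tax: +{net_gain:.1%}")
# gross: +21%, net after verification tax: +12.6%
```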
What the macro data shows, and what it doesn't
At the organisational and economic level, the gap widens further.
The PwC 2026 Global CEO Survey of 4,454 CEOs across 95 countries found that 56% said they had gotten nothing from their AI investments. Only 12% reported AI had both grown revenues and reduced costs. Goldman Sachs economists found no measurable AI contribution to US GDP growth through 2025.
These figures sit awkwardly alongside the developer-level data. If developers are merging PRs at twice the rate, where is the business outcome?
The most coherent explanation is that coding throughput was not the constraint on software delivery, and software delivery was not the constraint on business outcomes for most organisations. Speeding up the coding phase faster than the rest of the system can absorb it does not produce faster products. It produces a longer backlog of code waiting to be reviewed, tested, and shipped.
It is worth noting that BLS benchmark revisions put US productivity growth at approximately 2.7% in 2025, nearly double the prior decade's average. That is what a genuine productivity acceleration looks like in the aggregate statistics. But it is accruing unevenly, and much of it is being absorbed by process friction before it reaches business metrics.
“Coding throughput was not the constraint on software delivery. Speeding up a step that was not the bottleneck does not move the whole system.”
The three places gains actually compound
Based on what the data shows about where cycle time actually goes, there are three structural changes that shift AI coding gains from individual throughput to delivery throughput.
Smaller PRs, not larger ones
This is counterintuitive. AI tools generate more code per session, so the natural response is to let PRs grow. The teams that have captured the most organisational benefit from AI coding tools have done the opposite: they moved to smaller, more frequent PRs.
A 150-line PR reviewed in 25 minutes and merged the same day beats a 600-line PR that waits three days for a reviewer and another two days for a deployment slot. When AI tools generate more code per session, the unit of review needs to shrink, not grow. The throughput advantage compounds over weeks, not within a single sprint.
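A minimal sketch of that arithmetic, using the illustrative numbers from the example above:

```python
# Lead time per slice of code under the two strategies described above.
# Numbers are the illustrative ones from the text, not measured data.

# Strategy A: four 150-line PRs, each reviewed in ~25 minutes and merged the
# same day, landing on days 1 through 4.
small_pr_landing_days = [1, 2, 3, 4]
avg_small = sum(small_pr_landing_days) / len(small_pr_landing_days)

# Strategy B: one 600-line PR, 3 days waiting for a reviewer + 2 for a deploy slot.
large_pr_landing_day = 3 + 2

print(f"small PRs: first slice live on day 1, average slice live in {avg_small} days")
print(f"large PR:  nothing live until day {large_pr_landing_day}")
```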
Tests that authorise deployment, not humans
If AI-generated code needs a human to validate it before tests run, or if test coverage is sparse enough that a reviewer has to read the code carefully to feel confident — you have not automated the bottleneck. You have moved it inside the developer's session.
Teams getting the most from AI coding have invested in test coverage and test quality, such that AI-generated code that passes the suite earns a degree of automated confidence before any human reviews it. That investment in test infrastructure has to precede the AI tooling payoff; teams that skipped it are collecting the gross gains while the verification tax eats the net.
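What this looks like in practice is a review-routing policy where the test suite, not a human, grants the first level of confidence. A hypothetical sketch; the names and thresholds here are our assumptions, not from any tool or study cited in this piece:

```python
# Hypothetical merge-gate policy: automated signal decides how much human
# review a PR needs. Names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CheckResult:
    tests_passed: bool       # full suite is green
    diff_coverage: float     # fraction of changed lines exercised by tests
    lines_changed: int

def review_tier(check: CheckResult) -> str:
    """Route a PR to a review tier based on automated signal alone."""
    if not check.tests_passed:
        return "blocked"              # humans never review a red build
    if check.diff_coverage >= 0.90 and check.lines_changed <= 200:
        return "lightweight review"   # the suite carries most of the confidence
    return "full review"              # large or under-tested diffs get a human

print(review_tier(CheckResult(tests_passed=True, diff_coverage=0.95, lines_changed=140)))
# -> lightweight review
```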
CI/CD maturity that scales with the new throughput
Deployment, environment provisioning, QA triage: if these require human scheduling and approvals, faster code generation produces a longer queue of code waiting to be deployed rather than faster time to production. The teams seeing delivery-level improvements from AI coding are typically those that had invested in CI/CD maturity before or alongside their AI tooling rollout.
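Queueing theory makes the same point quantitatively. In a toy single-server model of a deploy pipeline with fixed capacity, wait time grows nonlinearly as code arrivals approach that capacity. The numbers below are illustrative, not from any cited dataset:

```python
# Toy M/M/1 queue for a deploy pipeline with fixed capacity.
# Mean time in system: W = 1 / (mu - lambda). Illustrative numbers only.

def avg_days_to_production(prs_per_day: float, deploys_per_day: float) -> float:
    """Mean wait + deploy time, in days, for an M/M/1 queue."""
    if prs_per_day >= deploys_per_day:
        return float("inf")  # arrivals outpace capacity: the queue grows forever
    return 1.0 / (deploys_per_day - prs_per_day)

capacity = 10.0  # deploys per day, fixed by tooling and approvals

for rate in (5.0, 8.0, 9.5):  # code generation speeds up; capacity does not
    print(f"{rate:>4} PRs/day -> {avg_days_to_production(rate, capacity):.2f} days to production")

#  5.0 PRs/day -> 0.20 days to production
#  8.0 PRs/day -> 0.50 days to production
#  9.5 PRs/day -> 2.00 days to production
```

Doubling generation speed against a fixed pipeline does not merely double the wait; past a point, it blows the queue up. That is what "a longer queue of code waiting to be deployed" looks like in a model.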
None of this is surprising in retrospect. Every time a step in the engineering pipeline speeds up significantly, the next step becomes the visible constraint. The thing that is different now is the scale of the speed-up in the generation phase, and how quickly it revealed what had been invisible constraints downstream.
The AI coding tools are not underperforming. The code review workflows, the staging environments, the approval processes, and the deployment pipelines are performing exactly as they always have. That used to be adequate. Now it is the bottleneck.