The AI wrapper debate, three years in: what actually makes a product defensible
Why the original critique was about the wrong thing, and what surviving products tell us about building on LLMs
Three years ago, the AI product landscape sorted into two camps: the true believers and the structural sceptics. The sceptics coined 'AI wrapper' as shorthand for a specific failure mode they expected to play out. The AI wrapper debate that followed was framed around moats: a thin layer over an API call has no moat; when OpenAI ships the feature natively, you're done.
Both camps were partly right and partly wrong, in ways that have become clearer as the evidence has come in. The products that failed mostly did not fail because a competitor cloned them. They failed for a more basic reason, one that had nothing to do with how thick their software layer was.
The original AI wrapper critique was about the wrong thing
The wrapper framing focused on competitive structure: if your product is a layer over an API call, anyone can replicate it cheaply. That is true, but it misses the more important question: not whether the software layer is thin, but whether the value created is durable.
A thin software layer can create real value if it solves a problem the user cannot solve without it. A thick, elaborately engineered product can create none at all if users achieve the same result by pasting their text into ChatGPT. Architecture was never the point.
The more precise critique would have been this: most AI products in 2023-2024 were built on the assumption that AI quality alone would be sufficient differentiation. When model quality converged across the industry and became cheap to access, that assumption was exposed. The products that failed were the ones where 'remove the AI and see what's left' had an uncomfortable answer.
What the failed AI products had in common
Most failed AI products were convenience tools. They made an existing workflow marginally faster, or produced content a user could have produced themselves given more time. A slightly better summariser. A document drafter with fewer formatting errors. An autocomplete that saved two keystrokes per sentence.
Convenience has real value, but it has a ceiling. Users compare the product against the free option (ChatGPT, Claude) and against their previous workflow. For a convenience tool, that calculation rarely closes cleanly: the output has to be consistently and noticeably better, with less friction, at a price proportionate to the time saved.
The pattern that showed up repeatedly in churned AI products: strong initial adoption during the novelty window, when the output was genuinely impressive and colleagues took notice. Then came cancellation once users ran the actual calculation: was the output reliably better than what they would get by writing a detailed prompt themselves? Often it was not.
Products built around specific, named tasks fared differently. Code review tooling that caught a specific class of bug. Contract analysis that flagged non-standard indemnification clauses. Support routing that used a company's own resolution history to triage incoming tickets. These tools were harder to replace with a paste into ChatGPT because they had context the user couldn't easily replicate.
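A minimal sketch of the support-routing idea makes the difference concrete. Everything here is hypothetical: the ticket history, the team names, and the deliberately crude bag-of-words similarity standing in for a real embedding model. The point is the shape of the pipeline: routing quality comes from the company's own resolution history, not from the model.

```python
# History-based ticket triage, sketched. The proprietary asset is HISTORY:
# past tickets with known resolutions, which a generic chat interface lacks.
from collections import Counter
import math

HISTORY = [
    ("refund not processed after cancellation", "billing"),
    ("card charged twice for one order", "billing"),
    ("cannot log in after password reset", "auth"),
    ("two-factor codes never arrive", "auth"),
    ("export to CSV produces empty file", "product-bugs"),
]

def vectorise(text: str) -> Counter:
    # Toy bag-of-words vector; a real product would use an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(ticket: str) -> str:
    """Route an incoming ticket to the team that resolved the most similar past one."""
    query = vectorise(ticket)
    _, team = max(HISTORY, key=lambda item: cosine(query, vectorise(item[0])))
    return team

print(route("I was charged twice this month"))  # -> billing
```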
Three structural properties that predicted survival
Across the AI products that have held their ground through 2025 and into 2026, three properties appear consistently.
Workflow integration depth
The question is not whether the product uses AI: it's whether removing the product requires significant re-engineering on the user's side. A browser extension that rewrites paragraphs on demand is disposable; the user can switch tools over a lunch break. A system that routes, classifies, and hands off work inside an existing operations stack creates real switching costs that have nothing to do with AI quality.
Proprietary context
Products that operate on data the user cannot easily provide to a general-purpose chat interface have a structural advantage. This includes organisational knowledge bases, real-time data access, transaction histories, and specialised domain corpora. The key question is whether the AI's output is materially better because of context the product holds, not because the model itself is superior.
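As a sketch of that pattern, the outline below retrieves organisation-specific documents and places them in the prompt before the model is called. The knowledge base, the keyword-overlap retrieval, and the call_model stub are all illustrative stand-ins; a real product would use vector search and an actual LLM API.

```python
# Proprietary-context pattern: the answer is better because of what the
# product holds, not because the model is superior. All names hypothetical.
KNOWLEDGE_BASE = {
    "indemnification-policy": "Our standard cap is 12 months of fees...",
    "escalation-runbook": "Sev-1 incidents page the on-call lead within 5 minutes...",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    # Keyword overlap keeps the sketch runnable; real products use vector search.
    scored = sorted(
        KNOWLEDGE_BASE.values(),
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def call_model(prompt: str) -> str:
    # Stand-in for whatever LLM API the product uses.
    return f"[model response to {len(prompt)} chars of grounded prompt]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Using only this internal context:\n{context}\n\nQuestion: {question}"
    return call_model(prompt)

print(answer("What is our standard indemnification cap?"))
```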
A genuinely new interaction model
A small number of products created interaction patterns that simply were not feasible before current models: not 'the AI does the thing faster', but 'this thing did not exist as a product before.' Voice-to-structured-data pipelines at high accuracy. Multi-step reasoning over very long documents without losing the thread. Real-time translation that adapts to industry-specific language. When a product creates a new verb rather than a faster version of an old one, defensibility is much higher.
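The voice-to-structured-data case can be sketched end to end. Both stages below are stand-ins: transcribe() fakes a speech-to-text model, and the regex extraction is a toy where a real product would use a model constrained to the schema. What matters is the shape: audio in, validated record out, with no chat interface anywhere.

```python
# Voice-to-structured-data pipeline, sketched with hypothetical stages.
import re
from dataclasses import dataclass

@dataclass
class SiteVisit:
    site: str
    issue: str
    follow_up_days: int

def transcribe(audio: bytes) -> str:
    # Stub for a speech-to-text model.
    return "Visited site B12, pump seal is leaking, follow up in 3 days"

def extract(transcript: str) -> SiteVisit:
    # Toy extraction; a real product would constrain a model to the schema.
    site = re.search(r"site (\w+)", transcript, re.I).group(1)
    days = int(re.search(r"in (\d+) days?", transcript).group(1))
    issue = transcript.split(",")[1].strip()
    return SiteVisit(site=site, issue=issue, follow_up_days=days)

record = extract(transcribe(b""))
print(record)  # SiteVisit(site='B12', issue='pump seal is leaking', follow_up_days=3)
```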
| Product type | Primary value source | Typical outcome | Main risk |
|---|---|---|---|
| Convenience tool | AI output quality | Churned after novelty window | Free alternatives match on quality |
| Workflow-integrated | Switching cost | Durable enterprise retention | Slow initial adoption cycle |
| Proprietary-context | Data accumulation | Strong retention, compounding value | Slow to acquire proprietary data |
| New interaction model | Category creation | Winner-takes-most in niche | High R&D spend, unproven market |
Most failed products had none of these. Some had one, partially. The survivors tended to have at least two.
Why the obvious 'wrappers' did not die
The products most often cited as successful despite looking like wrappers (ChatGPT, Claude, Perplexity) are instructive. By the original moat argument, they should have been vulnerable. ChatGPT is OpenAI's own product, so the 'OpenAI ships it natively' risk does not apply. Claude, though, is architecturally what the critique described: a polished interface over Anthropic's API. Why does it have a loyal user base?
Product execution accounts for some of it: conversation quality, the Projects feature, UX decisions that matter in a high-frequency tool. But the more structural answer is context accumulation. Conversation history, custom instructions, frequently accessed documents: these create switching costs that have nothing to do with the underlying model. The cost of leaving is rebuilding that context, not finding a better AI.
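In code, the flywheel is just a per-user store that grows with every interaction and gets prepended to future prompts. The sketch below uses illustrative names throughout; the structural point is that the model behind build_prompt can be swapped, while the accumulated context cannot.

```python
# Context accumulation as a switching cost: illustrative sketch.
from collections import defaultdict

class UserContextStore:
    def __init__(self) -> None:
        self.instructions: dict[str, list[str]] = defaultdict(list)
        self.history: dict[str, list[str]] = defaultdict(list)

    def add_instruction(self, user: str, rule: str) -> None:
        self.instructions[user].append(rule)

    def record(self, user: str, message: str) -> None:
        self.history[user].append(message)

    def build_prompt(self, user: str, query: str, recent: int = 5) -> str:
        # The accumulated layer: custom instructions plus recent history.
        return "\n".join([
            "Instructions: " + "; ".join(self.instructions[user]),
            "Recent context: " + " | ".join(self.history[user][-recent:]),
            "Query: " + query,
        ])

store = UserContextStore()
store.add_instruction("ana", "Answer in British English, terse.")
store.record("ana", "Working on the Q3 board deck")
print(store.build_prompt("ana", "Draft the risks slide"))
```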
Perplexity is the cleaner case. It indexes the real-time web faster than a user can retrieve it manually, synthesises with citations, and creates a different interaction model for search: query, synthesised answer, traceable sources. That is closer to 'new interaction model' than 'wrapper', even though the underlying AI components are commercially available.
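A rough sketch of that interaction model, with search_web() and summarise() as hypothetical stubs, shows the part that matters: sources are carried through the pipeline rather than bolted on afterwards.

```python
# Query -> synthesised answer -> traceable sources, sketched with stubs.
def search_web(query: str) -> list[dict]:
    # Stand-in for a real-time index.
    return [
        {"url": "https://example.com/a", "snippet": "Fact one about the topic."},
        {"url": "https://example.com/b", "snippet": "Fact two, with more detail."},
    ]

def summarise(snippets: list[str]) -> str:
    # Stand-in for an LLM synthesis step that preserves citation markers.
    return " ".join(f"{s} [{i + 1}]" for i, s in enumerate(snippets))

def answer_with_citations(query: str) -> str:
    results = search_web(query)
    body = summarise([r["snippet"] for r in results])
    sources = "\n".join(f"[{i + 1}] {r['url']}" for i, r in enumerate(results))
    return f"{body}\n\nSources:\n{sources}"

print(answer_with_citations("what changed in the AI wrapper debate?"))
```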
“Context accumulation is a flywheel. A product that gets better as a user uses it, not because the model improved but because it has learned that user's patterns, has something a competitor cannot replicate overnight.”
What 'vertical' actually means when people say vertical agents won
'Vertical AI' has become a category label, but the underlying pattern is real. Products focused on specific, well-scoped problem domains have outperformed general-purpose assistants in enterprise adoption rates. The mechanism matters, though.
Vertical does not mean 'we sell to law firms' or 'we serve healthcare'. Distribution strategy is not the same as product differentiation. Vertical means the product has encoded enough domain knowledge and workflow specificity that it reliably outperforms what a user could achieve with a general tool plus manual effort.
A legal AI product that surfaces jurisdiction-specific precedents, flags clause risks against a specific agreement type, and integrates with the firm's document management system is genuinely vertical. A general summariser marketed to legal teams is not, even if the sales materials describe it as 'purpose-built for legal'.
The distinction matters for product decisions. Picking a target industry determines who you talk to. Building something that only works well for that industry because it encodes that industry's specific knowledge, workflow patterns, and data: that is the actual moat. The sales motion follows from the product, not the other way around.
The questions worth asking right now
The market has self-corrected in ways that favour builders with real answers. Enterprise buyers are more sceptical, more willing to churn after the novelty window, and more willing to pay for things that demonstrably improve specific workflows. That is a better environment for building something durable than the anything-goes period of 2023.
Three years in, the AI wrapper debate has mostly resolved itself not through argument but through evidence. The products that made specific claims about specific problems turned out to be right. The ones that assumed AI quality alone would carry them turned out to be wrong. That is not a complicated lesson, but it took a full product cycle to learn it.