What YC W26’s 199 startups reveal about which AI bets are defensible
The W26 batch is a vote. Here's what it voted for.
In March 2026, Y Combinator's W26 batch graduated 199 companies. The industry duly noted that roughly 60 percent are building AI products. The more useful thing to do is read the breakdown.
YC W26's headline number is 60 percent AI. The more useful number is what type.
Extruct AI's analysis of the full 199-company batch puts the breakdown at roughly 56 AI-native services, 45 AI-enhanced software companies, 34 developer infrastructure plays, 20 hardware startups, and 3 foundational AI research labs. Read those numbers carefully: 34 companies are building tools for AI products to run on, not building AI products for end users. Add hardware and research labs and nearly three in ten companies in the batch are betting on layers below the application.
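The layer arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the counts quoted from Extruct AI's analysis; the dictionary keys and the `below_application` grouping are labels chosen here for illustration, not Extruct's own taxonomy.

```python
# Counts as quoted above from Extruct AI's analysis of the 199-company batch.
BATCH_SIZE = 199
counts = {
    "ai_native_services": 56,
    "ai_enhanced_software": 45,
    "developer_infrastructure": 34,
    "hardware": 20,
    "foundational_research": 3,
}


def share(n, total=BATCH_SIZE):
    """Return n as a percentage of the batch, rounded to one decimal."""
    return round(100 * n / total, 1)


# Layers below the application: infrastructure, hardware, research labs.
below_application = (
    counts["developer_infrastructure"]
    + counts["hardware"]
    + counts["foundational_research"]
)

shares = {name: share(n) for name, n in counts.items()}
below_share = share(below_application)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"below the application layer: {below_application} companies, {below_share}%")
```

The five quoted categories sum to 158, not 199, so the breakdown evidently does not classify every company in the batch and the shares are best read as approximate.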
That distribution is different from W23 and W24, where almost all the AI bets were at the application layer — a chat interface, a system prompt, a specific domain context, a subscription. The infrastructure layer barely existed in those batches. In W26, it's a meaningful fraction of the batch.
The batch is also the most hardware-heavy in recent YC history. About one in eight companies is building something physical: robots, drones, wearables, space hardware, biotech. AI as the application layer for physical systems is a distinct thesis from AI as software, and W26 is one of the first batches where that thesis shows up at significant scale.
Infrastructure first: agents need somewhere to run, and nobody built it yet
The developer infrastructure bets in W26 cluster into four sub-categories: deployment and hosting for agents, data pipelines and retrieval, reasoning verification, and observability tooling. They share a common thesis: agentic applications are arriving faster than the plumbing is being built, and the plumbing layer captures a durable position.
Terminal Use is building hosting and deployment infrastructure specifically for agents that run in the background. The problem is concrete: standard serverless functions time out in seconds or minutes. A background agent running a two-hour due diligence workflow doesn't fit any existing hosting primitive cleanly. Terminal Use's bet is that background agents are a new compute category that needs its own infrastructure layer.
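To make the mismatch concrete, here is a minimal sketch of what a long workflow looks like when forced onto a time-limited function: it has to be cut into invocations that each save a checkpoint and hand off to the next. This is not a description of how Terminal Use works. The `Checkpoint` class, the function names, the 15-minute limit, and the 24-step workflow are all hypothetical, and step durations are simulated rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    next_step: int = 0  # index of the first step not yet completed


def run_invocation(step_minutes, checkpoint, limit_minutes):
    """Run steps until the next one would exceed the time limit.

    Returns the updated checkpoint. Durations are simulated, not measured.
    """
    elapsed = 0
    i = checkpoint.next_step
    while i < len(step_minutes) and elapsed + step_minutes[i] <= limit_minutes:
        elapsed += step_minutes[i]
        i += 1
    if i == checkpoint.next_step and i < len(step_minutes):
        raise ValueError("a single step is longer than the platform limit")
    return Checkpoint(next_step=i)


def invocations_needed(step_minutes, limit_minutes):
    """Count how many time-limited invocations the whole workflow takes."""
    checkpoint, calls = Checkpoint(), 0
    while checkpoint.next_step < len(step_minutes):
        checkpoint = run_invocation(step_minutes, checkpoint, limit_minutes)
        calls += 1
    return calls


# Hypothetical two-hour workflow: 24 steps of 5 minutes each.
workflow = [5] * 24
calls = invocations_needed(workflow, limit_minutes=15)
print(f"{sum(workflow)} minutes of work needs {calls} invocations")
```

Every hand-off is a place where state must be persisted and restored, and any step longer than the limit cannot run at all. That overhead is the gap a purpose-built hosting layer for background agents would aim to remove.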
Captain is building an API-first unified data layer for enterprise RAG pipelines. Its pitch: standing up a production-quality retrieval system still takes weeks of glue code, including embedding models, vector databases, chunking strategies, and reranking layers. Captain collapses that to a single API. Garry Tan personally coached the Captain team during the batch, which is a reasonably rare signal from a YC CEO.
Rubric is building reasoning and verification infrastructure for agents: tooling that checks at inference time whether an agent's output is consistent with a given specification. Not post-hoc evaluation. Inline verification. The problem it addresses is that agents fail silently, produce confidently wrong outputs, and by the time a human notices, the bad output has already fed three downstream steps.
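The difference between inline and post-hoc checking can be sketched as a pipeline that verifies each step's output before the next step runs. This is an illustrative toy, not Rubric's product or API: the step functions, the checks, and the invoice example are all hypothetical.

```python
class VerificationError(Exception):
    """Raised when a step's output fails its specification."""


def run_with_inline_checks(value, steps):
    """Run (name, step, check) triples in order, verifying after each step."""
    for name, step, check in steps:
        value = step(value)
        if not check(value):
            raise VerificationError(f"{name} failed its spec with output {value!r}")
    return value


executed = []  # records which steps actually ran


def extract_total(invoice):
    # Hypothetical agent step; it passes through whatever total it finds.
    executed.append("extract_total")
    return invoice["total"]


def apply_tax(total):
    executed.append("apply_tax")
    return round(total * 1.2, 2)


def schedule_payment(amount):
    executed.append("schedule_payment")
    return {"pay": amount}


steps = [
    ("extract_total", extract_total, lambda v: isinstance(v, (int, float)) and v >= 0),
    ("apply_tax", apply_tax, lambda v: v >= 0),
    ("schedule_payment", schedule_payment, lambda v: "pay" in v),
]

try:
    run_with_inline_checks({"total": -450.0}, steps)
    halted_at = None
except VerificationError as err:
    halted_at = str(err)

print(executed)
print(halted_at)
```

With the negative total, only the first step runs: the check stops the pipeline before the bad value reaches tax calculation or payment scheduling. A post-hoc evaluation would have seen the same error only after all three steps had executed.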
None of these are AI applications. They're scaffolding. The bet is that as agentic workloads scale through 2026-2027, the scaffolding layer is where sustainable margin concentrates, less tied to which foundation model wins the next benchmark and more tied to the volume of agents running on top.
| Archetype | Rough share of batch | Moat type | Tail risk |
|---|---|---|---|
| Agent infrastructure | ~17% (34 of 199) | Switching cost; volume dependency | Cloud providers build the same layer |
| Vertical workflow agents | ~28% (56 AI-native) | Vertical data; workflow lock-in | Foundation model gets good enough natively |
| Research / hardware | ~12% (23 of 199) | Proprietary models or techniques | Long timeline to revenue |
Vertical agents: own the task, not the assistant
The largest cluster of AI companies in W26 is vertical agents: products that attempt to replace a human function in a specific workflow rather than assist a human who remains in the loop. The distinction matters more than it sounds. An AI that assists a human saves time. An AI that owns the task changes the headcount equation.
Healthcare is the densest vertical, at roughly 10 percent of the batch. Patientdesk is building AI for the front and back office of dental practices. Beacon Health is positioning as AI employees for primary care. ClaimGlide is automating prior authorisations, the laborious process of getting insurer approval before treatment. Overdrive Health is AI-native medical billing. Each is solving a specific staff function, not adding a feature to existing software.
Legal roughly tripled its share of the batch, from about 1.3 percent in earlier cohorts to 4.0 percent in W26. The driver is legible. Harvey's reported revenue trajectory demonstrated that law firms pay meaningful money for AI that performs legal work, not just assists with legal research. General Legal is positioning as an AI-native law firm with same-day turnaround on startup legal matters. Vector Legal and Legalos are targeting similar territory. These companies describe themselves as law firms, not as software for law firms. The framing difference is the entire product thesis.
Supply chain grew from 4.3 percent of the batch to 8.5 percent. Pollinate is one of the companies in this space, automating supply chain operations. The underlying logic is consistent across all three verticals: large volumes of structured data, routine decisions under time pressure, and expensive human time. Agents outperform on throughput in those conditions even without matching human judgement on edge cases.
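The share changes quoted for legal and supply chain reduce to simple ratios. A minimal sketch, using only the percentages given in the text (the dictionary keys and function name are illustrative):

```python
# Batch shares in percent, as quoted above: (earlier cohorts, W26).
shares = {
    "legal": (1.3, 4.0),
    "supply_chain": (4.3, 8.5),
}


def growth_multiple(before, after):
    """Return how many times larger the later share is, to two decimals."""
    return round(after / before, 2)


multiples = {name: growth_multiple(b, a) for name, (b, a) in shares.items()}
for name, m in multiples.items():
    print(f"{name}: {m}x")
```

On these figures legal grew about 3.1 times and supply chain just under 2 times, though both start from small bases, so a handful of companies moves the percentage noticeably.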
The pattern across verticals: the companies generating attention in W26 are the ones claiming to replace a human function, not the ones adding an AI feature to existing vertical SaaS. The implicit argument is that the first-mover advantage belongs to whoever rebuilds the workflow from scratch with an agent as the primary actor, not to whoever bolts AI onto software designed for humans first.
The research bets: why is YC funding AGI labs?
Foundational AI research has historically sat outside YC's wheelhouse. The fund's model is built on short iteration cycles and revenue within 12-18 months of inception. Basic research doesn't produce revenue on that timeline. The presence of Ndea, Confluence Labs, and Rubric in W26 is a departure.
Ndea raised $43M before the batch and is co-founded by Francois Chollet, the creator of Keras (one of the most widely used deep learning frameworks of the late 2010s) and the designer of the ARC-AGI benchmark. Chollet's thesis at Ndea is that scale alone doesn't produce general intelligence; program synthesis combined with learning is the more promising path. That's a direct challenge to the prevailing consensus in the field.
Confluence Labs scored 97.9 percent on ARC-AGI-2 before the batch's Demo Day. Their approach uses foundation models to write code that describes the transformation in each ARC problem, then verifies it. The approach doesn't claim to have built AGI but does claim to have found a reliable path through the benchmark's specific task class. At roughly $12 per task, it's not cheap, but it's a working result.
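The write-code-then-verify pattern described above can be illustrated on a toy grid task. This is a sketch of the general idea only, not Confluence Labs' system: the hand-written candidate functions stand in for code a foundation model would generate, and the grids are made up for the example.

```python
def flip_horizontal(grid):
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    return grid[::-1]


def transpose(grid):
    return [list(col) for col in zip(*grid)]


# Stand-ins for model-written candidate programs (hypothetical).
CANDIDATES = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def solve(train_pairs, test_input, candidates):
    """Return the first candidate that reproduces every training pair,
    together with its prediction for the test input."""
    for name, program in candidates.items():
        if all(program(inp) == out for inp, out in train_pairs):
            return name, program(test_input)
    return None, None


# Toy task: each output is the transpose of its input.
train_pairs = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[0, 5], [6, 0]], [[0, 6], [5, 0]]),
]
test_input = [[7, 8, 9]]

name, prediction = solve(train_pairs, test_input, CANDIDATES)
print(name, prediction)
```

The verification step is what makes the result trustworthy: a candidate is only accepted if it exactly reproduces every known example, so a plausible-looking but wrong program is rejected before it is applied to the test input.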
The two timelines these bets imply are different in kind. Vertical agent companies are betting that current models are good enough. You just need to wrap them in the right workflow with the right data. Research labs are betting that current models are not good enough, and that genuine capability advancement is needed before the application layer can deliver on its promises at scale. Both bets can be right simultaneously; they operate on different clocks.
What the batch doesn't show
There is a category largely absent from the companies analysts highlighted as notable in W26: thin productivity wrappers. A chat interface, a system prompt, a specific domain context, an annual subscription.
This was a meaningful fraction of the W23 and W24 AI companies. Teams would pick a vertical, wrap a foundation model with relevant context, and sell access. Early MRR was easy because users paid for convenience and context they didn't want to configure themselves.
In W26, that pattern is present but not prominent. The companies generating attention are the ones doing something the base model can't replicate: deep workflow integration, proprietary vertical data, infrastructure at a layer the model doesn't touch. YC's selection criteria appear to have updated. Strong early MRR from a wrapped model is no longer sufficient evidence of defensibility. The implicit question is whether the company has something Anthropic or OpenAI couldn't replicate by shipping a new feature.
“The companies generating attention in W26 are the ones doing something the base model can't replicate in a weekend.”
Three questions this batch asks about your own AI strategy
Does your product own the task or assist it? The vertical companies in W26 aren't building AI features. They're replacing a human's role in a specific workflow. A product where the human is optional has a more defensible position than one that helps a user do something they'd do anyway, but it's also harder to build and harder to sell. The batch is a vote for the harder path.
Where do you sit in the stack? Infrastructure companies in W26 are making a specific durability bet: if enough agents run on their plumbing, they survive regardless of which foundation model wins the next round of benchmarks. Application companies are betting on workflow lock-in. Both can work. The risk profiles and moat types are fundamentally different, and the infrastructure bet has a longer payback period.
What does your product do in 18 months that the model won't do natively? Features like 'summarise this document' or 'answer questions about your codebase' are being commoditised not by better wrappers but by the underlying models improving. The companies worth watching in W26 all have a credible answer to this question. Most of them answer it with workflow integration, vertical data, or infrastructure positioning — not with a better prompt.
The S26 batch is still running. Demo Day is in September 2026. It will be another data point on the same trend. The W26 composition is already a clear enough signal: the application layer is crowded and getting more so, the infrastructure layer is under-built relative to demand, and vertical workflow ownership is the most defensible application bet available right now.