Frequently asked questions

Does AI change what Senior engineers do, or mainly Staff and Principal?

Both, but the effect is stronger at Senior and above, where the previous differentiation was implementation complexity. Junior engineers who adapt — using AI to accelerate while building genuine domain understanding alongside it — will be fine. The bigger risk is at levels where the old differentiation was "can write harder code."

How should managers recalibrate if the written ladder has not changed?

The criteria do not need to be rewritten immediately, but the evidence that satisfies them does. "Demonstrates technical leadership" is still the right bar — what it looks like now is writing specs that constrain AI output, owning LLM system trade-offs, and catching the failure modes that AI missed. Calling those out explicitly in promotion documents is enough for now.

My team does not use AI coding tools heavily yet. Does this still apply?

If your team has not adopted AI tools, your rubric is fine for now. The shift is coming — adoption across engineering teams accelerated sharply in 2025. The teams where the rubric mismatch shows up most visibly are those that adopted quickly without updating how they calibrate seniority.

Is spec precision really a Staff-level skill, or just good communication?

Good communication is table stakes at every level. Spec precision is different. Translating ambiguous product intent into constraints that an executor, AI or human, can act on without asking for clarification requires deep product understanding and the technical ability to name the exact solution space you are trying to occupy. AI just made the gap between good and bad specs more immediately visible.

Hiring & Culture | Jun 9, 2026 | 7 min read | Reviewed Jun 9, 2026

What Staff engineers actually do in 2026 versus what the career ladder says they should

Most IC rubrics still measure the skills AI is making table stakes. The three gaps your career ladder does not cover.

By FlowVerify Editorial Team

Most IC career ladders have not changed much since 2019. They still describe Staff engineers the same way: leads technical direction, mentors junior engineers, reduces ambiguity, writes the hard code that nobody else can. The definition was accurate then. The problem is that several things on that list are now much easier to do, and the things that actually require Staff-level judgment in 2026 are not on it.

This is not a complaint about career ladders — they are hard to maintain, and most companies update them every three to five years at best. It is a practical observation: if you are being evaluated against criteria that no longer align with the actual high-value work in your organisation, you will optimise for the wrong things. And if you are a manager calibrating performance reviews, you may be measuring the wrong signals.

What IC ladders were designed to measure

The canonical Senior and Staff engineer rubric, written somewhere around 2015–2019 at most companies, assumes implementation is the bottleneck. The Staff engineer is valuable because she can write the complex distributed-systems code that junior and mid-level engineers cannot. She is the one who catches the subtle concurrency bug, who knows how the database behaves at 10x current load, who can explain the six-year-old legacy system to anyone who asks.

Ownership of a system meant understanding its internals deeply, because that understanding was hard to acquire. Technical leadership meant being the person who could execute on the hard things when others were blocked. The career criteria reflected that: ship complex work, lead technical projects, multiply your team's output.

These are real skills. The ladder was measuring something meaningful. The question is whether it is measuring the same thing today.

Where the actual work shifted

By mid-2025, AI coding tools had crossed a threshold. Not that they replaced engineers — that has not happened — but they changed which parts of engineering were scarce. Several activities that were previously Staff-level signals in many organisations:

Explaining a complex subsystem to a new hire, with specific examples and context
Writing a first draft of a design document from a product brief
Reviewing a pull request for common patterns and obvious issues
Writing the scaffolding for a new service: API client, repository layer, basic tests

AI handles the first two adequately most of the time. In the third, code review, it catches a reasonable share of the common patterns and obvious issues. It handles the fourth, scaffolding, almost entirely. Teams that adopted AI-assisted coding seriously in 2025 are asking a different question from "can we build this?" They are asking "is what we built actually right?"

Junior and mid-level engineers produce more code, faster. What they still lack is the judgment to know whether the code is solving the right problem, whether the abstraction will age well, whether the AI missed a class of input that will matter in production. That judgment is where Staff-level work has concentrated.

The seniority signal has not disappeared. It has moved. Code output used to separate Senior from Mid. Now the separator is something harder to name on a rubric: the judgment layer that sits above what AI can produce.

Three gaps that 2026 ladders miss

Three categories of work that create real Staff-level impact now, but do not appear in most rubrics.

Spec precision

The most underrated skill in an AI-heavy engineering team is twofold: writing a specification that produces the right output when fed to an AI system, and knowing, from the output, whether the spec was wrong or the AI was.

This is adjacent to prompt engineering, but broader. It is the skill of translating ambiguous product intent into a precise-enough description of the problem that the solution space is meaningfully constrained. A junior engineer uses AI to write code. A Staff engineer writes the spec that gets AI-generated code into the right solution space in the first iteration, not the fifth.

Most IC rubrics do not measure this at all. There is "leads technical design," which is related. But leading a design review and writing a spec that another system can execute without clarification are different skills, and most ladders do not distinguish them.

The feedback loop is fast and honest: if AI-generated code from your spec needs three rounds of major correction, the spec was not precise enough. That is a measurable output. Most teams are not yet measuring it.
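One way a team might start measuring this is to record, per spec, how many rounds of major correction the generated code needed. The sketch below is illustrative only: the `SpecOutcome` record, the spec ids, and the sample numbers are hypothetical, and what counts as a "major" correction is a judgment each team would have to define for itself.

```python
from dataclasses import dataclass


@dataclass
class SpecOutcome:
    """One spec and the rounds of major correction its AI-generated code needed."""
    spec_id: str                  # hypothetical identifier
    author: str
    major_correction_rounds: int


def first_pass_rate(outcomes):
    """Share of specs whose generated code needed no major correction."""
    if not outcomes:
        return 0.0
    clean = sum(1 for o in outcomes if o.major_correction_rounds == 0)
    return clean / len(outcomes)


def imprecise_specs(outcomes, threshold=3):
    """Spec ids at or above the 'three rounds of major correction' signal."""
    return [o.spec_id for o in outcomes if o.major_correction_rounds >= threshold]


# Made-up sample data, for illustration only.
outcomes = [
    SpecOutcome("SPEC-1", "ana", 0),
    SpecOutcome("SPEC-2", "ana", 1),
    SpecOutcome("SPEC-3", "ben", 3),
    SpecOutcome("SPEC-4", "ben", 0),
]

print(first_pass_rate(outcomes))   # 0.5
print(imprecise_specs(outcomes))   # ['SPEC-3']
```

Tracked over a quarter, numbers like these give a promotion document something more concrete than "writes clear specs".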

AI system design and inference cost

A growing category of Staff-level work that most ladders have not absorbed: owning the architecture of systems where an LLM is in the critical path.

This includes designing retrieval pipelines that degrade gracefully when context windows fill, managing token budgets across inference calls without degrading output quality, and knowing when to add a cheaper re-ranker versus a more expensive context step. At production volumes, inference cost shows up on the P&L. Someone has to own that trade-off. In most organisations, nobody's career rubric names it.
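As a rough illustration of "degrading gracefully when context windows fill", the sketch below packs retrieved chunks into a fixed token budget by relevance, dropping the weakest chunks rather than failing the request or cutting a chunk mid-sentence. Everything here is an assumption for the sake of the example: the `Chunk` type, the scores, and especially `estimate_tokens`, which uses a crude four-characters-per-token approximation in place of a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from a retriever (hypothetical)


def estimate_tokens(text):
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def pack_context(chunks, budget_tokens):
    """Greedily keep the highest-scoring chunks that fit within the budget.

    Chunks that do not fit are skipped whole, so a tight budget yields
    less context instead of an error or a truncated chunk.
    """
    packed, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost > budget_tokens:
            continue
        packed.append(chunk)
        used += cost
    return packed, used


chunks = [
    Chunk("a" * 400, 0.9),  # about 100 tokens
    Chunk("b" * 400, 0.5),  # about 100 tokens
    Chunk("c" * 80, 0.7),   # about 20 tokens
]

packed, used = pack_context(chunks, budget_tokens=130)
print([c.score for c in packed], used)  # [0.9, 0.7] 120
```

The design decision a Staff engineer owns is not this loop but its parameters: how large the budget is, what gets dropped first, and how output quality is monitored when it does.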

At $10–30 per million tokens, a poorly designed inference pipeline running against a large model at volume will produce a meaningful cost overrun. The Staff engineer who owns that system owns both the quality and the cost — the same way a Staff engineer who owns a Postgres-backed service owns both the query latency and the disk spend.
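The arithmetic behind that overrun is simple enough to sketch. The request volume and tokens-per-request figures below are hypothetical, chosen only to show the shape of the calculation. The prices are the two ends of the $10–30 range above, treated as a single blended rate, which is a simplification.

```python
def monthly_inference_cost(requests_per_day, tokens_per_request,
                           price_per_million_tokens, days=30):
    """Estimated monthly spend in dollars for one inference pipeline."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens


REQUESTS_PER_DAY = 50_000   # hypothetical volume
LEAN_TOKENS = 2_000         # hypothetical: tightly budgeted context
BLOATED_TOKENS = 8_000      # hypothetical: everything stuffed into context

for price in (10, 30):
    lean = monthly_inference_cost(REQUESTS_PER_DAY, LEAN_TOKENS, price)
    bloated = monthly_inference_cost(REQUESTS_PER_DAY, BLOATED_TOKENS, price)
    print(f"${price}/M tokens: lean ${lean:,.0f}, bloated ${bloated:,.0f}")
# $10/M tokens: lean $30,000, bloated $120,000
# $30/M tokens: lean $90,000, bloated $360,000
```

Under these assumed numbers, the gap between the two designs is a five- to six-figure monthly line item, which is why the trade-off needs a named owner.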

Recognising confident-but-wrong output

AI systems fail in a specific way: they are confident and wrong. Not wrong in obvious ways — those get caught immediately. Wrong in the way that requires domain knowledge to spot: a technically valid answer that solves the wrong problem, makes an implicit assumption that is false in your context, or misses a constraint the model had no access to.

This failure mode did not exist in the old rubric because there was no system in the loop that failed that way. Code failed loudly, at compile time or at runtime. AI output fails silently, producing something that looks right, passes tests, and breaks in production for reasons the model could not have known.

Catching this reliably is a Staff-level skill. Most ladders do not describe it because the language for it does not yet exist in most rubric frameworks. But in teams that have been running AI-generated code in production for a year, the engineers who catch these failures before they ship are clearly doing something that matters at a level the rubric was not designed to capture.
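A small, contrived example of the pattern: the function below is the kind of code an assistant might plausibly generate, and the test written alongside it passes. The domain rule that breaks it, a collector that records timed-out requests as -1, is a hypothetical assumption invented for this sketch. The point is only that nothing in the prompt or the tests would have revealed it.

```python
def average_latency_ms(samples):
    """Plausible generated code: correct for the data it was shown."""
    return sum(samples) / len(samples)


# The accompanying test passes.
assert average_latency_ms([100, 200, 300]) == 200

# Hypothetical domain rule the model had no access to: the collector
# records a timed-out request as -1 instead of dropping the sample.
production_samples = [100, 200, 300, -1, -1]
print(average_latency_ms(production_samples))  # 119.6, which looks healthier than reality


def average_latency_ms_checked(samples, timeout_sentinel=-1):
    """Domain-aware version: average completed requests, count timeouts separately."""
    completed = [s for s in samples if s != timeout_sentinel]
    timeouts = len(samples) - len(completed)
    average = sum(completed) / len(completed) if completed else None
    return average, timeouts


print(average_latency_ms_checked(production_samples))  # (200.0, 2)
```

Neither version raises an error. The first one simply reports a number that is wrong in this context, and only someone who knows how the collector behaves would question it.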

Activity	Old rubric weight	2026 reality
Writing complex implementation code	Core Staff signal	Table stakes; AI-assisted at most seniority levels now
First-draft design documents	Staff-level work	First draft is commoditised; quality review is what matters
PR review for common patterns	Core signal	Routine patterns are cheap to catch; non-obvious domain errors are the signal
Writing precise, executable specs	Not measured explicitly	High value; determines AI output quality in the first iteration
LLM infrastructure and cost design	Not applicable	Significant; maps directly to P&L at production scale
Catching domain-context errors in AI output	Not applicable	High value; requires judgment that AI cannot supply

How the IC rubric maps to actual 2026 impact

How some teams are recalibrating

A few patterns from teams that have been running AI-heavy engineering long enough to have opinions.

The most direct: some teams have added explicit rubric criteria around "AI force multiplier" — not just "uses AI tools" (which any engineer does now) but "creates systems and specifications that multiply the output of others using AI." That is the closest thing to a functional Staff definition for 2026.

A second pattern: separating code contribution from engineering judgment more clearly in performance reviews. Good ladders always tried to do this, but as AI raises the floor on code output, the relative weight of judgment has to go up. Teams that have not reweighted are promoting people with high output but low architectural judgment. It looks fine until a system breaks in a way that is expensive to fix.

A third pattern, still rare: including LLM system design as an explicit competency at L5 and above. Most companies still treat this as a specialisation. That is likely to change as more organisations move LLMs into critical production paths and the cost shows up in ways that can neither be ignored nor pinned on an individual team.

What none of these teams did was rewrite the whole ladder from scratch. The underlying values — reducing ambiguity, multiplying team output, technical leadership — are still right. What changed is the evidence that satisfies them.

“The rubric says "demonstrates technical leadership." In an AI-forward team in 2026, that phrase increasingly means: this person's judgment is what ensures we are building the right thing at the right quality.”

— FlowVerify Editorial Team

For ICs navigating this gap now

If you are working toward Staff or Principal and your company's rubric has not updated, the practical reality is this: the written criteria are not the only criteria you will be evaluated against. Your manager and skip-level are forming opinions about your impact in the world as it actually is, not the world the rubric describes.

Three things that create visible Staff-level impact in this environment:

Get in the habit of writing specs that others, AI or human, can execute without needing clarification. If AI-generated code from your spec needs three rounds of major correction, the spec was not precise enough. That feedback loop is fast; use it.
Take ownership of one AI-in-production system, even a small one. Practical experience with inference cost, latency budgets, and quality degradation patterns puts you ahead of people who only use AI as a coding tool — and gives you something concrete to discuss in promotion conversations.
When AI output is confidently wrong in your domain, write up the failure. Not just the fix, but the pattern. "AI gets this class of problem wrong because X" is useful institutional knowledge, and writing it is exactly what Staff engineers are supposed to do.

The ladder will catch up. Most of them do, eventually. The question is whether you will be measured by the 2019 rubric or the 2026 reality when your next promotion conversation happens.

