Engineering | May 21, 2026 | 6 min read | Reviewed May 21, 2026

Feature flags in production: the lifecycle teams skip

Adding a flag takes five minutes. Retiring it takes five months. Not because the code is hard.

By FlowVerify Editorial Team

Frequently asked questions

How do I know if a feature flag is safe to delete?

If the flag has been 100% on (or 100% off) for 30 or more days, and the person who last modified it confirms the rollout is permanent, it is a retirement candidate. Code removal and flag deletion are two separate steps, and you need to do both. If you remove the conditional in code but do not delete the flag from the service, you leave a ghost entry that confuses the next person reading the dashboard.

Should we use a managed flag service or build our own?

For teams with more than five flags active in production, a managed service usually saves more in operational overhead than it costs. The lifecycle problem exists either way, because a managed service does not enforce retirement discipline for you. It just gives you a nicer dashboard to show the accumulating debt.

What is the right number of active flags to have at any time?

There is no universal number. But if your team cannot list all active flags from memory, you have enough flags to need a cleanup system. The absolute count matters less than whether you know the type, owner, and intended retirement date of each one.

How do we clean up a flag that is evaluated in multiple services?

Coordinate the removal across all services and deploy the changes together where possible. Partial removal, where the flag is gone from one service but still live in two others, does not reduce ambiguity. It makes the codebase inconsistent and leaves the flag alive in the service dashboard. Schedule the full sweep in a single sprint, assign one owner per service, and treat the changes as a batch.

A feature flag takes five minutes to add and five months to remove. Not because the code is hard. A flag is a boolean check. The delay is organisational: nobody knows if the flag is still doing anything, whose job it is to find out, or what 'done' even means for a flag once its initial rollout finishes.

The result is flag debt: conditional branches for releases that shipped a year ago, A/B tests where someone forgot to pick a winner, kill switches nobody dares touch because they are not sure what breaks if flipped. Every active flag in your codebase is a branch every code reviewer has to mentally evaluate. Every incident investigation starts with 'was a flag involved?', and answering that question takes longer as the flag count climbs.

The feature flag lifecycle nobody draws on the whiteboard

The standard diagram for feature flags stops at a green box labelled 'flag is at 100%'. What happens after is left to drift. In practice, most flag lifecycles look like this: an engineer needs to ship something safely, adds a flag, rolls it out, moves to the next project, and never revisits the flag. It reaches 100%, stays there, and becomes invisible. Six months later, someone asks what it does. Nobody remembers. The flag stays forever.

The reason this keeps happening is not lack of discipline. Adding a flag is part of the delivery workflow. It shows up in code review, in CI checks, in the rollout runbook. Retiring a flag is not part of any workflow. There is no moment where 'this flag's job is done' gets formally acknowledged and acted on. The lifecycle ends at 'ship', not at 'clean up'.

Four types of flags, four different lifetimes

Most flag debt comes from treating all flags the same. There are four distinct types, each with a different intended lifetime and a different retirement trigger.

Release flags

Added to gate a new feature during rollout. Their job ends one to two weeks after the flag reaches 100% and the team has confirmed nothing is on fire. These are the easiest type to clean up and the most commonly neglected, because the team has already mentally declared victory and moved on by the time cleanup is due.

Experiment flags

Control which variant a user sees in an A/B or multivariate test. Their job ends when statistical significance is reached, or when the experiment's cutoff date passes, whichever comes first. The failure mode here is not neglect; it is forgetting to review the results and pick a winner. Experiments that 'run a bit longer' often run for a year.

Ops flags (kill switches)

Let an engineer disable a code path without a deployment. Unlike release and experiment flags, ops flags are intentionally long-lived. They exist specifically so the team can respond to production incidents faster than a deploy cycle allows. Cleaning these up on the same schedule as release flags is a mistake, and a dangerous one.
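Here is a minimal sketch of the kill-switch pattern. It assumes a hypothetical `FlagClient` that holds flag values in a dict; a real flag SDK call would take the place of `is_on`. The flag key `disable-recommendations` and the `recommendations` function are also made up. The design choice to notice is the default. If the flag cannot be evaluated, the code path stays enabled, so the switch can only ever turn a feature off.

```python
from dataclasses import dataclass, field


@dataclass
class FlagClient:
    """Hypothetical stand-in for a flag SDK; values live in a dict here."""
    values: dict[str, bool] = field(default_factory=dict)

    def is_on(self, key: str, default: bool) -> bool:
        # Return the caller's default when the key is unknown, which is how
        # flag SDKs generally behave when a flag cannot be evaluated.
        return self.values.get(key, default)


def recommendations(user_id: str, flags: FlagClient) -> list[str]:
    # Kill switch: turning "disable-recommendations" on skips this code path
    # without a deploy. Defaulting to False keeps the feature running.
    if flags.is_on("disable-recommendations", default=False):
        return []
    return [f"item-for-{user_id}"]  # placeholder for the real, expensive call


flags = FlagClient()
print(recommendations("u1", flags))  # ['item-for-u1']
flags.values["disable-recommendations"] = True
print(recommendations("u1", flags))  # []
```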

Permission flags (entitlement gates)

Control access based on plan tier, feature entitlement, or user segment. These can live for years and often become load-bearing business logic disguised as flags. The retirement trigger for a permission flag is a product decision, not a technical one.

Type	Intended lifetime	Staleness signal	Retirement trigger
Release	1-2 weeks post-100%	100% on for 14+ days, no targeting changes	Rollout confirmed stable; remove conditional
Experiment	To cutoff date or stat significance	Past cutoff; no winner declared	Pick winning variant; delete flag
Ops / kill switch	Indefinite	N/A; review quarterly	Intentional decision after review
Permission / entitlement	Business-driven	Zero active users match targeting rules	Product decision to retire tier or feature

Flag types at a glance
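One way to make the table enforceable is a small registry. It records each flag's type, owner, and retirement inputs, and reports which flags need action today. This is a sketch, not any flag service's schema:

- The `FlagRecord` fields are assumptions.
- The 14-day release grace period and the 90-day ops review interval are read off the table.
- The sketch does not track statistical significance for experiments; it checks only the cutoff date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"


@dataclass
class FlagRecord:
    key: str
    flag_type: FlagType
    owner: str
    reached_100: Optional[date] = None  # release flags: date rollout hit 100%
    cutoff: Optional[date] = None       # experiment flags: planned end date
    last_review: Optional[date] = None  # ops flags: date of last review


RELEASE_GRACE = timedelta(days=14)  # "1-2 weeks post-100%"
OPS_REVIEW = timedelta(days=90)     # "review quarterly"


def action_due(flag: FlagRecord, today: date) -> Optional[str]:
    """Return a reason string if the flag needs attention today, else None."""
    if flag.flag_type is FlagType.RELEASE:
        if flag.reached_100 and today - flag.reached_100 >= RELEASE_GRACE:
            return "release flag past grace period: remove conditional"
    elif flag.flag_type is FlagType.EXPERIMENT:
        if flag.cutoff and today > flag.cutoff:
            return "experiment past cutoff: pick a winner"
    elif flag.flag_type is FlagType.OPS:
        if flag.last_review is None or today - flag.last_review >= OPS_REVIEW:
            return "kill switch due for quarterly review"
    # Permission flags retire on a product decision, so nothing fires here.
    return None


registry = [
    FlagRecord("new-checkout", FlagType.RELEASE, "ana", reached_100=date(2026, 4, 1)),
    FlagRecord("pricing-test", FlagType.EXPERIMENT, "li", cutoff=date(2026, 6, 30)),
    FlagRecord("enterprise-sso", FlagType.PERMISSION, "sam"),
]
for f in registry:
    reason = action_due(f, today=date(2026, 5, 21))
    if reason:
        print(f"{f.key} ({f.owner}): {reason}")
```

Run with the sample data, only `new-checkout` is reported. It reached 100% fifty days before the check date.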

How flag debt compounds

Flag debt is not just code bloat. Three things make it worse than it looks from the outside.

Code review overhead. Every flag in the codebase is a branch every reviewer has to reason about. In a codebase with 50 active flags, that is 50 implicit questions during each review: is this still needed? What is the current rollout state? Does this path matter for the change I am looking at?
Test matrix expansion. Each boolean flag can double the number of code paths that tests should cover. Flags that interact on the same path multiply, so n of them produce up to 2^n combinations. In practice, teams do not test every combination, so flag debt accumulates as untested state: code paths that exist in production but have no coverage.
Incident overhead. In a production incident, the first question is whether a flag was recently changed. With 200 flags and no log of recent changes, answering that question can take 20 minutes. With a clear lifecycle and a small active flag count, it can take 30 seconds.
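You can see the doubling by enumerating flag states. This quick sketch uses `itertools.product` with made-up flag keys. It counts the on/off combinations that a fully exhaustive test suite would have to cover:

```python
from itertools import product

flags = ["new-checkout", "dark-mode", "fast-search"]  # hypothetical flag keys

# Every on/off assignment of the flags above: 2 ** len(flags) combinations.
combinations = [
    dict(zip(flags, states))
    for states in product([False, True], repeat=len(flags))
]
print(len(combinations))  # 8

# The growth is exponential: 10 interacting flags already mean 1,024 states.
for n in (1, 5, 10, 20):
    print(f"{n} flags -> {2 ** n} combinations")
```

In real code, only flags that touch the same path interact, so the true matrix is smaller. Still, the combinations nobody tests are where flag debt turns into untested production behaviour.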

The less obvious compounding: flag debt clusters. Teams that do not retire flags tend to add more flags to work around the ambiguity of the existing ones. 'We cannot change how feature X behaves because we do not know if flag Y is still in use' leads to flag Z, which gates the new behaviour without touching the uncertain old one. Each generation of flags makes the next harder to clean up.

The staleness signal already in your flag service

Most flag services expose the relevant data through their management APIs: a flag's current targeting configuration, when it was last modified, and often when it was last evaluated. LaunchDarkly, Split, and Flagsmith all offer APIs for at least some of this. For release flags, that is enough to detect staleness automatically.

A flag that has been 100% on (or 100% off) for 30 or more days without a targeting change is a strong retirement candidate. The service already knows this. The missing piece is a cron job that queries for it and routes the results somewhere actionable.

detect_stale_flags.py

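import os
from datetime import datetime, timedelta, timezone

import httpx

STALE_DAYS  = 30
LD_API_KEY  = os.environ["LD_API_KEY"]  # LaunchDarkly API access token ("api-...")
PROJECT_KEY = "your-project"
ENV_KEY     = "production"
PAGE_SIZE   = 100

def serves_single_variation(env: dict) -> bool:
    """True if every user gets the same variation (100% on or 100% off)."""
    if not env.get("on", False):
        return True  # targeting off: everyone is served the off variation
    has_targeting = (
        env.get("targets") or env.get("contextTargets")
        or env.get("rules") or env.get("prerequisites")
    )
    # A fixed fallthrough has a "variation"; a percentage rollout has a "rollout".
    return not has_targeting and "variation" in env.get("fallthrough", {})

def find_stale_release_flags() -> list[dict]:
    cutoff = datetime.now(timezone.utc) - timedelta(days=STALE_DAYS)
    stale, offset = [], 0

    while True:
        resp = httpx.get(
            f"https://app.launchdarkly.com/api/v2/flags/{PROJECT_KEY}",
            headers={"Authorization": LD_API_KEY},
            params={
                "env": ENV_KEY,
                "tag": "release",   # tag your release flags
                "summary": 0,       # include targets, rules and fallthrough
                "limit": PAGE_SIZE,
                "offset": offset,
            },
        )
        resp.raise_for_status()
        page = resp.json()

        for flag in page["items"]:
            env = flag["environments"].get(ENV_KEY, {})
            # lastModified is a Unix timestamp in milliseconds, not an ISO string
            last_modified = datetime.fromtimestamp(
                env.get("lastModified", 0) / 1000, tz=timezone.utc
            )
            if serves_single_variation(env) and last_modified < cutoff:
                stale.append({
                    "key":           flag["key"],
                    "name":          flag["name"],
                    "last_modified": last_modified.isoformat(),
                })

        offset += len(page["items"])
        if not page["items"] or offset >= page.get("totalCount", 0):
            break

    return stale

if __name__ == "__main__":
    for f in find_stale_release_flags():
        print(f"STALE: {f['key']}  (last modified: {f['last_modified']})")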

For self-hosted flag services, the equivalent data usually lives in the admin API. Last-seen timestamps (such as lastSeenAt) and per-environment toggle state are commonly available in the major open-source options. The API shapes differ, but the detection logic is the same. A weekly cron job piping its output to a Slack channel takes a few hours to set up and can replace roughly one manual flag-debt audit per quarter.
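Here is a sketch of the reporting step, assuming a Slack incoming webhook, which accepts a JSON body with a `text` field. It consumes the list of dicts that `find_stale_release_flags` in detect_stale_flags.py returns. The function names here are illustrative, and the webhook URL is yours to supply.

```python
import json
import urllib.request


def format_report(stale: list[dict]) -> str:
    """Turn stale-flag dicts (key, name, last_modified) into a Slack message."""
    if not stale:
        return "No stale release flags this week."
    lines = [f"{len(stale)} release flag(s) look ready to retire:"]
    lines += [
        f"• `{f['key']}` ({f['name']}), last modified {f['last_modified']}"
        for f in stale
    ]
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str) -> None:
    # Slack incoming webhooks accept a JSON body with a "text" field.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError on a non-2xx response, so failures are loud.
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

A weekly cron entry then only needs to call `post_to_slack(os.environ["SLACK_WEBHOOK_URL"], format_report(find_stale_release_flags()))`. The `SLACK_WEBHOOK_URL` variable name is an assumption.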

The cleanup playbook

Identifying a stale flag is the easy part. Retiring it has six steps, and teams typically skip the last two.

1. Verify intent. Was this flag left at 100% deliberately, or did it drift there? Ask the last person who modified it. If they have left the company, check the commit history for the flag config.
2. Assign an owner. Flag debt has no natural owner once the original engineer moves on. Name one person responsible for the removal and hold them to the next step.
3. Set a deadline. 'Remove it when we have time' is never. 'Remove it by end of sprint' is a date.
4. Remove the conditional in code. Keep the winning behaviour; delete the losing branch and all flag-evaluation logic. Do not leave the losing variant behind 'just in case'.
5. Update tests. Remove test cases that explicitly exercise the 'off' variant, or that pass flag state in as a parameter. Leaving dead test branches intact is the same as leaving dead code; it just costs future test-run time rather than production overhead.
6. Delete the flag from the service. Not archive: delete. An archived flag leaves a ghost in the dashboard. The next person who searches for flags finds it and may plan work around a key that no longer does anything.
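Here is step 4 in miniature. The flag key `new-tax-engine`, the `flags` argument, and the tax rates are all hypothetical. The point is that the flag parameter leaves the signature along with the losing branch, so callers and tests stop passing flag state around.

```python
# Before cleanup: the function takes flag state and keeps both branches alive.
def checkout_total_before(subtotal_cents: int, flags: dict[str, bool]) -> int:
    if flags.get("new-tax-engine", False):
        return subtotal_cents * 108 // 100  # new behaviour (the winner)
    return subtotal_cents * 107 // 100      # legacy behaviour (the loser)


# After cleanup: winning behaviour only, no flag lookup, no flag parameter.
def checkout_total(subtotal_cents: int) -> int:
    return subtotal_cents * 108 // 100
```

A useful check during review: the cleaned-up function should match the old one with the flag forced to its winning value.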

The step that slows teams down most is step 4 in multi-service codebases. If a flag is evaluated in three services, removal means three separate code changes, three reviews, and three deploys. Schedule them together where possible. Partial removal, where the flag is gone from one service but still live in two others, does not reduce ambiguity; it just makes the codebase inconsistent and keeps the flag alive in the dashboard.
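Before you schedule the batch, it helps to know every place the key still appears. This minimal sketch walks checked-out service repositories and lists the files that mention a flag key. Three things are assumptions:

- The side-by-side checkout layout.
- The suffix list.
- The function name.

A plain substring match also catches comments and config files, which is usually what you want during cleanup.

```python
from pathlib import Path

# File types worth scanning; extend for your stack (assumption).
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".go", ".java", ".kt", ".yaml", ".yml", ".json"}


def find_flag_references(flag_key: str, service_roots: list[Path]) -> dict[str, list[Path]]:
    """Map each service directory name to the files that mention flag_key."""
    hits: dict[str, list[Path]] = {}
    for root in service_roots:
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if flag_key in text:
                hits.setdefault(root.name, []).append(path)
    return hits
```

Each service with a non-empty hit list gets one owner in the sweep. Rerun the scan after the last deploy, and an empty result confirms the code side is done.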

What 'done' actually looks like

A flag is done when four things are simultaneously true: the winning code path is deployed unconditionally in every service that evaluated the flag; tests pass for the winning path only, without flag-specific branches or parameterised flag state; the flag is deleted from the service (not archived); and the flag key is recorded in a list of retired keys so the name cannot be accidentally reused.

That last point matters more than it sounds. Reusing a flag key that old config files or log parsers still reference is a subtle, hard-to-diagnose regression. Some teams keep a RETIRED_FLAGS constant in a shared module, appended to during each cleanup. A startup check that verifies no live flag key matches a retired one takes a few lines of code and has caught more than one configuration bug.
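Here is a sketch of that startup check. The keys in `RETIRED_FLAGS` are made up. `live_flag_keys` is a hypothetical input that you would fill at boot from your flag service or SDK.

```python
# shared/flags.py: append a key here in the same change that removes the flag.
RETIRED_FLAGS: frozenset[str] = frozenset({
    "new-checkout",
    "pricing-test-2025",
})


class RetiredFlagReusedError(RuntimeError):
    """Raised when a live flag key collides with a retired one."""


def assert_no_retired_keys(live_flag_keys: set[str]) -> None:
    """Fail fast at startup if a live flag reuses a retired key."""
    reused = sorted(live_flag_keys & RETIRED_FLAGS)
    if reused:
        raise RetiredFlagReusedError(
            f"retired flag keys are live again: {', '.join(reused)}"
        )
```

Failing at boot is deliberate. A reused key then surfaces in the deploy that introduced it, rather than weeks later in a log parser.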

The metric worth tracking: median time from 'flag reaches 100%' to 'flag is deleted from service'. For release flags, a healthy median is under 30 days. If that number is climbing, or if you have never measured it because you do not know how many active flags you currently have, the lifecycle system is the gap. A different flag service does not fix a missing lifecycle; it just gives you a nicer dashboard to show the accumulating debt.
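The metric needs only two dates per retired flag. Here is a sketch with the standard library. It assumes you can export, for each retired release flag, the date it reached 100% and the date it was deleted from the service. The sample dates are made up.

```python
from datetime import date
from statistics import median

# (reached 100%, deleted from service) for each retired release flag; sample data.
retired = [
    (date(2026, 1, 5), date(2026, 1, 20)),
    (date(2026, 2, 1), date(2026, 3, 15)),
    (date(2026, 3, 2), date(2026, 3, 23)),
]


def median_days_to_delete(pairs: list[tuple[date, date]]) -> float:
    """Median number of days between reaching 100% and deletion."""
    return median((deleted - reached).days for reached, deleted in pairs)


print(median_days_to_delete(retired))  # 21: under the 30-day target
```

The median is a better fit than the mean here. One forgotten flag that lingers for a year would otherwise hide an improving trend in everything else.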
