Open-source licensing for engineers: a corporate codebase guide
The licence exception that catches most SaaS teams, how AI code generation changed the risk profile, and the five-step check before you add a dependency.
The last time you ran npm install at work, you probably skipped the licence check. So did the engineer who added an AGPL library to your SaaS backend in 2022, and nobody noticed until a due-diligence audit flagged it three years later. Black Duck's 2026 Open Source Security and Risk Analysis report found licence conflicts in 68% of commercial codebases it scanned, a figure that has risen steadily as AI coding tools produce code without provenance metadata. Open-source licensing is not just a problem for the legal team. Legal is not reviewing every dependency you add. You are.
This is not a guide to becoming a software lawyer. It is a guide to the specific licence decisions you will face as an engineer in a corporate codebase: what each one means, the one licence type that catches most SaaS teams off guard, and the five-step check you can run in under five minutes before merging any new dependency.
The split that actually matters for day-to-day decisions
There are dozens of open-source licences, but for practical engineering decisions, the relevant distinction is between permissive and copyleft.
Permissive licences (MIT, Apache 2.0, BSD-2-Clause, BSD-3-Clause) let you use, modify, and redistribute the code, including in proprietary products. MIT and BSD require you to keep the copyright notice. Apache 2.0 adds a patent grant and explicit attribution requirements. For most engineers at most companies, any of these is fine to add without escalating to anyone.
Copyleft licences (GPL, LGPL, AGPL, MPL, EUPL) require that code released under them stays open. Using a copyleft-licensed library can trigger an obligation to release related code under the same licence. The important detail is that the obligation varies substantially across the copyleft family: what triggers it, and how much of your code it reaches, differs between LGPL, GPL, and AGPL in ways that matter specifically for SaaS backends.
| Licence | Family | Safe to add to SaaS backend? | Key restriction |
|---|---|---|---|
| MIT | Permissive | Yes | Keep copyright notice |
| Apache 2.0 | Permissive | Yes | Keep notice; patent grant applies |
| BSD 2/3-Clause | Permissive | Yes | Keep copyright notice |
| LGPL 2.1 / 3.0 | Weak copyleft | Usually yes (linking from proprietary code permitted) | Modifications to the library itself must remain LGPL |
| MPL 2.0 | File copyleft | Yes, with care | Modified MPL files must remain MPL |
| GPL 2.0 / 3.0 | Copyleft | Possibly (no network provision) | Binary distribution triggers share-alike |
| AGPL 3.0 | Network copyleft | No, for user-facing SaaS | Network interaction triggers share-alike on your codebase |
| SSPL | Service copyleft | No | Offering it as a service requires releasing the source of the whole service stack |
The AGPL exception that catches SaaS teams
GPL requires you to share source code if you distribute the software. For a long time, SaaS companies read "distribute" as "ship a binary to a customer." If you run the code on a server and offer it as a service over HTTP, you are not distributing a binary — so the GPL share-alike clause does not trigger. Lawyers called this the ASP loophole.
AGPL exists specifically to close that loophole. Under AGPL, if users interact with your software over a network, you must offer those users its complete source code under AGPL. An AGPL library in your SaaS backend API can therefore mean your entire application's source would need to be open-sourced to be compliant.
Some widely used tools carry AGPL or SSPL licences. MongoDB switched from AGPL to SSPL in October 2018, partway through its 4.0 release series; most corporate legal teams treat SSPL as copyleft-equivalent for SaaS purposes. Certain analytics tools (Grafana moved to AGPL in 2021), graph databases, and a few GNU projects are AGPL. The version matters: checking the current licence is not enough if you are evaluating a pinned older version.
The practical test: if a library is AGPL-licensed and lives in code that runs on a server handling user requests, stop before merging. This is not a decision to resolve in a code review.
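Because a licence can change between releases, it is worth checking the exact version you will install rather than the project's current licence. The sketch below assumes a Python dependency and uses PyPI's JSON API (`https://pypi.org/pypi/<name>/<version>/json`), reading the free-text `license` field and falling back to `License ::` trove classifiers. The function names are illustrative, and declared metadata is only a first signal: it can be missing, vague, or out of step with the LICENSE file. For npm packages, `npm view <package>@<version> license` returns the equivalent field.

```python
import json
import urllib.request

PYPI_RELEASE_URL = "https://pypi.org/pypi/{name}/{version}/json"


def declared_licence(payload: dict) -> str:
    """Summarise the licence a release declares in its PyPI JSON metadata.

    Prefers the free-text `license` field; falls back to trove classifiers
    such as "License :: OSI Approved :: MIT License".
    """
    info = payload.get("info", {})
    text = (info.get("license") or "").strip()
    if text:
        # Some projects paste the whole licence text here; keep the first line.
        return text.splitlines()[0]
    classifiers = [
        c.split(" :: ")[-1]
        for c in info.get("classifiers", [])
        if c.startswith("License ::")
    ]
    return "; ".join(classifiers) or "UNKNOWN"


def fetch_release(name: str, version: str) -> dict:
    """Download metadata for one specific release (requires network access)."""
    url = PYPI_RELEASE_URL.format(name=name, version=version)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def licence_changed(name: str, pinned: str, candidate: str) -> bool:
    """True if the declared licence differs between two versions of a package."""
    old = declared_licence(fetch_release(name, pinned))
    new = declared_licence(fetch_release(name, candidate))
    if old != new:
        print(f"{name}: {pinned} declares {old!r}, {candidate} declares {new!r}")
    return old != new
```

Calling `licence_changed(name, pinned, candidate)` with the version you have pinned and the one an upgrade PR proposes prints both declared licences when they differ. The parsing in `declared_licence` can be tested offline against saved JSON.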
What AI code generation changed
Before AI coding assistants, the risk profile was relatively contained: your code came from packages with machine-readable SPDX identifiers, and the residual risk was engineers manually copying snippets from Stack Overflow (which has its own CC BY-SA 4.0 ambiguity that has existed for years).
Two things shifted in 2024 and 2025. First, AI tools generate code that may contain fragments derived from GPL or AGPL training data. GitHub Copilot has a filter that blocks suggestions matching public code on GitHub, regardless of licence; other tools handle this inconsistently. The Doe v. GitHub lawsuit, settled in February 2026, produced guidance that "independent recreation" of code is generally defensible, but the standard is whether the output is substantially similar to a specific copyleft work, not whether the AI was trained on it.
Second, AI tools make it fast to reach for a package you would previously have written yourself in 20 minutes. That speed is genuine value, but it means the dependency surface is growing faster than most teams' licence-review capacity. Black Duck's 2026 report estimates that 17% of open-source components now enter corporate codebases through AI suggestions or copy-paste — outside the package manager, and therefore outside standard software composition analysis (SCA) tooling.
The practical response: enable whatever "block public code" or duplication-detection setting your AI tool provides. It will not catch everything, but it handles the obvious case. For longer AI-generated functions that look like they might have been lifted rather than generated, a similarity check against GitHub search is worth a minute of your time before you commit.
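Once a search turns up a plausible source, a quick local comparison helps decide whether the match is real. The sketch below is one simple approach, not a substitute for a dedicated clone-detection tool, and it only handles Python snippets: it tokenizes both pieces of code with the standard `tokenize` module, replaces every non-keyword name with a placeholder so that renamed variables do not hide a copy, and scores the token streams with `difflib.SequenceMatcher`.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry no structure worth comparing.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER,
}


def normalised_tokens(source: str) -> list[str]:
    """Tokenize Python source, mapping every non-keyword name to 'ID'.

    Renaming variables or functions therefore leaves the token stream unchanged.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching normalised tokens between two snippets."""
    return difflib.SequenceMatcher(
        None, normalised_tokens(a), normalised_tokens(b), autojunk=False
    ).ratio()
```

Two functions that differ only in identifier names score 1.0. A high score is a reason to look closer and perhaps escalate, not proof of copying; short, idiomatic code will often score high by coincidence.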
The five-minute check before adding a dependency
This is the workflow. It takes under five minutes and covers the cases that matter.
1. Check the SPDX identifier. Most package managers surface this directly: `npm info <package> license` for npm, or `pip show <package>` for Python (look at the License field). The `license` field in package.json, or the licence metadata in pyproject.toml or setup.cfg, usually declares it too. If the identifier is MIT, Apache-2.0, BSD-2-Clause, or BSD-3-Clause, skip step two and continue with the remaining checks.
2. If it is anything else, read the licence text. GPL-2.0, GPL-3.0, AGPL-3.0, LGPL-2.1, LGPL-3.0, MPL-2.0, EUPL-1.2, and CC BY-SA all carry copyleft obligations of varying strength. LGPL and MPL impose weaker obligations, but they still warrant a read.
3. Ask whether this runs server-side and handles user requests. If yes, and the licence is AGPL or SSPL, stop and escalate. Do not merge.
4. Check the transitive dependency tree, not just the direct dependency. `npm ls --all` for npm and the third-party `pipdeptree` tool for Python surface the full tree. Your company's SCA tooling (Snyk, FOSSA, Black Duck) does this at CI time if it is configured. The licence on the package you are adding tells you nothing about what that package pulls in.
5. Check the version. Some packages have changed licences between major releases. A dependency pinned to 3.x might be MIT; the same package at 4.x might be AGPL. This is rare but it does happen, and upgrade PRs rarely include licence review.
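Step four can be approximated without an SCA tool by reading every installed package.json under node_modules, which covers transitive dependencies as well as direct ones. The sketch below assumes an npm-style layout (including `@scope/name` packages and nested node_modules directories) and treats each package's `license` field as its declared licence, also accepting the deprecated `{"type": ...}` object and the legacy `licenses` array that some older packages use. Anything outside the four permissive identifiers from step one is reported, including compound expressions such as `(MIT OR Apache-2.0)` and missing licences. That is deliberately conservative: a flagged entry means "read it", not "reject it".

```python
import json
from pathlib import Path

# Permissive SPDX identifiers from step one; everything else gets a closer look.
ALLOWED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}


def declared_license(manifest: dict) -> str:
    """Return the licence a package.json declares, tolerating legacy formats."""
    lic = manifest.get("license")
    if isinstance(lic, dict):  # deprecated {"type": "MIT", "url": ...} form
        lic = lic.get("type")
    if not lic and isinstance(manifest.get("licenses"), list):  # legacy array form
        types = [e.get("type", "") for e in manifest["licenses"] if isinstance(e, dict)]
        lic = " OR ".join(t for t in types if t)
    return lic or "UNKNOWN"


def is_package_dir(path: Path) -> bool:
    """True for node_modules/<name> and node_modules/@scope/<name> directories."""
    parent = path.parent
    if parent.name == "node_modules":
        return not path.name.startswith("@")
    return parent.name.startswith("@") and parent.parent.name == "node_modules"


def flag_licences(node_modules: Path) -> dict[str, str]:
    """Map name@version to licence for every installed package outside ALLOWED."""
    flagged = {}
    for manifest_path in node_modules.rglob("package.json"):
        pkg_dir = manifest_path.parent
        if not is_package_dir(pkg_dir):
            continue  # skip package.json files nested inside a package's own folders
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        licence = declared_license(manifest)
        if licence not in ALLOWED:
            key = f"{manifest.get('name', pkg_dir.name)}@{manifest.get('version', '?')}"
            flagged[key] = licence
    return flagged
```

Run it as `flag_licences(Path("node_modules"))` from the project root; the result maps each package that needs a closer look to the licence it declares.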
If all five steps complete cleanly for a permissive licence, merge. If you hit a copyleft licence in step two and you are unsure how it applies to your specific usage, that is the point where you look up your company's open-source policy or ask legal. The five-minute check has flagged it; the resolution is now a question, not a solo decision.
The grey zone: snippets, forks, and vendored code
Package managers are the easy case. The harder cases are the ones that bypass them.
Stack Overflow content is licensed under CC BY-SA 4.0. Legal commentary has generally treated short, functional snippets as not protectable by copyright: the expression of a five-line utility function is rarely original enough to attract protection. The risk is low for a short snippet. It is not low for a 200-line algorithm you copied wholesale from an accepted answer, even if you then renamed the variables.
Internal forks are a specific risk. If your company has forked a GPL or AGPL library to add features and the fork lives in a private repository, sharing it among employees of the same legal entity typically does not count as distribution under the GPL. Deploying it in a SaaS product serving users is a different question, particularly for AGPL: a modified AGPL program that users interact with over a network triggers the obligation to offer them its source. The fork needs the same five-minute check as any external library.
Vendored dependencies (libraries checked directly into your repository rather than installed through a package manager) are invisible to most SCA tooling. If your codebase vendors any open-source code, run a periodic licence scan with a tool that handles directory scanning: licensee, FOSSA, and ScanCode Toolkit all support this.
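For a rough first pass before reaching for a full scanner, the sketch below walks a vendored directory and reports two signals: `SPDX-License-Identifier:` tags in file headers, and licence files whose names start with LICENSE, LICENCE, COPYING, or NOTICE. It is a heuristic under stated assumptions, not a replacement for ScanCode or FOSSA: it only reads the first 4,096 characters of each file and does not recognise licence text that lacks an SPDX tag.

```python
import re
from pathlib import Path

# Matches the tag in //, #, /* */ and <!-- --> style comments, capturing the expression.
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:\s*([^\n*]+?)\s*(?:\*/|-->)?\s*$", re.MULTILINE
)
LICENCE_FILE_PREFIXES = ("LICENSE", "LICENCE", "COPYING", "NOTICE")


def scan_vendored(root: Path) -> dict[str, set[str]]:
    """Map each SPDX expression (or 'licence file') to the paths where it appears."""
    findings: dict[str, set[str]] = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = str(path.relative_to(root))
        if path.name.upper().startswith(LICENCE_FILE_PREFIXES):
            findings.setdefault("licence file", set()).add(rel)
        try:
            # SPDX tags belong in the file header, so the first few KB are enough.
            with path.open(encoding="utf-8", errors="ignore") as handle:
                head = handle.read(4096)
        except OSError:
            continue
        for match in SPDX_TAG.finditer(head):
            findings.setdefault(match.group(1), set()).add(rel)
    return findings
```

Pointed at a vendored directory (for example a hypothetical `third_party/`), it returns each SPDX expression it found plus a `licence file` entry listing the files you still need to read by hand.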
When to escalate and when to substitute
Not everything needs a lawyer. Sometimes the right answer is to use a different library.
Substitution is usually sufficient when a minor utility has a copyleft licence and a well-maintained permissive alternative exists. No change may be needed at all when the dependency is a devDependency or build tool that runs only on developer machines and in CI, as long as none of its code ends up in what you ship. AGPL's network clause applies to software that users interact with over a network, so a tool that never handles user requests does not trigger it.
Escalation is warranted when: an AGPL-licensed library is in server-side code handling user requests; a GPL-licensed library has no clear linking exception and you are unsure whether your usage pattern triggers copyleft; you encounter a licence you have not seen before that is not on the SPDX list; or you suspect that AI-generated code closely matches a specific copyleft-licensed work.
The threshold for escalation is low: it is not about the situation being serious; it is about the determination being beyond your authority to make. Licence compliance decisions that carry legal consequence belong with whoever owns your company's OSS policy, not in a pull request comment thread. The cost of a five-minute conversation with legal is small. The cost of a licence conflict surfacing in an acquisition audit is not.