DPDP for engineers: the code changes that actually matter
Consent schemas, purpose limitation, retention jobs, and audit structures for Indian SaaS teams moving from policy to implementation
Most DPDP compliance discussion has happened at the policy layer: appoint a Data Protection Officer, update the privacy notice, train staff on data handling. None of that is wrong. But at some point the compliance officer walks into the engineering standup and asks whether the system is ready, and the answer is usually a long silence. This article is for engineers who have been handed that question and want to know what actually changes in the codebase. DPDP for engineers is not about policy documents — it is about schema migrations, consent flows, retention jobs, and audit structures.
What the DPDP Act actually asks of your codebase
The Digital Personal Data Protection Act, notified in August 2023 with rules finalised in 2025, creates four engineering-relevant obligations. Purpose limitation: data collected for one stated purpose cannot be used for another without fresh consent. Data minimisation: collect only what the stated purpose requires. Storage limitation: delete personal data when the purpose for which it was collected is served, or when consent is withdrawn. Accuracy and security: maintain reasonable data accuracy and protect against unauthorised access, with a 72-hour breach notification obligation to the Data Protection Board.
Each obligation maps to a specific code implication. Purpose limitation requires a consent model tied to a declared use case. Data minimisation requires a field-level audit of your schema. Storage limitation requires an automated deletion mechanism with a defined trigger. Security largely builds on your existing posture, with the addition of a documented breach response process. The first three require the most engineering work and are what this article covers.
| Obligation | What it means for your code | Rough effort |
|---|---|---|
| Purpose limitation | Track consent per purpose; block cross-purpose data use without re-consent | 1–2 weeks |
| Data minimisation | Audit every personal data field; remove or reduce fields not required for the stated purpose | 2–4 weeks (ongoing) |
| Storage limitation | Build a retention policy and an automated deletion or anonymisation job | 2–4 weeks |
| Security & accuracy | 72-hour breach notification process; periodic accuracy checks for critical fields | 1 week to document, then ongoing |
Consent state management — the first schema migration
Under DPDP, consent must be free, specific, informed, and withdrawable at any time. The current state in most SaaS products: a boolean column in the users table, stored once at signup, never tested for revocability. The minimum viable change is a dedicated consent_records table.
```sql
CREATE TABLE consent_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    principal_id UUID NOT NULL,
    purpose VARCHAR(100) NOT NULL,        -- 'account_management', 'marketing', 'analytics'
    mechanism VARCHAR(50) NOT NULL,       -- 'signup_form', 'settings_page', 'email_link'
    given_at TIMESTAMPTZ NOT NULL,
    withdrawn_at TIMESTAMPTZ,
    privacy_version VARCHAR(20) NOT NULL, -- version of notice shown at consent time
    source_ip INET
);
```

The critical design decision: consent records are immutable. When a user withdraws consent, insert a new row with withdrawn_at set; do not update the original row. The current consent state for any (principal_id, purpose) pair is read from the latest record for that pair, ordered by COALESCE(withdrawn_at, given_at): consent is active only when that record's withdrawn_at is NULL. This preserves the full history, including the exact privacy notice version shown when consent was first obtained, which is what you need if a withdrawal is later disputed.
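Resolving the current state from that immutable history is a read-time computation. A minimal sketch in Python (record shape mirrors the schema above; the function name is illustrative):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class ConsentRecord:
    principal_id: str
    purpose: str
    given_at: datetime
    withdrawn_at: Optional[datetime] = None

def is_consent_active(records, principal_id: str, purpose: str) -> bool:
    """Resolve current consent state from an immutable event history.

    The latest event for the (principal_id, purpose) pair wins: a
    withdrawal row carries withdrawn_at, so consent is active only
    when the most recent event has withdrawn_at unset.
    """
    relevant = [r for r in records
                if r.principal_id == principal_id and r.purpose == purpose]
    if not relevant:
        return False
    latest = max(relevant, key=lambda r: r.withdrawn_at or r.given_at)
    return latest.withdrawn_at is None
```

A re-grant after withdrawal is simply another insert with a fresh given_at, so the same function answers the state question at any point without ever mutating history.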
One downstream implication: your user deletion flow changes. If you hard-delete users, you lose evidence that they consented in the first place. The standard approach is to retain consent records indefinitely, either in their original form or with principal_id replaced by a one-way hash. Consent records are the one data class where retention for dispute resolution takes precedence over minimisation.
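The one-way hash option can be sketched as follows (the function name and the keyed-hash choice are illustrative; the key must live outside the database so raw IDs cannot be brute-forced back from the digests):

```python
import hashlib
import hmac

def pseudonymise_principal_id(principal_id: str, key: bytes) -> str:
    """Replace principal_id in retained consent records with a keyed,
    one-way hash: stable enough to match a future dispute against the
    consent history, but not reversible to the original identifier."""
    return hmac.new(key, principal_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the same (key, id) pair always yields the same digest, a disputed withdrawal can still be matched to its consent records after the user row itself is gone.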
Purpose limitation and the data inventory
DPDP requires that data collected for purpose A cannot be used for purpose B without fresh consent. This is conceptually simple and operationally difficult, because most production codebases have never mapped their data collection to stated purposes. The place to start is a personal data inventory.
Walk every table and column in your schema and tag each field by category:
- Direct identifiers: name, email address, phone number, PAN, Aadhaar number, passport number
- Indirect identifiers: IP address, device ID, browser fingerprint, precise geolocation
- Profile data: job title, organisation name, profile photo, stated preferences
- Behavioural data: page views, feature usage events, session duration, search queries
- Transactional data: purchase history, invoices, contract records
For each tagged field, record why it is collected and what it is actually used for. The two questions often have different answers. A phone number collected 'for two-factor authentication' but also passed to the marketing team for outreach is a purpose limitation violation under DPDP: the second use requires separate, specific consent.
The output does not have to be a formal data catalogue. A YAML or JSON file checked into source control, acting as a static manifest that maps each table and column to its declared purpose, is sufficient. This document becomes the reference for compliance queries and the input to any future minimisation work.
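A sketch of what such a manifest might look like, expressed here as a Python literal so a purpose-drift check can run in CI (field names, categories, and the helper are illustrative, not a prescribed format):

```python
# Hypothetical inventory entries: each key is "table.column"; each value
# records the declared purpose and every observed use of the field.
DATA_INVENTORY = {
    "users.email": {
        "category": "direct_identifier",
        "declared_purpose": "account_management",
        "actual_uses": ["account_management"],
    },
    "users.phone": {
        "category": "direct_identifier",
        "declared_purpose": "two_factor_auth",
        "actual_uses": ["two_factor_auth", "marketing"],  # drift: needs fresh consent
    },
    "events.session_duration": {
        "category": "behavioural",
        "declared_purpose": "product_analytics",
        "actual_uses": ["product_analytics"],
    },
}

def purpose_violations(inventory: dict) -> dict:
    """Fields whose observed uses have drifted beyond the declared purpose."""
    return {
        field: sorted(set(meta["actual_uses"]) - {meta["declared_purpose"]})
        for field, meta in inventory.items()
        if set(meta["actual_uses"]) - {meta["declared_purpose"]}
    }
```

On this sample inventory, purpose_violations flags users.phone, mirroring the two-factor/marketing example above.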
Data minimisation — the schema audit you have been avoiding
Once you have the inventory, data minimisation asks a direct question for each field: 'If we removed this entirely, would the core product function?' The honest answer is yes for more fields than most teams expect.
Common findings when teams run this audit: free-text fields like job title or company name stored verbatim when a normalised enum would serve the product purpose; full phone numbers retained when only the last four digits appear in the UI; date of birth stored when only age verification is required; full postal addresses kept when only a PIN code is used for routing; analytics events carrying user IDs when the aggregate count is all that is ever queried.
Minimisation does not always mean deletion. The options available to you: hashing a field before storage so lookups work but plaintext is never held; truncating precision (city instead of GPS coordinates); replacing a stored field with a derived value computed at query time; or partitioning personal data into a separate schema with its own, shorter retention schedule.
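The first three options can be sketched on a hypothetical signup record (the field names and the PIN-extraction heuristic are assumptions for illustration, not a prescription):

```python
import hashlib
from datetime import date

def minimise(record: dict) -> dict:
    """Apply minimisation transforms to a raw signup record:
    - phone: keep a lookup hash plus the last four digits shown in the UI
    - date_of_birth: reduce to the derived fact the product actually needs
    - address: truncate to the PIN code used for routing
    """
    today = date.today()
    dob = record["date_of_birth"]
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return {
        "phone_hash": hashlib.sha256(record["phone"].encode()).hexdigest(),
        "phone_last4": record["phone"][-4:],
        "is_adult": age >= 18,
        # assumes the PIN code is the final whitespace-separated token
        "pin_code": record["address"].rsplit(" ", 1)[-1],
    }
```

The stored output contains no full phone number, no date of birth, and no street address, yet every product behaviour described above still works.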
Storage limitation — building the retention job
This is the most operationally complex DPDP obligation. The requirement is clear: delete data when the purpose is served or consent is withdrawn — but 'the purpose is served' is deliberately left for you to define. For a SaaS product, a defensible interpretation is: personal data is retained while the account is active, and for a defined period after closure or consent withdrawal. Document that period in your privacy notice before building the job.
```python
RETENTION_POLICIES = {
    "user_profile": {
        "trigger": "account_closed",
        "retention_days": 90,
        "delete_strategy": "hard_delete",
    },
    "audit_logs": {
        "trigger": "account_closed",
        "retention_days": 365,
        "delete_strategy": "anonymise",  # replace principal_id with 'deleted-<hash>'
    },
    "consent_records": {
        "trigger": "never",  # retained for dispute resolution
        "delete_strategy": "none",
    },
    "session_data": {
        "trigger": "session_ended",
        "retention_days": 30,
        "delete_strategy": "hard_delete",
    },
    "marketing_analytics": {
        "trigger": "consent_withdrawn",  # purpose-specific trigger
        "retention_days": 0,
        "delete_strategy": "hard_delete",
    },
}
```

The deletion job runs daily and processes records whose trigger condition is met and whose retention window has elapsed. Hard delete cascades are the main source of complexity: deleting a user row breaks foreign key constraints across invoices, support tickets, and audit logs. The three practical options: cascade deletion (use with care; test thoroughly on a staging copy of production data), tombstone replacement (replace the user_id foreign key with a sentinel value like 'deleted-{hash}'), or in-place anonymisation (null out identifying fields, retain the row for aggregate integrity).
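The daily sweep driven by that config can be sketched as follows; the `store` object and its `due_records`, `hard_delete`, and `anonymise` methods are assumptions about your data access layer, not a real API:

```python
from datetime import datetime, timedelta, timezone

def run_retention_job(policies: dict, store, now: datetime = None) -> None:
    """Daily sweep: delete or anonymise records whose trigger event has
    fired and whose retention window has elapsed."""
    now = now or datetime.now(timezone.utc)
    for data_class, policy in policies.items():
        if policy["trigger"] == "never":
            continue  # e.g. consent_records: retained for dispute resolution
        cutoff = now - timedelta(days=policy.get("retention_days", 0))
        # due_records is an assumed data-access method returning records of
        # this class whose trigger event occurred before `cutoff`.
        for record in store.due_records(data_class, policy["trigger"], cutoff):
            if policy["delete_strategy"] == "hard_delete":
                store.hard_delete(data_class, record)
            elif policy["delete_strategy"] == "anonymise":
                store.anonymise(data_class, record)
```

Keeping the policy table as data rather than code means the privacy notice, the auditor-facing document, and the job all read from the same source of truth.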
One further complication: data held by processors. If you share personal data with a payment gateway, email delivery service, or analytics tool, you remain responsible for it as the data fiduciary; DPDP's obligations do not stop at your own database. Your deletion workflow needs either a mechanism to request deletion from processors or contractual guarantees that their retention windows match yours.
The audit log structure DPDP actually needs
DPDP creates a right of access: any data principal can ask what data you hold about them and who has accessed it. Standard application logs — request method, endpoint, status code — do not answer these questions. An audit event that does has a specific shape.
```json
{
  "event_id": "evt_01J2XYZABC",
  "timestamp": "2026-05-15T09:12:33Z",
  "actor_type": "user",
  "actor_id": "usr_abc123",
  "action": "view",
  "resource_type": "invoice",
  "resource_id": "inv_xyz456",
  "principal_ids_accessed": ["usr_def789"],
  "purpose": "self_service_billing",
  "ip_hash": "sha256:8a3b..."
}
```

The principal_ids_accessed array is the critical field. When a data principal submits an access request, you query this field to return every event that touched data about them, regardless of which actor triggered it. An administrator viewing a customer's invoice generates an event where actor_id is the admin and principal_ids_accessed includes the customer. Log the IP as a hash rather than plaintext unless you have a specific operational reason to retain the raw value; the hash is sufficient for correlating suspicious activity without holding the IP as personal data.
The purpose field is what lets you demonstrate to an auditor that each data access was for a declared and consented use. Without it, an access log is evidence of activity but not evidence of legitimacy. This is also what enables you to answer a more precise version of the data principal's access request: not just 'here is every event that touched your data' but 'here is every event, grouped by purpose'.
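Answering that grouped access request is then a straightforward scan over the audit stream. A sketch, assuming the event shape shown above (the function name is illustrative):

```python
from collections import defaultdict

def access_report(events, principal_id: str) -> dict:
    """Answer a data principal's access request: every audit event that
    touched their data, grouped by the declared purpose of the access."""
    grouped = defaultdict(list)
    for event in events:
        if principal_id in event.get("principal_ids_accessed", []):
            grouped[event.get("purpose", "undeclared")].append(event["event_id"])
    return dict(grouped)
```

In production this would be an indexed query rather than a scan, but the shape of the answer is the same: event IDs keyed by purpose, ready to hand to the data principal or an auditor.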
Where to start
If DPDP compliance is genuinely new to your codebase, the priority order below is practical rather than arbitrary. Each stage builds on the previous one, and each can be shipped incrementally without touching core product functionality.
1. Consent schema. Nothing else is tractable without knowing who consented to what and when. This is a one-week sprint for most teams: schema migration, a backend endpoint to record consent events, a UI hook at the relevant action points.
2. Structured audit logging. You need this to respond to access requests, and it is the kind of infrastructure that compounds over time. Implementing audit events with principal_ids_accessed for the ten most sensitive operations takes one to two weeks and is purely additive.
3. Retention jobs. Write the policy document first. That document is what you show auditors, and it goes into your privacy notice. Then build the job for the highest-volume data first. Two to four weeks depending on cascade complexity.
4. Data minimisation. This is ongoing, not a one-time sprint. Start with the field inventory, then run deletion candidates past engineering and legal. Some fields will be straightforward; others will surface surprising dependencies.
The DPDP rules are now in force. The question for Indian engineering teams has moved from 'do we need to comply' to 'what do we build first'. Starting with consent and audit logging gives you something real to present and, more importantly, gives your team the habit of treating personal data as a first-class concern in the codebase — which is what the Act is ultimately asking for.