LLM database access: the RBAC gap most teams don't see
Most guardrail advice focuses on prompt injection. The harder problem is that your application-layer permissions stop at the ORM.
The setup for LLM database access feels almost too clean when you first wire it up. User sends a message. Your model reads it, decides it needs data to answer, and fires a query_database tool call with a SQL string. Your application executes the query against a read replica and returns the result set, and the model synthesises a response. It works in your demo. You put the model on a read replica, so writes are off the table.
Then you ship it and discover that the access-control model you've been relying on for three years doesn't transfer.
What LLM database access looks like in a typical system
The basic architecture looks like this: the LLM is given a tool definition, something like query_database(sql: string) that returns rows. The model calls this tool when it decides it needs data to answer a question. Your application layer receives the tool call, runs the SQL against a read replica under the application database credentials, and returns the result.
This is a reasonable starting point. The problem isn't the structure — it's what's missing from it.
In a standard API backend, access control lives at the application layer. Your ORM returns only the columns your serialiser exposes. Your route handler checks the session before it touches the database. A sales manager calling /api/pipeline gets their team's deals; calling /api/users/emails returns a 403 because the middleware refuses before any SQL is generated.
An LLM tool call doesn't go through that middleware. It calls a function you wrote (usually a thin wrapper around a database connection) and that function executes whatever SQL the model produced. The model doesn't know about your route-level permission checks, because those checks were never designed to apply to this call path.
The RBAC blindspot: your middleware is not in the call path
Imagine a model serving an internal analytics tool. A sales manager asks: who are our Enterprise accounts that haven't logged in this month? The model generates SQL against your users table. The query is correct. It answers the question, and it also returns email addresses and phone numbers that this particular user, in your application's permission model, is not supposed to see. Your middleware would have stopped this. The tool call bypassed it.
The model isn't doing anything wrong. It pulled the most useful columns it knew about from your schema. You didn't tell it that email and phone were off-limits, and even if you had, a prompt instruction isn't a constraint — it's a request.
Four failure modes that aren't prompt injection
Prompt injection (adversarial input that hijacks the model's behaviour) gets most of the attention in guardrail discussions. It's real and worth defending against. But the failures that actually show up in production LLM-database integrations are mostly not adversarial. They happen because the model is doing exactly what it was asked, with scope that wasn't properly bounded.
| Failure mode | How it surfaces | Fix |
|---|---|---|
| Unrestricted column access | Model returns PII columns because nothing in the call path excludes them | Grant SELECT on views that omit sensitive columns, not on base tables |
| Aggregation deanonymisation | Small-group aggregates make individual values reconstructable | Add minimum-group filters in views; suppress small-n groups |
| Write amplification | An ambiguous instruction produces DML affecting far more rows than intended | Separate write tools; route mutations through SECURITY DEFINER procedures |
| Schema leakage | Model queries information_schema to enumerate table structure it was not given | Revoke USAGE on information_schema from the agent role |
Unrestricted column access
The model doesn't know which columns are sensitive. It knows what's in your schema, which you may have provided in the system prompt or which it learned through introspection. When asked a question, it picks the columns that seem relevant. If your users table has thirty columns and five are sensitive, the model may include them because they're related to what the user asked.
The fix is not to list forbidden columns in the system prompt. The fix is a view that doesn't have those columns. If users_safe has twenty-five columns and excludes email, phone, and national_id, then any SELECT against that view simply cannot return those values. What the model puts in its query doesn't change that.
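Concretely, with the users_safe view defined in the architecture section below, a query that names an excluded column fails at the engine rather than being quietly honoured:

```sql
-- Whatever SQL the model generates, a column the view does not expose
-- simply is not there to select.
SELECT id, email FROM users_safe WHERE plan = 'enterprise';
-- ERROR:  column "email" does not exist
```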
Aggregation deanonymisation
Aggregate data feels safe. Total deals by region or average contract value by tier: neither returns individual rows. But when the group is small, the aggregate is nearly as revealing as the raw data. Three people on the Singapore sales team, one of whom earns significantly more than the others. The team's average salary, combined with what is already known about the other two, reconstructs the third person's salary.
This is harder to enforce at the view layer than column exclusion. The most practical approach is a minimum-count filter on grouped results: any group with fewer than k members is suppressed outright. In Postgres, a view can bake this in with HAVING COUNT(*) > 5 on the groupings it exposes — though it requires careful design to avoid over-restricting useful queries.
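A minimal sketch of that shape, assuming a hypothetical deals table with region and amount columns; the suppression lives in the view itself, so small groups never leave the database:

```sql
-- Aggregate view for the agent role. Groups of five or fewer rows are
-- suppressed outright. The deals table and its columns are illustrative.
CREATE VIEW deal_summary_by_region AS
SELECT
  region,
  COUNT(*)    AS deal_count,
  AVG(amount) AS avg_amount,
  SUM(amount) AS total_amount
FROM deals
GROUP BY region
HAVING COUNT(*) > 5;

GRANT SELECT ON deal_summary_by_region TO llm_agent;
REVOKE ALL ON deals FROM llm_agent;
```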
Write amplification
'Archive the deals that didn't close last quarter.' The model generates an UPDATE. Your tool runs it. Forty-seven rows are modified. Twelve of them were active deals that the model misclassified based on a status field it interpreted differently from your product definition.
Write tools deserve a separate, deliberately constrained design. The model should not have access to the same query_database function for reads and writes. Write operations should route through specific, named procedures that validate their own inputs and apply the mutation through a controlled path. The LLM calls archive_stale_deals(quarter, threshold_days) and the procedure decides what stale means by your rules, not the model's inference from English.
Schema leakage
Models explore. When they're not sure about your schema, whether because you didn't fully describe it in the system prompt or because a question requires a table they haven't seen, they'll query the schema catalogue to find out. This is a rational thing to do. The result is that your table names, column names, and data types become part of the model's context, which it can retain and use across the session.
Revoke USAGE on information_schema from your agent's database role. One caveat: Postgres grants USAGE on information_schema to PUBLIC by default, so the revoke only bites once that default grant no longer reaches the agent role (revoking it from PUBLIC is the blunt way to do that). If the model can't query the schema catalogue, it can't enumerate tables it wasn't told about. This also prevents the model from discovering tables adjacent to the ones you exposed.
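For reference, the sort of introspection query a model falls back to when it is unsure of the schema looks like this; the revoke is aimed at exactly this call:

```sql
-- Typical model-generated introspection: enumerate every table and column
-- in the public schema. Once the agent role genuinely has no USAGE on
-- information_schema, this fails instead of handing over the full schema.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```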
The fix: enforce at the database layer, not the prompt layer
The pattern that works is treating the LLM agent as an untrusted caller. Not adversarially untrusted (the model isn't trying to exfiltrate data) but in the same sense you'd treat a third-party integration: it has defined permissions, can only see what you've explicitly exposed, and its assertions about what it should be allowed to see are irrelevant.
This means two things in practice.
First: a dedicated database role for the LLM agent. Not the application's main database user. Not a clone of the application user with slightly fewer privileges. A separate role (call it llm_agent) that has CONNECT and SELECT on a specific list of objects, and nothing else. No SUPERUSER, no CREATEDB, no BYPASSRLS.
Second: grant SELECT on views, not on base tables. A view is a stable, enforceable contract about what the agent can see. If the view excludes sensitive columns, the database engine enforces that regardless of what SQL the model generates. The constraint doesn't depend on the model's cooperation.
A practical Postgres architecture
Here is a starting point. Adjust for your schema and sensitivity requirements.
```sql
-- Dedicated role with minimal privileges
CREATE ROLE llm_agent LOGIN PASSWORD 'rotate-me';

-- A view that exposes only the columns the agent should see
CREATE VIEW users_safe AS
SELECT
  id,
  created_at,
  plan,
  company_id,
  seats,
  status,
  last_login
FROM users;
-- email, phone, name, national_id are excluded

-- Grant on the view, not the base table
GRANT SELECT ON users_safe TO llm_agent;
REVOKE ALL ON users FROM llm_agent;

-- Block schema introspection
REVOKE USAGE ON SCHEMA information_schema FROM llm_agent;

-- Hard operational limits on the role
ALTER ROLE llm_agent SET statement_timeout = '5s';
ALTER ROLE llm_agent SET lock_timeout = '1s';
ALTER ROLE llm_agent SET row_security = on;

-- Multi-tenant: RLS policy keyed to a session variable
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON users
  USING (company_id = current_setting('app.company_id')::bigint);
```

The statement_timeout is not optional. An LLM-generated query is less likely to be well-structured than one written by an engineer who knows the index layout. Without a timeout, a poorly formed query against a large table holds a connection for as long as your database allows — and your application will wait. Five seconds is generous for most lookup queries.
Setting row_security = on at the role level pins the agent role to running with row-level security policies applied, rather than depending on session defaults. And because the role holds only the grants you issued explicitly, new tables added to the database are inaccessible to it by default.
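For the tenant_isolation policy above to do anything, the application wrapper has to set app.company_id from the authenticated session before it runs the model's SQL. A minimal sketch of that sequencing, assuming the wrapper runs each tool call in its own transaction:

```sql
-- Per-request sequencing in the application wrapper. The tenant id comes
-- from the authenticated session, never from the model's output.
BEGIN;
SET LOCAL app.company_id = '42';
SELECT id, plan, seats, last_login
FROM users_safe
WHERE status = 'active';
COMMIT;  -- SET LOCAL falls out of scope here automatically
```

Using SET LOCAL rather than SET means a forgotten reset cannot leak one tenant's scope into the next request on a pooled connection.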
For write access, the approach is different. Rather than granting UPDATE or DELETE to the llm_agent role, expose specific SECURITY DEFINER procedures that implement the write operation with their own validation logic. The LLM calls the function; the function decides what an instruction means by your rules, not the model's interpretation of an English phrase.
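A sketch of what that could look like for the archive_stale_deals example from earlier. The deals table and its columns are assumptions for illustration; the point is that the staleness rule lives in the function body, where the model cannot renegotiate it:

```sql
-- Illustrative only: the deals table and its columns are assumed, not
-- defined anywhere in this article.
CREATE FUNCTION archive_stale_deals(quarter date, threshold_days int)
RETURNS integer
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public
AS $$
DECLARE
  affected integer;
BEGIN
  -- Validate inputs before touching any rows.
  IF threshold_days < 30 OR threshold_days > 365 THEN
    RAISE EXCEPTION 'threshold_days out of range: %', threshold_days;
  END IF;

  -- The definition of "stale" is written once, here, by you.
  UPDATE deals
     SET status = 'archived'
   WHERE status = 'open'
     AND close_date IS NULL
     AND last_activity_at < quarter - make_interval(days => threshold_days);

  GET DIAGNOSTICS affected = ROW_COUNT;
  RETURN affected;
END;
$$;

-- Postgres grants EXECUTE to PUBLIC by default; take it back, then grant
-- it to the agent role alone. The role needs no UPDATE privilege on deals.
REVOKE EXECUTE ON FUNCTION archive_stale_deals(date, integer) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION archive_stale_deals(date, integer) TO llm_agent;
```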
What to log
Standard LLM tracing captures the user's input and the model's final output. When the model is running SQL, that's not enough.
Log the SQL that was generated. Not just the prompt that triggered it — the actual SQL string the model produced and your application executed. If something goes wrong, the forensic artefact is the SQL, not the English question. Most observability platforms that support LLM tracing capture tool call arguments; make sure you're retaining them, not just the input/output pair.
Log the row count returned alongside each query. A query that returns 50,000 rows when you expected at most 50 is worth alerting on. You can enforce an upper bound in your application wrapper: truncate after k rows and log a warning.
Log user context alongside the SQL. When the LLM is surfaced to end users rather than running as an internal service, you need to associate each query with the user who triggered it. That association exists in your application's session context; make sure it flows through to your database audit log, not just your application logs.
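One way to keep all three in a single queryable place is an audit table the application wrapper writes to on every tool call. A sketch, with illustrative names:

```sql
-- Written by the application wrapper, one row per executed tool call.
-- Table and column names are illustrative; adapt to your own conventions.
CREATE TABLE llm_query_audit (
  id            bigserial PRIMARY KEY,
  occurred_at   timestamptz NOT NULL DEFAULT now(),
  user_id       bigint NOT NULL,       -- from the application session, never the model
  company_id    bigint NOT NULL,
  generated_sql text NOT NULL,         -- the exact SQL string the model produced
  row_count     integer,               -- rows actually returned to the model
  truncated     boolean NOT NULL DEFAULT false,  -- wrapper hit its row cap
  error         text                   -- database error message, if the query failed
);

-- The agent role has no business reading or editing its own audit trail.
REVOKE ALL ON llm_query_audit FROM llm_agent;
```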
The underlying principle
Most guardrail advice tries to solve an enforcement problem with an instruction. A prompt instruction saying 'don't return email addresses' is asking the model to want to respect a constraint. A database role that doesn't have access to the email column is a constraint the database engine enforces unconditionally, regardless of what the model generates.
The same question applies to every capability you give an LLM. File system access, HTTP tool calls, queue writes: the right question in each case is whether the boundary exists at the enforcement layer or only in the prompt. For a database, the enforcement layer is the database engine. It was designed for exactly this. Use it.
Related reading
Every Postgres isolation level, and the specific bug it lets through
Three isolation levels, three distinct failure modes. Most Postgres deployments run at Read Committed without knowing it. Here is what each level permits and what upgrading actually costs.
The AI wrapper debate, three years in: what the survivors built
Three years after the GPT-4 wrapper wave, a handful of AI companies are thriving and most are gone. The split was not random — and the pattern tells you something useful about building on top of LLMs in 2026.
The minimum viable security posture for a 10-person SaaS
Seven controls that prevent 90% of real breaches at a 10-person SaaS, ranked in the order that actually matters—not the order that looks good on a compliance questionnaire.