Indirect prompt injection in tool-using agents
When an agent reads attacker-controlled content, that content can become instructions. The anatomy, the blast radius, and the controls that contain it.
- Read time: 14 min
- Threat coverage: LLM01 · Prompt injection
- Frameworks: OWASP LLM · MITRE ATLAS · NIST AI RMF
- Audience: Security architects · AppSec
Tool-using agents blur a line that classic application security kept sharp: the boundary between data and instructions. The moment retrieved content can steer what an agent does, every document it reads becomes a potential control channel.
Anatomy of the attack
An agent retrieves a web page, a support ticket, or an email and folds it into its context window. If that content carries an embedded instruction (an indirect prompt injection), the model may follow it and invoke tools with the agent's own privileges. The model has no innate way to tell a trusted system instruction from a hostile sentence pasted into the same context, so the defense is a matter of input isolation and tool scoping, not better prompt wording.
- Untrusted content is treated as data, never as a control channel.
- Every tool call is checked against an allow-list and authorized per call.
- Hidden text, encoded payloads, and active content are stripped on ingest.
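A minimal sketch of the ingest step described above, assuming HTML-like input. The function names `sanitize_untrusted` and `quarantine`, the message shape, and the character ranges are illustrative assumptions, not a complete sanitizer: regex stripping will not catch text hidden with CSS, and a production pipeline would more likely rely on a real HTML parser.

```python
import re
import unicodedata

# Zero-width and bidi control characters often used to hide text.
# Illustrative, not exhaustive.
_INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\ufeff]")
# Active content: drop the element together with its body.
_ACTIVE = re.compile(
    r"<(script|style|iframe|object)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def sanitize_untrusted(raw: str) -> str:
    """Strip active content, HTML comments, markup, and invisible characters."""
    text = _ACTIVE.sub(" ", raw)
    text = _COMMENT.sub(" ", text)
    text = _TAG.sub(" ", text)
    text = unicodedata.normalize("NFKC", text)
    text = _INVISIBLE.sub("", text)
    return " ".join(text.split())  # collapse runs of whitespace


def quarantine(source: str, raw: str) -> dict:
    """Wrap sanitized content as a data-only message (hypothetical shape).

    The content is carried in its own field and is never concatenated
    into the instruction text.
    """
    return {
        "role": "tool",
        "trusted": False,
        "source": source,
        "content": sanitize_untrusted(raw),
    }
```

Sanitizing reduces the attack surface but does not make the content safe to obey. The `trusted: False` flag only helps if the orchestration layer keeps that field out of the instruction context and still gates every tool call.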
Blast radius
The reachable systems behind an agent's credentials define the damage a hijack can do. An agent that runs as a broad service identity turns a text manipulation into action across every system that identity can touch. Scope is the lever: the narrower the credential, the smaller the blast radius.
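One way to make this concrete is to enumerate what each credential can reach. The sketch below assumes a hypothetical mapping from scopes to systems (`SCOPE_REACH`); the scope and system names are invented for illustration.

```python
# Hypothetical map of credential scopes to the systems each one can reach.
SCOPE_REACH = {
    "tickets:read:own-tenant": {"ticket-store"},
    "tickets:read:all-tenants": {"ticket-store", "ticket-archive"},
    "crm:write": {"crm"},
    "billing:read": {"billing"},
}


def blast_radius(scopes: set[str]) -> set[str]:
    """Return the union of systems reachable with the given scopes."""
    reachable: set[str] = set()
    for scope in scopes:
        reachable |= SCOPE_REACH.get(scope, set())
    return reachable


# A broad service identity versus a credential scoped to one task.
broad = blast_radius({"tickets:read:all-tenants", "crm:write", "billing:read"})
narrow = blast_radius({"tickets:read:own-tenant"})

print(sorted(broad))   # ['billing', 'crm', 'ticket-archive', 'ticket-store']
print(sorted(narrow))  # ['ticket-store']
```

The same injected sentence reaches four systems under the broad identity and one under the narrow credential, which is the sense in which scope is the lever.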
Worked scenario
Support agent → ticket exfiltration
- Objective: Read another tenant's support tickets.
- Path: Poisoned ticket body → injected instruction → over-scoped search tool.
- Impact: Cross-tenant data disclosure.
- Detection: Tool call referencing a tenant outside the session's scope.
- Mitigation: Per-session tenant binding enforced at the tool boundary.
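The mitigation and detection for this scenario can be sketched together, assuming an in-memory ticket list in place of a real backend. `Session`, `ScopeViolation`, `search_tickets`, `TICKETS`, and `ALERTS` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    session_id: str
    tenant_id: str  # bound at authentication, never taken from model output


class ScopeViolation(Exception):
    """Raised when a tool call reaches outside the session's tenant."""


# Hypothetical in-memory ticket store standing in for a real backend.
TICKETS = [
    {"id": 1, "tenant_id": "acme", "body": "Printer offline"},
    {"id": 2, "tenant_id": "globex", "body": "Invoice dispute"},
]

# Stand-in for an alerting pipeline.
ALERTS: list = []


def search_tickets(
    session: Session, query: str, tenant_id: Optional[str] = None
) -> list:
    """Search tickets, enforcing the tenant binding at the tool boundary."""
    # Detection: a model-supplied tenant that differs from the session's
    # tenant is recorded and rejected.
    if tenant_id is not None and tenant_id != session.tenant_id:
        ALERTS.append(
            {"session": session.session_id, "requested_tenant": tenant_id}
        )
        raise ScopeViolation(f"tenant {tenant_id!r} is outside session scope")
    # Mitigation: the filter always uses the session's tenant, so the
    # model's arguments cannot widen the result set.
    return [
        t
        for t in TICKETS
        if t["tenant_id"] == session.tenant_id
        and query.lower() in t["body"].lower()
    ]
```

With a session bound to `acme`, a search for "invoice" returns nothing even though a matching ticket exists for `globex`, and an explicit request for `tenant_id="globex"` raises `ScopeViolation` and leaves an alert record. The key design choice is that the tenant filter comes from the session object, not from anything the model wrote.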
Controls that hold
- Quarantine untrusted content into a non-instruction context the model treats as data.
- Gate every tool call behind explicit, per-call authorization tied to the session.
- Bind sensitive tools to the requesting principal so reach can't exceed the session's scope.
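The gating control might look like the following sketch: a single dispatch function that every model-requested call passes through. The registry `ALLOWED_TOOLS`, the `lookup_order` tool, and the scope strings are assumptions made for the example, not part of any particular agent framework.

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (callable, scope the session must hold).
ALLOWED_TOOLS: dict[str, tuple[Callable[..., Any], str]] = {
    "lookup_order": (
        lambda order_id: {"order_id": order_id, "status": "shipped"},
        "orders:read",
    ),
}


class ToolDenied(Exception):
    """Raised when a call fails the allow-list or the authorization check."""


def call_tool(session_scopes: frozenset[str], name: str, **kwargs: Any) -> Any:
    """Dispatch a model-requested tool call only if it passes both checks."""
    entry = ALLOWED_TOOLS.get(name)
    if entry is None:
        raise ToolDenied(f"tool {name!r} is not on the allow-list")
    func, required_scope = entry
    # Authorization is evaluated on every call against the session's
    # grants, not once per conversation.
    if required_scope not in session_scopes:
        raise ToolDenied(f"session lacks scope {required_scope!r} for {name!r}")
    return func(**kwargs)
```

A call such as `call_tool(frozenset({"orders:read"}), "lookup_order", order_id="A1")` succeeds, while an unlisted tool name or a session without the required scope raises `ToolDenied`. Because the check lives in the dispatcher and not in the prompt, injected text cannot talk its way past it.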
Framework mapping
| Control | OWASP LLM | NIST AI RMF |
|---|---|---|
| Input isolation | LLM01 | Measure 2.7 |
| Tool scoping | LLM06 | Manage 1.3 |
| Per-session binding | LLM06 | Manage 2.2 |
Checklist
- Untrusted content is never concatenated into the instruction context.
- Every tool is on an allow-list, and every call to it is authorized individually.
- Agent credentials are scoped to the task's minimum reach and short-lived.
- Tool calls that cross the session's tenant/scope are detected and alerted.