Threats · Deep dive

The New AI Attack Surface: Model, RAG, Tools, Memory, Identity

Securing the model is one-fifth of the job. A component-by-component tour of where AI systems actually get attacked.

Read time: 11 min
Threat coverage: Multiple · LLM01–LLM10
Frameworks: OWASP LLM · OWASP Agentic · NIST AI RMF
Audience: Security architects

Ask most teams how they secure their AI and they'll describe what they did to the model — the guardrails, the refusals, the red-team prompts. That's one component of a system that now has five. The other four are where real incidents start.

Beyond the model

The model is the part everyone can see, so it gets the attention. But the model rarely holds the data, calls the APIs, or carries the credentials. The components wired around it do — and an attacker reasons about the whole system, not the part you hardened.

Model

The reasoning core. Attacked through jailbreaks and prompt manipulation that bypass alignment, plus extraction attempts that probe training data or system prompts. Real, but rarely the whole story — and over-indexed relative to the rest.

Retrieval (RAG)

Every document the system retrieves is untrusted input that can carry instructions. Poisoning a single record means it reaches every user who later retrieves it; retrieval manipulation and over-broad indexes leak data the user should never see. The control: treat retrieved content as data, scope the index to the user's entitlements.

Tools

The system's hands — APIs, functions, code execution. The risk is excessive agency: a tool scoped more broadly than the task turns a text manipulation into real action. The control: least-privilege tools, per-call authorization, and approval gates on anything irreversible.

Memory

Persistence makes a one-time manipulation durable. A planted instruction in long-term memory can steer decisions long after the original prompt is gone, and shared memory can bleed one user's context into another's. The control: validate what's written to memory, scope it per user, and expire it.

Identity

The credentials the system acts under — and the variable that sets your blast radius. It is almost always over-provisioned: a standing, broad token that an injected instruction inherits in full. The single highest-leverage control on this list is least-privilege, short-lived identity, because it caps the damage regardless of which other door was opened.

Fig 1 — the five-component surface: a manipulation entering at retrieval can travel to action under the system's identity.

Framework mapping

Component	Primary risk	OWASP LLM
Model	Jailbreak / extraction	LLM01 / LLM07
Retrieval	Poisoning / leakage	LLM03 / LLM06
Tools	Excessive agency	LLM06
Memory	Poisoning / persistence	LLM01
Identity	Over-privilege	LLM06

Checklist

All five components are in the threat model — not just the model.
Retrieved and tool-returned content is treated as untrusted data.
Tools are least-privilege with approval gates on irreversible actions.
Memory is validated, scoped per user, and expired.
The system's identity is least-privilege and short-lived.

Industries

The New AI Attack Surface: Model, RAG, Tools, Memory, Identity

Beyond the model

Model

Retrieval (RAG)

Tools

Memory

Identity

Framework mapping

Checklist

Put the research to work.

Keep reading.

Indirect prompt injection in tool-using agents

Securing Agentic AI: Why Autonomy Changes the Risk Model

Prompt Injection: The #1 Risk Every AI Product Team Must Understand