Memory Poisoning

Memory poisoning is an attack where adversarial content is injected into an AI agent's persistent memory stores — vector databases, episodic memory, or cached context — causing the agent to reference and act on false or malicious information in future interactions, often long after the initial compromise.

What Is Memory Poisoning?

Most capable AI agents maintain some form of persistent memory: a vector database of past interactions, a retrieval store for relevant documents, cached summaries of previous tasks. This memory lets agents accumulate context, personalize responses, and improve performance over time.

It also creates a durable attack surface.

Memory poisoning works like this: an attacker finds a way to insert content into the agent's memory — through direct interaction, through a document the agent processes, through a message in a shared channel. That content is stored and later retrieved when the agent encounters related queries or tasks. The poisoned memory then influences the agent's future behavior, potentially for every subsequent user in that system.

The critical property that distinguishes memory poisoning from standard prompt injection: persistence. A prompt injection attack affects a single interaction. A memory poisoning attack can propagate across hundreds or thousands of future interactions, by multiple users, without any further action from the attacker.

Attack Vectors

Direct injection via user input. In a multi-user agent system, one user submits content designed to be stored as memory. When other users later query the agent on related topics, the poisoned memory is retrieved and included in context.

Indirect injection via document processing. An agent that reads and indexes documents for a RAG (Retrieval-Augmented Generation) system can be poisoned by any document it processes. An attacker who can influence which documents the agent reads — by getting a document indexed, submitting a form that gets stored, or placing content on a web page the agent scrapes — can inject into the memory store.

Cross-session poisoning. In systems where agents retain memory across sessions, poisoning in one session affects all future sessions. If the agent remembers "User X's organization's GitHub org is github.com/attacker-controlled," every future operation that relies on that memory is compromised.

The MINJA Research

The MINJA (Memory INJection Attack) research (2026) demonstrated that in multi-user LLM agent systems, a single poisoned memory entry could influence responses to unrelated queries for all future users. The attack used semantic similarity — ensuring the malicious memory entry would be retrieved whenever users asked about broadly related topics.

Key finding: standard retrieval mechanisms (cosine similarity over embeddings) don't distinguish between legitimate memories and adversarially crafted ones. The poisoned entry looked, to the retrieval system, like a high-quality relevant result.

Why It's Underappreciated

Memory poisoning gets less attention than prompt injection because it's less visible. A successful prompt injection produces an immediate, observable effect. A successful memory poisoning attack sits dormant, influencing behavior gradually, and is much harder to attribute.

In production systems running at scale — where agents process thousands of documents per day and serve thousands of users — detecting that a specific memory entry is causing problems may require months of investigation, if it's detected at all.

Defense

Memory validation. Before storing content in persistent memory, apply validation checks. Don't store verbatim content from untrusted sources — summarize, extract structured fields, or apply a secondary review before persistence.

Memory isolation. User A's memory should not be retrievable by User B. In multi-user systems, namespace memory by user identity and enforce strict retrieval boundaries.

Memory TTL and pruning. Persistent memory shouldn't be truly permanent. Implement time-to-live policies and periodic auditing of stored memory contents.

Source tracking. Know where each memory entry came from. If a memory entry was sourced from an untrusted document or an unverified external source, mark it accordingly and treat it as lower-trust in downstream reasoning.

FAQ

Is memory poisoning the same as data poisoning in ML? Related but different. ML data poisoning attacks corrupt a model's training data to influence its learned behavior. Memory poisoning attacks corrupt an agent's operational memory — the external stores it queries at runtime. You don't need to retrain the model; you only need to insert one malicious entry into the retrieval store.

Does this affect all RAG systems? Yes. Any system that retrieves external content and includes it in an LLM's context window is potentially vulnerable to poisoning of that content store.

How quickly can poisoned memory be propagated? In systems that retrieve memories based on semantic similarity, a single well-crafted poisoned entry can affect responses to broad topic areas immediately after insertion, for all users who query related topics.