AI Memory Is an Attack Surface. Start Treating It Like One.

Microsoft just put a name to something that should have been a priority two years ago: AI recommendation poisoning. An attacker embeds hidden instructions inside content that an AI agent summarizes. The agent stores those instructions as memory. Future sessions — completely unrelated to the original document — get silently biased by that planted payload. The attack doesn't need to break anything. It uses the system exactly as designed.

The MINJA research makes the numbers impossible to ignore: 95%+ injection success rates against production agents. Not lab setups. Production. And this isn't hypothetical — it's already been observed in the wild through "Summarize with AI" features baked into enterprise tools. The same features being rolled out as productivity wins.

Here's what bothers me most: memory persistence is being treated as a UX feature, not a security boundary. Teams are celebrating that their agent "remembers context across sessions" without asking who else gets to write to that memory. The answer, apparently, is anyone who can get the agent to read their content. That's a massive, implicit trust assumption that nobody audited.

Runtime guardrails don't save you here. By the time the poisoned memory is triggered, the attacker's instruction has already been laundered through the model's own summarization step. It looks like the agent's own knowledge. The guardrail sees a confident, internally consistent recommendation — not an injection.

The fix isn't a patch. It's a rethink. Memory stores need integrity controls, provenance tracking, and sandboxing by source trust level. Agents that summarize external content should not write to the same memory space that governs recommendations. These are basic architectural decisions that nobody is making because the industry is too busy shipping.

We built agents with long-term memory before we built the security model for what long-term memory means in an adversarial world.