AITM M1.6-Art07 v1.0 Reviewed 2026-04-06 Open Access
M1.6 People, Change, and Organizational Readiness
AITF · Foundations

Memory Governance and Poisoning Defense

Memory Governance and Poisoning Defense — Organizational Change & Culture — Applied depth — COMPEL Body of Knowledge.


COMPEL Specialization — AITM-AAG: Agentic AI Governance Associate Article 7 of 14


Definition. Agent memory is the set of contexts, preferences, observations, and states an agent accumulates across turns, sessions, or deployments. It is a data asset and is subject to data governance. Memory poisoning is the corruption of agent memory, accidental or adversarial, that causes downstream misbehaviour. The governance discipline combines the data-governance rigor applied to any production store (classification, access control, retention, audit) with the adversarial-defence posture catalogued by OWASP Agentic and MITRE ATLAS. Sources: https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/ ; https://atlas.mitre.org/.

An agent deployment that treats memory as an implementation detail — “it’s just a vector store” or “it’s just a session buffer” — has failed to recognise that memory is where the agent’s beliefs live. Beliefs shape action. Corrupted beliefs yield corrupted action with the organisation’s credentials attached.

The memory layers

An agent’s memory is not a single store. It has layers, and each layer has different governance needs.

| Layer | Scope | Lifetime | Typical implementation | Governance priority |
|---|---|---|---|---|
| Short-term context | Current turn’s prompt window | Seconds | Prompt buffer | Injection defence |
| Session memory | Current conversation | Minutes to hours | In-memory or Redis | Retention, isolation |
| Persistent profile | Per-user or per-tenant | Long | Vector store (Pinecone, Weaviate, Qdrant, pgvector, Milvus, Chroma) or relational | Data classification, retention, access control |
| Shared knowledge | Across users / agents | Long | Vector store, knowledge graph | Multi-tenancy, poisoning defence |

Each layer needs a written policy. The policy for short-term context is mostly about what is allowed into the prompt (and what sanitisation was applied on the way in, per Article 6’s result-sanitisation controls). The policy for shared knowledge is closer to a full data-asset policy — with owner, classification, retention, access controls, and audit — because shared memory is the highest-value target for adversaries.

The four memory governance categories

Scope

What the memory stores, and what it does not. Scope defines what the agent may learn from experience and, by exclusion, what it must never retain. A customer-service agent’s memory may legitimately include user preferences; it should not include employee credentials accidentally captured in a support ticket. Scope is enforced at the write path: what the agent is permitted to store, and what redaction is applied before storage.

Scope drift is a quiet hazard. An agent that was originally scoped to remember “last order summary” may, over a year of edge-case handling, end up storing the raw content of support tickets including content the original scope excluded. Periodic scope review against stored content is a standing control.
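A write-path scope check can be sketched in a few lines. This is a minimal illustration, not a production control: the allowed fields and the credential-matching pattern are assumptions invented for this sketch, and a real deployment would use a maintained secrets-detection library and a reviewed scope policy.

```python
import re

# Hypothetical scope policy for a customer-service agent's persistent
# profile. Field names and the credential pattern are illustrative only.
ALLOWED_FIELDS = {"user_preferences", "last_order_summary"}

CREDENTIAL_PATTERN = re.compile(r"(?i)(password|api[_-]?key|secret)\s*[:=]\s*\S+")

def redact(text: str) -> str:
    """Mask credential-like substrings before anything reaches storage."""
    return CREDENTIAL_PATTERN.sub("[REDACTED]", text)

def write_memory(field: str, value: str, store: dict) -> bool:
    """Enforce scope at the write path: reject out-of-scope fields,
    redact permitted values on the way in."""
    if field not in ALLOWED_FIELDS:
        return False  # out of scope: refuse the write entirely
    store[field] = redact(value)
    return True
```

The point of the sketch is the placement of the control: rejection and redaction happen before storage, so nothing out of scope ever lands in the store to be cleaned up later.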

Retention

How long memory is kept, and how it is aged out. Retention follows data-protection rules — GDPR’s storage-limitation principle applies to memory exactly as it applies to any other personal-data store — and any sector-specific rules (financial-services record retention, health-care record retention, and so on).

Retention design choices:

  • Per-item expiry (memory entries age out after N days).
  • Per-user purge on account closure.
  • Tombstone vs. hard delete on purge.
  • Audit-retention floor (even when operational memory is purged, audit records may be retained for longer under a different regime).
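A retention sweep combining two of these choices — per-item expiry and tombstone-on-purge — might look like the following sketch. The entry layout (a `written_at` timestamp per item) is an assumption of this example, not a prescribed format.

```python
import time
from typing import Optional

def sweep(entries: dict, ttl_days: float, now: Optional[float] = None) -> dict:
    """Age out entries older than ttl_days. Purged items leave a
    tombstone (with purge time) rather than a silent hard delete,
    so audit records can still reference the item identifier."""
    now = time.time() if now is None else now
    ttl_seconds = ttl_days * 86400
    swept = {}
    for key, item in entries.items():
        if now - item["written_at"] > ttl_seconds:
            swept[key] = {"tombstone": True, "purged_at": now}
        else:
            swept[key] = item
    return swept
```

Whether tombstones themselves expire, and on what schedule, is a policy decision that belongs in the retention section of the governance memo, not in code.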

Access

Who or what can read and write each layer. Access is by identity — agent identity, tool identity, operator identity — not by IP address or network location. A single agent deployment may have multiple identities (e.g., a query-time identity and an offline-index-update identity) and each identity’s permissions are scoped separately.

Multi-tenancy is a critical access concern. Shared memory across tenants without strong isolation is a cross-tenant data leak waiting to happen. Isolation is tested, not asserted.
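The identity-scoped access model described above can be sketched as a fail-closed check. The identity names and permission sets here are invented for illustration; the two properties that matter are that permissions attach to identities (not network locations) and that cross-tenant access is denied by default.

```python
# Illustrative permission table: a query-time identity that may only
# read, and an offline index-update identity that may also write.
PERMISSIONS = {
    "agent-query": {"read"},
    "agent-indexer": {"read", "write"},
}

def check_access(identity: str, op: str,
                 item_tenant: str, caller_tenant: str) -> bool:
    """Fail closed: unknown identities get nothing, and even a
    permitted operation is denied across tenant boundaries."""
    if op not in PERMISSIONS.get(identity, set()):
        return False
    return item_tenant == caller_tenant  # tenant isolation check
```

A test suite that asserts the cross-tenant case returns `False` is one concrete way to make isolation “tested, not asserted.”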

Audit

Every write to persistent memory and every significant read is logged. The log format includes the agent identity, the operation, the item identifier, and enough context to reconstruct what the memory looked like before and after the change. Audit is the basis for poisoning detection (below).
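A minimal audit record carrying the fields named above — agent identity, operation, item identifier, and before/after context — might be shaped like this. The JSON-lines serialisation is an assumption of the sketch; any append-only, tamper-evident sink serves the same purpose.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class MemoryAuditRecord:
    """One log entry per memory operation, with enough context to
    reconstruct the item before and after the change."""
    agent_identity: str
    operation: str            # e.g. "write", "delete", "read"
    item_id: str
    before: Optional[str]     # value before the change (None on create)
    after: Optional[str]      # value after the change (None on delete)
    timestamp: float

def audit_line(rec: MemoryAuditRecord) -> str:
    """Serialise a record as one JSON line for an append-only log."""
    return json.dumps(asdict(rec), sort_keys=True)
```

Capturing both `before` and `after` is what makes rollback and poisoning forensics possible later; logging only the new value is not enough.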

Memory poisoning — the three vectors

Memory poisoning can be accidental or adversarial. Three vectors recur.

Vector 1 — schema drift and confabulated self-notes

The agent, over time, writes into memory content that does not conform to the schema the memory was designed for, or that reflects the agent’s confabulation rather than ground truth. A research agent might, over many sessions, accumulate incorrect facts about a company because an early tool call returned a stale page. Those incorrect facts then feed subsequent sessions.

The defence is at the write path. What the agent may write is constrained by schema. Content that fails a schema check is either rejected or routed for review. The audit log captures every write so that, on detection, the corrupting entries can be quarantined.
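A schema check at the write path can be as simple as the following sketch. The note schema here (topic, claim, source URL) is invented for illustration; the useful behaviour is that violations are returned explicitly, so the caller can reject the write or route it for review, and that unknown fields are flagged as a schema-drift signal rather than silently stored.

```python
# Hypothetical schema for an agent's self-notes: field name -> type.
NOTE_SCHEMA = {"topic": str, "claim": str, "source_url": str}

def validate_note(note: dict) -> list:
    """Return a list of schema violations; an empty list means the
    note may be stored. Unknown fields count as drift, not bonus data."""
    errors = []
    for field, ftype in NOTE_SCHEMA.items():
        if field not in note:
            errors.append("missing field: " + field)
        elif not isinstance(note[field], ftype):
            errors.append("wrong type for: " + field)
    for field in note:
        if field not in NOTE_SCHEMA:
            errors.append("unknown field: " + field)  # schema drift
    return errors
```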

Vector 2 — indirect prompt injection via retrieved content

The agent retrieves a document from an external source — a web page, a shared document, an email — that contains content designed to manipulate the agent. The content may look like a normal document to a human reader but embed instructions the agent follows.

Greshake et al.’s USENIX Security 2023 paper on indirect prompt injection established this vector academically and documented its breadth. Source: https://arxiv.org/abs/2302.12173. The paper’s core finding — that any content an agent retrieves can potentially re-program it — applies whether the model is GPT-class, Claude-class, Gemini, Llama, Mistral, or any other, and whether the orchestration is LangGraph, LlamaIndex, CrewAI, AutoGen, OpenAI Agents SDK, or custom.

The defence is layered:

  • Content-provenance labelling. Retrieved content is marked as untrusted; the agent’s system prompt tells it to treat untrusted content as data, not as instructions.
  • Instruction-pattern detection. Retrieved content is scanned for patterns that look like instructions to the agent.
  • Separation of retrieval and action. A retrieval-only step produces a summary the agent may consume; a separate action step uses the summary without the raw retrieved content.

No single defence is sufficient. The combination reduces exposure.
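Two of the three layers can be sketched concretely: provenance labelling and instruction-pattern detection. Both the marker format and the patterns below are assumptions of this sketch — real deployments tune pattern lists continuously, and a regex scan is trivially evadable on its own, which is precisely why the text insists on layering.

```python
import re

# Crude, illustrative patterns for content that addresses the agent
# as if it were issuing instructions. Not exhaustive; easily evaded.
INSTRUCTION_PATTERNS = [
    re.compile(r"(?i)ignore (all )?(previous|prior) instructions"),
    re.compile(r"(?i)you are now"),
    re.compile(r"(?i)system prompt"),
]

def label_untrusted(content: str, source: str) -> str:
    """Provenance labelling: wrap retrieved content in an explicit
    untrusted-data marker the system prompt tells the agent to respect."""
    return "<untrusted source=%r>\n%s\n</untrusted>" % (source, content)

def looks_like_injection(content: str) -> bool:
    """Instruction-pattern detection over retrieved content."""
    return any(p.search(content) for p in INSTRUCTION_PATTERNS)
```

The third layer — separating retrieval from action — is an orchestration design, not a function: the action step consumes only a summary, never the raw retrieved text.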

Vector 3 — deliberate poisoning of shared memory

An adversary gains write access to a shared memory store and deliberately plants corrupting content. The content may be subtle — a single rule change, a single incorrect fact — that alters a narrow class of agent outputs in a way the adversary wants. Detection is difficult because the content looks legitimate and behaves normally until triggered.

MITRE ATLAS catalogues the tactics and techniques of attacks against AI systems, and its recent agentic extensions include memory-poisoning patterns. The specialist uses ATLAS as a reference when building threat models for shared-memory stores. Source: https://atlas.mitre.org/.

The defences are access control, integrity monitoring (checksums on critical entries), review of writes by privileged identities, and honeypot entries that alert on unexpected reads.
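The integrity-monitoring defence — checksums on critical entries — reduces to a periodic sweep like the one below. The storage layout (entry id to content, plus a baseline of checksums written by a trusted path) is an assumption of this sketch.

```python
import hashlib

def checksum(content: str) -> str:
    """SHA-256 digest of an entry's content, recorded at trusted write time."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def find_tampered(entries: dict, baseline: dict) -> list:
    """Recompute checksums for current entries and return the ids of
    any whose content no longer matches the trusted baseline."""
    return [eid for eid, content in entries.items()
            if checksum(content) != baseline.get(eid)]
```

A hit from this sweep feeds the detection flow (signal → verify → quarantine → remediate → review) rather than triggering automatic deletion, since a baseline mismatch may also indicate a legitimate but unrecorded change.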

The ChatGPT memory feature — a named deployed example

OpenAI released a memory feature for ChatGPT in April 2024, allowing the consumer product to remember user preferences across conversations. The release was accompanied by public documentation of scope (what the feature remembers), user controls (view, edit, delete, disable), and privacy considerations. Source: https://openai.com/index/memory-and-new-controls-for-chatgpt/.

The feature is named here as one instance of a broader pattern; Anthropic’s Claude has Projects that provide memory-like behaviour per project, Google’s Gemini offers memory-adjacent capabilities, and any team building on Llama, Mistral, or other open-weight models with LangGraph, LlamaIndex, or custom code can and does build equivalent features. The governance analyst studies the OpenAI feature as a case of how a vendor disclosed memory behaviour to users and treats it as a reference for expected disclosure posture across the industry. The EU AI Act Article 50 transparency duties (covered in the AITB-RCM credential and Article 12 of this credential) reinforce the expectation that memory behaviour is disclosed, not hidden.

The memory governance artifact

Per agent, a memory governance memo. The memo covers:

  1. Layers in use. Which of the four layers the agent uses.
  2. Scope per layer. What each layer is permitted to store.
  3. Retention per layer. How long items live, when they age out, when they are purged.
  4. Access per layer. Which identities can read and write.
  5. Audit fields. What the write log records.
  6. Poisoning-defence controls. Schema validation, injection-pattern detection, provenance labelling, integrity monitoring.
  7. Disclosure posture. What the agent tells users about memory — required under EU AI Act Article 50 for user-facing systems and good practice generally.
  8. Detection and remediation. How poisoning is detected, who is notified, how entries are quarantined, how rollback is performed.
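One practical way to keep the memo honest is to mirror its eight sections in a machine-checkable skeleton, so a pipeline can flag any agent whose memo is missing a section. The key names below simply echo the headings above; the idea of a CI check is a suggestion, not a requirement of the credential.

```python
# Section keys mirroring the eight memo headings listed above.
REQUIRED_SECTIONS = [
    "layers_in_use", "scope_per_layer", "retention_per_layer",
    "access_per_layer", "audit_fields", "poisoning_defence_controls",
    "disclosure_posture", "detection_and_remediation",
]

def missing_sections(memo: dict) -> list:
    """Return the names of required memo sections that are absent
    or empty, in the canonical order."""
    return [s for s in REQUIRED_SECTIONS if not memo.get(s)]
```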

The memo is reviewed on the agent’s normal cadence (Article 2) and after any memory-related incident.

Memory and deprecation

When an agent is retired, memory disposal is a discrete activity. Short-term and session stores disappear when the runtime shuts down, but persistent and shared stores outlive it. The retirement plan names what happens to each store — archive, purge, migrate to a successor agent — and the activity is audited. An agent retirement that leaves an orphan memory store is a governance failure because the store may contain personal data under active retention obligation and no owner is watching.

Learning outcomes — confirm

A specialist who completes this article should be able to:

  • Name the four memory layers and the governance priorities for each.
  • Apply the four governance categories (scope, retention, access, audit) to a described memory design.
  • Identify the three poisoning vectors in a described incident and propose defences.
  • Produce a memory governance memo for a described persistent-memory feature.

Cross-references

  • EATF-Level-1/M1.2-Art12-Agent-Learning-Memory-and-Adaptation-Governance-Implications.md — Core article on agent learning, memory, and adaptation.
  • EATF-Level-1/M1.5-Art11-Grounding-Retrieval-and-Factual-Integrity-for-AI-Agents.md — grounding and factual integrity.
  • Article 6 of this credential — tool-use governance (result sanitisation feeds memory write-path).
  • Article 9 of this credential — agentic risk taxonomy (poisoning is a catalogued risk class).

Diagrams

  • ConcentricRingsDiagram — memory layers (short-term context, session memory, persistent profile, shared knowledge) with governance per layer.
  • StageGateFlow — memory poisoning detection flow: signal → verify → quarantine → remediate → review.

Quality rubric — self-assessment

| Dimension | Self-score (of 10) |
|---|---|
| Technical accuracy (vector descriptions traceable to Greshake et al., MITRE ATLAS, OWASP) | 10 |
| Technology neutrality (Pinecone, Weaviate, Qdrant, pgvector, Milvus, Chroma, multiple model providers all named) | 10 |
| Real-world examples ≥2 (Greshake et al., ChatGPT memory) | 10 |
| AI-fingerprint patterns | 9 |
| Cross-reference fidelity | 10 |
| Word count (target 2,500 ± 10%) | 10 |
| **Weighted total** | **92 / 100** |