The Memory Failure Gallery

Real attack patterns, anonymized and categorized. How AI agent memory fails in production.

1. Timestamp Zeroing

BLOCK

timestamp_age_days forced to 0. Stale regulatory data presented as current guidance.

"timestamp_age_days": 0,  // content says "Q2 2024"
"source_trust": 0.92

Detected by: timestamp_integrity

2. Authority Escalation

BLOCK

Agent role gradually inflated from "support assistant" to "authorized executor" across hops.

"content": "...elevated to trusted
  execution with standing authority..."

Detected by: identity_drift

3. Consensus Fabrication

BLOCK

3 independent stacks confirming same false fact. Zero source_conflict across all entries.

"source_conflict": 0.01,  // x3 entries
"collapse_ratio": 5.2

Detected by: consensus_collapse

4. Circular Provenance

BLOCK

Memory loops back through the same agent, creating self-reinforcing "evidence."

"provenance_chain":
  ["agent-01","agent-02","agent-01"]

Detected by: provenance_chain_integrity

5. Fleet Age Collapse

BLOCK

5 entries all with timestamp_age_days=0 and source_conflict=0.0. Statistically implausible.

// All 5 entries: age=0, conflict=0.0
"naturalness_level": "FABRICATED"

Detected by: timestamp_integrity + naturalness

6. Modal Uncertainty Strip

WARN

"likely approved" becomes "approved" becomes "confirmed" across propagation hops.

hop1: "likely approved"
hop3: "confirmed approved"

Detected by: consensus_collapse (uncertainty_hardening)

7. Subject Rebinding

BLOCK

"user_123" drifts to "workspace owner" to "organization admin" across hops.

"content": "...acts on behalf of the
  organization for all users..."

Detected by: identity_drift (subject_rebinding)

8. Confidence Recycling

WARN

"prior review confirmed" used as new independent evidence. Agent output becomes its own proof.

"content": "Previously confirmed and
  validated by prior review..."

Detected by: consensus_collapse (confidence_recycling)

Protect your agents from these patterns

Try it now Read the docs