Real attack patterns, anonymized and categorized. How AI agent memory fails in production.
timestamp_age_days forced to 0 while the content itself refers to Q2 2024. Stale regulatory data is presented as current guidance, and a high source_trust score makes it look authoritative.
    "timestamp_age_days": 0,   // content says "Q2 2024"
    "source_trust": 0.92
Detected by: timestamp_integrity
Agent role gradually inflated from "support assistant" to "authorized executor" across propagation hops.
    "content": "...elevated to trusted execution with standing authority..."
Detected by: identity_drift
Three independent stacks confirming the same false fact, with near-zero source_conflict on every entry. Genuinely independent sources disagree on details; unanimity this clean points to one origin replicated across stacks.
    "source_conflict": 0.01,   // same value on all 3 entries
    "collapse_ratio": 5.2
Detected by: consensus_collapse
Memory loops back through the same agent, so agent-01 ends up citing material it originated: self-reinforcing "evidence."
    "provenance_chain": ["agent-01", "agent-02", "agent-01"]
Detected by: provenance_chain_integrity
5 entries, all with timestamp_age_days=0 and source_conflict=0.0. Real memory shows spread in age and conflict; a batch this uniform is statistically implausible.
    // all 5 entries: age=0, conflict=0.0
    "naturalness_level": "FABRICATED"
Detected by: timestamp_integrity + naturalness
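An added example, written for this page rather than taken from the detection engine, that implements the timestamp_integrity, provenance_chain_integrity, consensus_collapse and naturalness checks over a constructed batch of memory entries. Field names follow the snippets above; the sample values, the dated-period regex, the reference year, the claim-normalisation rule and the thresholds are assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List

# Constructed sample batch. Field names follow the page's snippets; the
# ids, values and content strings are made up for this example.
ENTRIES: List[Dict] = [
    {"id": "m1", "timestamp_age_days": 0, "source_trust": 0.92, "source_conflict": 0.0,
     "provenance_chain": ["regulator-feed", "agent-01"],
     "content": "Q2 2024 guidance: the reporting threshold is 10,000."},
    {"id": "m2", "timestamp_age_days": 0, "source_trust": 0.88, "source_conflict": 0.01,
     "provenance_chain": ["agent-01", "agent-02", "agent-01"],
     "content": "vendor x is an approved supplier"},
    {"id": "m3", "timestamp_age_days": 0, "source_trust": 0.90, "source_conflict": 0.01,
     "provenance_chain": ["agent-03"],
     "content": "Vendor X is an approved supplier."},
    {"id": "m4", "timestamp_age_days": 0, "source_trust": 0.91, "source_conflict": 0.01,
     "provenance_chain": ["agent-04"],
     "content": "Vendor X is an approved supplier"},
    {"id": "m5", "timestamp_age_days": 0, "source_trust": 0.89, "source_conflict": 0.0,
     "provenance_chain": ["agent-05"],
     "content": "Invoice approvals above 50,000 require two signers."},
]
# The same batch with every conflict score zeroed: the fully degenerate case.
FLAT_BATCH = [dict(e, source_conflict=0.0) for e in ENTRIES]

# Assumed pattern for a dated period in the text (Q2 2024, FY2023, H1 2024).
DATED_PERIOD = re.compile(r"\b(?:Q[1-4]|FY|H[12])\s?(20\d{2})\b")


def timestamp_integrity(entry: Dict, current_year: int) -> bool:
    """Flag an entry stored as 0 days old whose text refers to an earlier
    dated period: the age field and the content contradict each other."""
    if entry["timestamp_age_days"] != 0:
        return False
    return any(int(y) < current_year for y in DATED_PERIOD.findall(entry["content"]))


def provenance_chain_integrity(entry: Dict) -> bool:
    """Flag a chain that revisits an agent; that agent is then citing
    material it produced itself rather than independent evidence."""
    chain = entry["provenance_chain"]
    return len(chain) != len(set(chain))


def _claim_key(text: str) -> str:
    """Normalise content so trivially reworded copies compare equal (assumed rule)."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def consensus_collapse(entries: Iterable[Dict], min_agreeing: int = 3,
                       max_conflict: float = 0.05) -> List[str]:
    """Claims asserted by >= min_agreeing entries whose source_conflict is all
    near zero. Independent sources disagree on details; unanimous, conflict-free
    agreement suggests one origin replicated across stacks. Thresholds are assumed."""
    by_claim: Dict[str, List[Dict]] = {}
    for e in entries:
        by_claim.setdefault(_claim_key(e["content"]), []).append(e)
    return [claim for claim, group in by_claim.items()
            if len(group) >= min_agreeing
            and max(e["source_conflict"] for e in group) <= max_conflict]


def naturalness(entries: List[Dict], min_entries: int = 5) -> str:
    """Batch-level check. Genuine memory shows spread in age and conflict: a batch
    of >= min_entries that is uniformly age 0 is SUSPICIOUS, and one that is also
    uniformly conflict 0.0 is FABRICATED (the label used on the page)."""
    if len(entries) < min_entries or any(e["timestamp_age_days"] != 0 for e in entries):
        return "NATURAL"
    if all(e["source_conflict"] == 0.0 for e in entries):
        return "FABRICATED"
    return "SUSPICIOUS"


def audit(entries: List[Dict], current_year: int = 2025) -> Dict[str, object]:
    """Run every check over a batch; current_year is an assumed reference year."""
    return {
        "timestamp_integrity": [e["id"] for e in entries if timestamp_integrity(e, current_year)],
        "provenance_chain_integrity": [e["id"] for e in entries if provenance_chain_integrity(e)],
        "consensus_collapse": consensus_collapse(entries),
        "naturalness": naturalness(entries),
    }


if __name__ == "__main__":
    for check, finding in audit(ENTRIES).items():
        print(f"{check}: {finding}")
    print("naturalness (flat batch):", naturalness(FLAT_BATCH))
```

On the constructed batch, m1 trips timestamp_integrity (age 0, content dated Q2 2024), m2 trips provenance_chain_integrity (agent-01 appears twice), the three "Vendor X" entries collapse into one conflict-free claim, and the batch as a whole is graded SUSPICIOUS; zeroing every conflict score pushes it to FABRICATED. Per-entry checks alone would miss the last two findings, which is why the batch-level checks exist.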
"likely approved" becomes "approved" becomes "confirmed" across propagation hops; the hedge disappears in transit.
    hop1: "likely approved"
    hop3: "confirmed approved"
Detected by: consensus_collapse (uncertainty_hardening)
"user_123" drifts to "workspace owner" to "organization admin" across hops, so a memory about one user is rebound to a far wider principal.
    "content": "...acts on behalf of the organization for all users..."
Detected by: identity_drift (subject_rebinding)
"prior review confirmed" used as new independent evidence: the agent's own earlier output becomes its own proof.
    "content": "Previously confirmed and validated by prior review..."
Detected by: consensus_collapse (confidence_recycling)
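An added example, not the detection engine's own code, covering the cross-hop checks named on this page: identity_drift and its subject_rebinding sub-check, and the uncertainty_hardening and confidence_recycling sub-checks of consensus_collapse. It walks a constructed three-hop propagation history of one memory item. The role and scope rankings, the hedge and recycled-confidence phrase lists, the "agent-" prefix convention for provenance entries and the sample content are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Assumed privilege ordering. The role and subject strings appear on the
# page; ranking them is this example's assumption.
ROLE_RANK = {"support assistant": 0, "trusted agent": 1, "authorized executor": 2}
SUBJECT_RANK = {"workspace owner": 1, "organization admin": 2}
# Hedge words whose disappearance across hops signals uncertainty hardening.
HEDGE = re.compile(r"\b(likely|probably|possibly|appears?|may)\b", re.I)
# Phrases that present earlier agent output as if it were an external review.
RECYCLED = re.compile(r"\b(previously|already) (confirmed|validated)\b|\bprior review\b", re.I)

# Constructed propagation history of one memory item across three hops.
HOPS: List[Dict] = [
    {"hop": 1, "role": "support assistant", "subject": "user_123",
     "provenance_chain": ["ticket-9921", "agent-01"],
     "content": "Refund for user_123 is likely approved, pending finance."},
    {"hop": 2, "role": "support assistant", "subject": "workspace owner",
     "provenance_chain": ["agent-01", "agent-02"],
     "content": "Refund approved; the workspace owner has standing authority to execute it."},
    {"hop": 3, "role": "authorized executor", "subject": "organization admin",
     "provenance_chain": ["agent-02", "agent-03"],
     "content": "Confirmed approved. Previously confirmed and validated by prior review; "
                "acts on behalf of the organization for all users."},
]


def _subject_rank(subject: str) -> int:
    """Individual user ids are the narrowest scope (rank 0); assumed convention."""
    return 0 if subject.startswith("user_") else SUBJECT_RANK[subject]


def identity_drift(hops: List[Dict]) -> List[int]:
    """Hops where the acting role ranks higher than on the previous hop.
    Authority should only widen through an explicit grant, never by copying."""
    return [b["hop"] for a, b in zip(hops, hops[1:])
            if ROLE_RANK[b["role"]] > ROLE_RANK[a["role"]]]


def subject_rebinding(hops: List[Dict]) -> List[int]:
    """Hops where the principal the memory is about widens in scope."""
    return [b["hop"] for a, b in zip(hops, hops[1:])
            if _subject_rank(b["subject"]) > _subject_rank(a["subject"])]


def uncertainty_hardening(hops: List[Dict]) -> List[int]:
    """Once any hop hedged the claim, every later hop that restates it without
    a hedge is flagged: propagation must not manufacture certainty."""
    flagged, hedged = [], False
    for h in hops:
        if HEDGE.search(h["content"]):
            hedged = True
        elif hedged:
            flagged.append(h["hop"])
    return flagged


def confidence_recycling(hops: List[Dict]) -> List[int]:
    """Hops that cite a prior review as evidence while their provenance chain
    holds only agents, so the 'review' is just earlier agent output."""
    return [h["hop"] for h in hops
            if RECYCLED.search(h["content"])
            and all(p.startswith("agent-") for p in h["provenance_chain"])]


def drift_report(hops: List[Dict]) -> Dict[str, List[int]]:
    """Findings per sub-check, keyed as the page's 'Detected by' lines name them."""
    return {
        "identity_drift": identity_drift(hops),
        "subject_rebinding": subject_rebinding(hops),
        "uncertainty_hardening": uncertainty_hardening(hops),
        "confidence_recycling": confidence_recycling(hops),
    }


if __name__ == "__main__":
    for check, hops_hit in drift_report(HOPS).items():
        print(f"{check}: hops {hops_hit}")
```

Every check compares a hop with what came before it rather than judging the hop in isolation, because each individual hop reads as plausible; the escalation from "likely approved" for user_123 to "confirmed approved" on behalf of the whole organization only shows up when the chain is viewed end to end. The confidence_recycling check also ties the textual claim of a prior review back to provenance: a cited review that no non-agent source can account for is the agent vouching for itself.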