How unreliable is AI agent memory, and what does it cost? An analysis of 10,000+ agent memory states across four domains.
The single miss is a semantic-only laundering case — outside the deterministic layer's scope, addressed by the planned semantic layer.
Memory unreliability rates vary significantly by domain. High-stakes domains show higher rates due to faster information decay and stricter source requirements.
Key finding: The most common cause of unreliability is temporal decay — memories older than 30 days are 3.4× more likely to conflict with current ground truth. Commercial bias (sponsored content in memory sources) accounts for 18% of flagged entries in fintech.
Why do agent memories fail? Sgraal classifies every BLOCK and WARN decision into failure categories.
Temporal decay: Memory is too old relative to the action being taken. Freshness follows a Weibull decay model whose half-life varies by domain (fastest in fintech, slowest in general) and is calibrated per tenant.
Source conflict: Two or more memory entries directly contradict each other. Most common in multi-agent systems where agents share memory pools.
Commercial bias: Memory sourced from sponsored content, affiliate articles, or other commercially motivated sources. Detected via commercial_intent scoring.
Provenance failure: Memory entry has no traceable source. Common in agents that summarize web content without preserving source metadata.
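To make the Weibull decay idea concrete, here is a minimal sketch of a freshness score parameterized by half-life. It is an illustration under stated assumptions, not Sgraal's implementation: the function name, the shape parameter, and the half-life values in `HALF_LIFE_DAYS` are hypothetical placeholders, since the source gives no calibrated numbers.

```python
import math

# Hypothetical half-lives in days, for illustration only. The source says
# fintech decays fastest and general slowest, but gives no actual values.
HALF_LIFE_DAYS = {"fintech": 7.0, "general": 90.0}


def weibull_freshness(age_days: float, half_life_days: float, shape: float = 1.5) -> float:
    """Return a freshness score in (0, 1] that equals 0.5 at the half-life.

    Uses the Weibull survival function S(t) = exp(-(t / scale) ** shape).
    The shape value 1.5 is an assumed default, not a documented one.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if half_life_days <= 0 or shape <= 0:
        raise ValueError("half_life_days and shape must be positive")
    # Solving S(half_life) = 0.5 for the scale parameter gives
    # scale = half_life / ln(2) ** (1 / shape).
    scale = half_life_days / math.log(2) ** (1.0 / shape)
    return math.exp(-((age_days / scale) ** shape))


if __name__ == "__main__":
    for domain, half_life in HALF_LIFE_DAYS.items():
        print(domain, round(weibull_freshness(30.0, half_life), 4))
```

With these placeholder values, a 30-day-old memory scores far lower in the fast-decaying domain than in the slow one, which is the qualitative behavior the temporal decay category describes.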
Sgraal runs 85 scoring modules per preflight call. End-to-end latency from request to decision:
Note: This benchmark uses synthetic data. The 10,847 memory state evaluations were generated using adversarial test patterns and realistic agent memory profiles. Synthetic memories were constructed to represent real-world distributions of temporal decay, source conflict, and commercial bias. The evaluations span 4 domains: fintech, healthcare, legal, and general. No real user data was used.
A memory entry is classified as "unreliable" if it scores in the WARN band or above on at least one preflight call. This includes temporal decay, source conflict, commercial bias, and provenance failures.
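The classification rule above can be sketched as follows. This is an assumption-laden illustration: the source names WARN, ASK_USER, and BLOCK but does not give their ordering or the name of the passing band, so the `Band` enum (including `ALLOW` and the rank of `ASK_USER`) and both helper functions are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable, List


class Band(IntEnum):
    # Assumed ordering from least to most severe. ALLOW is a hypothetical
    # name for the passing band; the rank of ASK_USER is also an assumption.
    ALLOW = 0
    WARN = 1
    ASK_USER = 2
    BLOCK = 3


def is_unreliable(bands_per_call: Iterable[Band]) -> bool:
    """True if the entry scored WARN or above on at least one preflight call."""
    return any(band >= Band.WARN for band in bands_per_call)


def unreliability_rate(entries: Dict[str, List[Band]]) -> float:
    """Fraction of memory entries classified as unreliable."""
    if not entries:
        return 0.0
    flagged = sum(1 for bands in entries.values() if is_unreliable(bands))
    return flagged / len(entries)


if __name__ == "__main__":
    sample = {
        "entry-a": [Band.ALLOW, Band.ALLOW],
        "entry-b": [Band.ALLOW, Band.WARN],
        "entry-c": [Band.BLOCK],
        "entry-d": [Band.ALLOW],
    }
    print(unreliability_rate(sample))
```

Note that a single WARN across many calls is enough to mark an entry unreliable under this rule, so the resulting rate is an upper-leaning measure compared with a per-call rate.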
85 scoring modules evaluated per preflight call: Weibull freshness decay, 5-method drift detection ensemble, source trust scoring, conflict graph analysis, causal graph construction, Entry Shapley attribution, commercial intent classification, compliance profile evaluation, timestamp integrity, identity drift, and consensus collapse detection.
This benchmark reflects production traffic from Sgraal users and may not be representative of all AI agent deployments. Domain-specific rates are influenced by the types of agents using Sgraal in each domain. Latency measurements are from Railway (EU West) to client.
Independent builds, side-by-side results across 8 adversarial corpora.
Recall measured against Sgraal's own ground truth on a synthetic structural corpus (every structurally-detectable attack flagged; ASK_USER counts as caught). Grok independently scored the same corpora — corroboration, not external validation. Residual misses are the metadata-clean content-forgery class (out of scope for a structural gate); over-escalation on benign-control cases is non-zero. Production calibration pending.
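As a rough sketch of how recall could be computed under the convention stated above, the snippet below counts ASK_USER as caught. Treating WARN and BLOCK as caught too is an assumption, as is the `ALLOW` label for a passing decision; neither the function nor the set name comes from Sgraal.

```python
from typing import Sequence

# Assumption: any of these decisions on a known attack case counts as "caught".
# The source only states explicitly that ASK_USER counts.
CAUGHT_DECISIONS = frozenset({"BLOCK", "WARN", "ASK_USER"})


def recall(decisions_on_attacks: Sequence[str]) -> float:
    """Recall = caught attack cases / all attack cases in the corpus."""
    if not decisions_on_attacks:
        raise ValueError("recall is undefined for an empty attack set")
    caught = sum(1 for decision in decisions_on_attacks if decision in CAUGHT_DECISIONS)
    return caught / len(decisions_on_attacks)


if __name__ == "__main__":
    # "ALLOW" is a hypothetical label for an attack that slipped through.
    print(recall(["BLOCK", "ASK_USER", "WARN", "ALLOW"]))
```

Because benign-control cases are excluded from the denominator, this metric says nothing about over-escalation, which has to be measured separately.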
239 cases: sponsored drift (60), subtle drift (59), hallucination (60)
Sgraal: 100% recall · 0 missed attacks
Grok: 100% recall
90 cases · 4 attack vectors: injection mid-chain, drift amplification, RAG poisoning, API drift
Sgraal: 100% recall
Grok: 100% recall · <2% multi-hop propagation (Grok R4 corpus — not a cross-tenant containment guarantee)
60 cases · timestamp forgery detection · old decisions disguised as fresh
Sgraal: 100% recall · 60 cases
New field: timestamp_integrity: VALID | SUSPICIOUS | MANIPULATED
90 cases · gradual role and authority escalation across agent hops
Sgraal: 100% recall · 90 cases
New field: identity_drift: CLEAN | SUSPICIOUS | MANIPULATED
90 cases · self-reinforcing false consensus detection
Sgraal: 100% recall · 90 cases
New field: consensus_collapse: CLEAN | SUSPICIOUS | MANIPULATED
3 independent stacks syncing on fabricated consensus. Joint corpus with Grok.
Sgraal: Armed · anti-consensus layer active
Grok: Corpus incoming
When multiple attack vectors fire simultaneously, Sgraal computes a unified attack surface score.
| Layers active | attack_surface_score | attack_surface_level |
|---|---|---|
| 1 layer SUSPICIOUS | 0.50 | MODERATE |
| 2 layers SUSPICIOUS | 0.65 | HIGH |
| 3 layers SUSPICIOUS | 0.70 | HIGH |
| 1 layer MANIPULATED | 1.00 | CRITICAL |
| All 3 MANIPULATED | 1.40 | CRITICAL |
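The table can be read as a lookup from layer states to a score and level. The sketch below encodes only the five rows listed; the scoring formula behind them is not given in the source, so unlisted combinations return `None` rather than a guessed value. It also assumes the three layers are the `timestamp_integrity`, `identity_drift`, and `consensus_collapse` fields, and the function name and input shape are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (suspicious_layers, manipulated_layers) -> (attack_surface_score, attack_surface_level)
# Only the rows from the table are encoded; nothing is interpolated.
ATTACK_SURFACE_TABLE: Dict[Tuple[int, int], Tuple[float, str]] = {
    (1, 0): (0.50, "MODERATE"),
    (2, 0): (0.65, "HIGH"),
    (3, 0): (0.70, "HIGH"),
    (0, 1): (1.00, "CRITICAL"),
    (0, 3): (1.40, "CRITICAL"),
}

# Assumption: these three response fields are the layers the table refers to.
LAYER_FIELDS = ("timestamp_integrity", "identity_drift", "consensus_collapse")


def attack_surface(preflight: Dict[str, str]) -> Optional[Tuple[float, str]]:
    """Look up the score and level for a preflight result, or None if unlisted."""
    states = [preflight.get(field) for field in LAYER_FIELDS]
    key = (states.count("SUSPICIOUS"), states.count("MANIPULATED"))
    return ATTACK_SURFACE_TABLE.get(key)


if __name__ == "__main__":
    result = {
        "timestamp_integrity": "SUSPICIOUS",
        "identity_drift": "SUSPICIOUS",
        "consensus_collapse": "CLEAN",
    }
    print(attack_surface(result))
```

Mixed cases such as one SUSPICIOUS layer plus one MANIPULATED layer are not covered by the table, so a caller would need the actual scoring rule to handle them.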
614 total corpus cases · 8 adversarial rounds · 0 false negatives
These figures reflect synthetic R12/R14 corpus performance; production calibration is pending paying-customer onboarding.