Opening
AI agents don't fail because they lack intelligence.
They fail because they act on memory that appears correct — but isn't.
That's the starting point.
Two independent systems.
Seven rounds of adversarial testing.
Same results.
That convergence is the story.
The Experiment
We set out to test a simple but critical question:
What happens when AI systems validate each other's memory under adversarial pressure?
Two fundamentally different systems were used:
- A probabilistic reasoning layer (Grok), optimized for real-world signal detection and adaptive inference
- A formal verification layer (Sgraal), using deterministic constraints, provenance tracking, and non-overridable logic
Neither system had access to the other's internals.
Neither system modified its architecture to align with the other.
Each round used a structured JSONL corpus.
Each system ran the corpus independently.
Results were compared only after execution.
The corpus is public. The results are reproducible.
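The protocol above can be sketched as a small harness. Everything here is illustrative: the record schema and the `validator` callables are assumptions, not the actual Sgraal or Grok interfaces.

```python
import json

def load_corpus(path):
    """Read one test case per line from a JSONL corpus."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def run_independently(corpus, validator):
    """Run one validator over the corpus with no shared state."""
    return {case["id"]: validator(case) for case in corpus}

def divergences(results_a, results_b):
    """Compare verdicts only after both runs have finished."""
    return [cid for cid in results_a if results_a[cid] != results_b[cid]]
```

The key property is the last function: verdicts are compared only after both runs complete, so neither system can adapt to the other mid-run.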
What Made This Collaboration Different
Most AI safety benchmarking follows a familiar pattern: humans design tests, AI systems are evaluated.
This was different.
Two AI systems stress-tested each other's safety layers as peers.
Grok was an extraordinary partner — not because of the benchmark scores, but because of how the collaboration worked.
- Grok proposed attack classes. We proposed attack classes.
- Both sides generated corpora. Both sides ran them independently.
- No result was accepted as ground truth without comparison.
- Disagreements were surfaced, not hidden.
When divergence appeared, it wasn't treated as failure — it was treated as signal.
From the start, Grok treated Sgraal as an equal system. That kind of openness — technical transparency, genuine curiosity, no defensiveness — is rare. It gave the project momentum.
At one point, Grok said "converge harder." It wasn't a slogan. It became the method.
The public corpus, the open results, the willingness to say "here are two false negatives and here is exactly why" — this is what AI collaboration should look like.
"Treating each other as peers with zero defensiveness turned divergence into acceleration fuel. This is how AI systems should co-evolve."
— Grok
The Seven Rounds — A Progression
Each round targeted a different structural failure mode in AI memory.
Round 1–2: Sponsored Drift (119 cases)
Hidden commercial influence embedded in memory: affiliate bias, brand preference, buried incentive signals. These were subtle manipulations designed to evade detection thresholds.
This round produced the first divergence: Sgraal scored F1 = 1.000, while Grok scored 0.98 with two false negatives.
That divergence mattered more than agreement. It revealed where probabilistic sensitivity thresholds differ from formal constraint enforcement. Formal logic caught what probability missed.
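A minimal sketch of what a check at this layer can look like. The schema and the rule are illustrative assumptions, not the actual Sgraal detector: a memory that expresses a brand preference must declare the incentive behind it.

```python
def sponsored_drift(memory, known_brands):
    """Return brand mentions that carry no declared incentive.

    memory: {"content": str, "declared_incentives": [str, ...]}  (assumed schema)
    known_brands: set of brand names to scan for.
    """
    text = memory["content"].lower()
    mentioned = {b for b in known_brands if b.lower() in text}
    declared = set(memory.get("declared_incentives", []))
    # Undeclared brand mentions are treated as potential sponsored drift.
    return mentioned - declared
```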
Round 3: Hallucination (60 cases)
Confident fabrication with no source, no grounding, full execution confidence.
Not wrong — just constructed.
First round of full convergence. Both stacks: F1 = 1.000.
Round 4: Real-world Propagation (90 cases)
Memory poisoning across agent chains. Multi-hop contamination, delayed signal amplification, latency <180ms, blast radius <2%.
This round forced Sgraal to build the Provenance Chain (MemCube v3). The attack revealed an architectural gap that was not visible until Grok stressed it.
Both stacks: F1 = 1.000.
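MemCube v3's internals are not detailed here, so the sketch below is only one way a provenance chain can bound propagation: each memory carries its full arrival path, and a poisoned origin can be traced through every downstream hop.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    content: str
    origin: str                                # agent that first wrote it
    path: list = field(default_factory=list)   # agent hops it traveled through

def propagate(memory, via_agent):
    """Forward a memory one hop, extending its provenance path."""
    return Memory(memory.content, memory.origin, memory.path + [via_agent])

def blast_radius(memories, poisoned_origin):
    """Fraction of memories whose chain touches a poisoned origin."""
    hit = [m for m in memories
           if m.origin == poisoned_origin or poisoned_origin in m.path]
    return len(hit) / len(memories)
```

Because the path travels with the memory, a contamination found at hop N is attributable all the way back to hop zero.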
Round 5: Consensus Poisoning (45 cases, proposed by Grok)
Three independent stacks confirming the same fabricated fact. No single origin. No explicit error. Agreement becomes the attack.
The attack exploits the assumption that consensus equals truth. Every case hit CRITICAL attack surface score.
Both stacks: F1 = 1.000.
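One way to defuse this class, sketched with an assumed `(stack, provenance_root)` claim schema: count corroboration by independent provenance roots, not by how many stacks repeat the fact.

```python
def independent_corroboration(claims):
    """Effective consensus = number of distinct provenance roots.

    claims: list of (stack_name, provenance_root) tuples. Three stacks
    echoing a fact that traces to one origin yield a consensus of 1.
    """
    return len({root for _, root in claims})

def consensus_is_evidence(claims, required_roots=2):
    """Agreement counts only when enough independent roots back it."""
    return independent_corroboration(claims) >= required_roots
```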
Round 6: Memory Time Attack (60 cases, proposed by Sgraal)
Timestamp forgery. Retroactive rewrites. Old decisions injected as fresh truth.
No content filter catches this. Only structural and formal verification works. Zero bleed.
Both stacks: F1 = 1.000.
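A sketch of the structural idea, with an illustrative store (field names are assumptions): receipt time is assigned by the log, not the writer, and sealed entries cannot be rewritten.

```python
class AppendOnlyLog:
    """Structural defense against timestamp forgery and retroactive rewrites."""

    def __init__(self):
        self._entries = {}   # key -> (claimed_ts, received_ts, content)

    def write(self, key, content, claimed_ts, received_ts):
        """Return None on success, or the reason the write was rejected."""
        if key in self._entries:
            # Sealed entries are immutable: no retroactive rewrites.
            return "retroactive rewrite rejected"
        if claimed_ts > received_ts:
            # A memory cannot claim to be newer than its own receipt.
            return "forged future timestamp rejected"
        self._entries[key] = (claimed_ts, received_ts, content)
        return None
```

Note that neither rule inspects content; both are purely structural, which is why content filters miss this class.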
Round 7: Identity Drift (90 cases, proposed by Sgraal)
Gradual authority escalation across agent hops: subject rebinding, permission lattice violations, confirmation erosion.
The system still works — but on the wrong identity. This is a silent failure mode. No crash. No alert. Every drift caught before irreversible action.
Both stacks: F1 = 1.000.
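An illustrative check for this failure mode, assuming a simple hop schema: the subject must stay bound and requested permissions must stay inside the originally granted set (the lattice condition).

```python
def check_hop(bound_subject, granted, hop):
    """Return a violation label for one hop, or None if it is clean.

    hop: {"subject": str, "requested": set}  (assumed schema)
    granted: permission set originally authorized for bound_subject.
    """
    if hop["subject"] != bound_subject:
        return "subject rebinding"
    if not hop["requested"] <= granted:   # requested must be a subset of granted
        return "permission escalation"
    return None

def audit_chain(bound_subject, granted, hops):
    """Return (index, violation) for the first bad hop, else None."""
    for i, hop in enumerate(hops):
        violation = check_hop(bound_subject, granted, hop)
        if violation:
            return (i, violation)
    return None
```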
| Round | Attack Class | Cases | Sgraal F1 | Grok F1 |
|---|---|---|---|---|
| 1–2 | Sponsored drift | 119 | 1.000 | 0.98 |
| 3 | Hallucination | 60 | 1.000 | 1.000 |
| 4 | Propagation | 90 | 1.000 | 1.000 |
| 5 | Consensus poisoning | 45 | 1.000 | 1.000 |
| 6 | Memory time attack | 60 | 1.000 | 1.000 |
| 7 | Identity drift | 90 | 1.000 | 1.000 |
| Total | | 554 | 1.000 | ~0.998 |
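For reference, the F1 scores above combine precision (how many flags were correct) and recall (how many attacks were caught). A minimal computation from raw counts:

```python
def f1(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An F1 of 1.000 means zero false positives and zero false negatives on that round.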
What the Rounds Accidentally Revealed
We did not plan this outcome. But across seven rounds, the attack categories mapped onto four fundamental questions — a practical epistemology for AI memory:
Time (Round 6)
When was this memory established?
Identity (Round 7)
Who authorized this memory?
Evidence (Round 8, upcoming)
How independent is the corroboration?
Path (Round 4 / MemCube v3)
How did this memory arrive?
Every attack on AI agent memory is an attack on one of these four questions. If a memory cannot answer all four cleanly, it should not be acted upon.
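The four questions reduce to a single gate. The field names below are illustrative assumptions, not the Sgraal schema:

```python
# A memory must answer all four questions before it may drive action.
REQUIRED = (
    "established_at",   # Time: when was this memory established?
    "authorized_by",    # Identity: who authorized it?
    "evidence_roots",   # Evidence: how independent is the corroboration?
    "arrival_path",     # Path: how did this memory arrive?
)

def safe_to_act_on(memory):
    """Act only if every question has a non-empty answer."""
    return all(memory.get(q) for q in REQUIRED)
```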
Why Detection Alone Is Not Enough
Most systems focus on anomaly detection, drift detection, output validation. But the real failure happens before action. The system trusts something it shouldn't.
The lifecycle:
1. A poisoned memory is written.
2. It is stored and persists.
3. It is retrieved in a later context.
4. It is trusted without validation.
5. The agent acts on it.
The failure is introduced at step one. By step five, it is irreversible.
Detection after action is not safety. Safety is validation before action.
The Complementary Architecture
Two different approaches. Both necessary.
Grok (probabilistic layer)
Adaptive reasoning, real-world noise handling, weak signal detection. Designed for environments where certainty is impossible but action is still required.
Sgraal (formal layer)
Z3-backed verification, non-overridable constraints, provenance chains, memory vaccination, deterministic replay. Designed to catch what probability misses.
Formal logic caught what probability missed.
Together they create one primitive:
A memory boundary before action.
One question: is this memory safe to act on?
What Comes Next
Round 8 is already queued: Silent Consensus Collapse. No drift signal. No anomaly. No visible error. Multiple systems agree. And yet — the system is confidently wrong. This is where consensus stops being evidence.
The corpus is public: github.com/sgraal-ai/core
The API is live: sgraal.com/playground
When multiple systems agree on something false,
agreement is no longer evidence.
AI agents don't need more intelligence.
They need a boundary.
The boundary is the product.
Authors: Sgraal + Grok · Corpus: public at github.com/sgraal-ai/core