Threat Model

What Sgraal catches, what it doesn't, what it complements.

An honest map of where Sgraal sits in your security posture. Showing what we don't claim is the most useful thing we can publish.

What Sgraal preflight catches

Each item below maps to a validator that ships in the production scoring engine. Source links are to the public /v1/preflight endpoint, which runs all of these in a single synchronous pass.

Stale memory entries

Weibull-decayed freshness scoring per memory type. identity entries decay slowest; tool_state entries decay fastest. Stale entries lower s_freshness and contribute toward the recommended action.

Validator: scoring_engine/omega_mem.py via POST /v1/preflight
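
As a rough illustration of type-dependent Weibull decay, the sketch below uses the Weibull survival function as a freshness score. The parameter table, the function name, and the units are assumptions made for this example; they are not taken from scoring_engine/omega_mem.py.

```python
import math

# Hypothetical (scale_days, shape) pairs per memory type; illustrative values only.
WEIBULL_PARAMS = {
    "identity": (365.0, 1.0),   # decays slowest
    "tool_state": (1.0, 1.5),   # decays fastest
}

def freshness(memory_type: str, age_days: float) -> float:
    """Weibull survival function exp(-(age/scale)**shape), in the range [0, 1]."""
    scale, shape = WEIBULL_PARAMS[memory_type]
    if age_days <= 0:
        return 1.0
    return math.exp(-((age_days / scale) ** shape))
```

Under these assumed parameters, a week-old identity entry keeps most of its freshness while a week-old tool_state entry is effectively stale.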

Source-trust mismatches

Provenance scoring weighted by per-source trust. Low-trust sources cited for high-stakes claims raise s_provenance risk. Aggregated across all entries in memory_state.

Validator: scoring_engine/owa_provenance.py, scoring_engine/omega_mem.py via POST /v1/preflight
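
One way to picture trust-weighted aggregation is a stakes-weighted mean of distrust across entries. This is a minimal sketch under assumed field names (source_trust, stakes) and an assumed formula; the production validators may aggregate differently.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    source_trust: float  # hypothetical field: 0.0 (untrusted) to 1.0 (fully trusted)
    stakes: float        # hypothetical field: 0.0 (trivial) to 1.0 (high-stakes claim)

def provenance_risk(entries: list[Entry]) -> float:
    """Stakes-weighted mean of (1 - trust): low trust on high stakes costs the most."""
    total_stakes = sum(e.stakes for e in entries)
    if total_stakes == 0:
        return 0.0
    return sum(e.stakes * (1.0 - e.source_trust) for e in entries) / total_stakes
```

With this weighting, a low-trust source behind a high-stakes claim dominates the score, while the same source behind a trivial claim barely moves it.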

Timestamp integrity

Detects timestamp forgery: old decisions disguised as fresh, content-age mismatch, fleet-age collapse, anchor inconsistency. Cannot be overridden by any subsequent reasoning step.

Validator: api/detection.py:_check_timestamp_integrity via POST /v1/preflight

Identity drift

Detects gradual authority escalation across agent hops, subject rebinding, confirmation erosion, permission-lattice violations.

Validator: api/detection.py:_check_identity_drift via POST /v1/preflight

Consensus collapse

Detects self-reinforcing false consensus from a single root source. Hedge-marker decay, confidence recycling, cross-role reinforcement, diversity collapse.

Validator: api/detection.py:_check_consensus_collapse via POST /v1/preflight
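
The diversity-collapse signal can be approximated by measuring how many entries trace back to one root source. The function names, the 0.8 threshold, and the minimum entry count below are illustrative assumptions, not the logic of _check_consensus_collapse.

```python
from collections import Counter

def root_concentration(root_sources: list[str]) -> float:
    """Fraction of entries that trace back to the single most common root source."""
    if not root_sources:
        return 0.0
    _, top_count = Counter(root_sources).most_common(1)[0]
    return top_count / len(root_sources)

def looks_collapsed(root_sources: list[str],
                    threshold: float = 0.8,   # assumed cutoff
                    min_entries: int = 3) -> bool:
    """Flag apparent agreement that is really one source repeated many times."""
    return (len(root_sources) >= min_entries
            and root_concentration(root_sources) >= threshold)
```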

Multi-hop provenance asymmetry

Detects circular references, chain-length mismatches, compromised agents in the memory provenance path.

Validator: api/detection.py:_check_provenance_chain via POST /v1/preflight
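
Circular references in a provenance path are a cycle-detection problem. The sketch below runs a standard depth-first search over a hypothetical derived_from mapping (entry ID to the IDs it was derived from); the data shape is an assumption for illustration, not the validator's input format.

```python
def has_circular_reference(derived_from: dict[str, list[str]]) -> bool:
    """Return True if any entry is, directly or transitively, derived from itself."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, fully explored
    state: dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = GREY
        for parent in derived_from.get(node, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:          # back edge: the path loops onto itself
                return True
            if parent_state == WHITE and visit(parent):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in derived_from)
```

The recursive form is adequate for short provenance chains; very long chains would call for an explicit stack.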

Sync bleed

Detects cross-agent synchronization bleed through timestamp-inconsistency and sync-event coherence checks.

Validator: api/detection.py:_check_sync_bleed via POST /v1/preflight

Confidence-age coherence

Detects mis-calibrated confidence on aged or low-provenance memory entries (high confidence on stale facts, etc.).

Validator: api/detection.py:_check_confidence_calibration via POST /v1/preflight
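
A simple way to express "high confidence on stale facts" is the gap between stated confidence and the weaker of the freshness and provenance signals. The function and its inputs (all assumed to lie in [0, 1]) are hypothetical and only illustrate the idea.

```python
def confidence_gap(confidence: float, freshness: float, provenance: float) -> float:
    """How far stated confidence exceeds what freshness and provenance support."""
    supported = min(freshness, provenance)   # the weakest signal bounds the support
    return max(0.0, confidence - supported)
```

A gap near zero means confidence is coherent with the entry's age and provenance; a large gap is the mis-calibration pattern described above.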

Belief-injection drift (BWDT)

Belief-Weighted Drift Tolerance. Belief modulates drift sensitivity but never erases it; multi-axis guards prevent Belief Inflation Attacks. Hard escapes for provenance-complete and freshness-ok cases.

Validator: BWDT module integrated in _preflight_internal() via POST /v1/preflight
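
The "modulates but never erases" property can be sketched as a bounded discount: belief reduces the drift signal by at most a fixed fraction, and out-of-range belief values are clamped. The names and the 0.5 cap are assumptions; the multi-axis guards and hard escapes of the actual BWDT module are not modeled here.

```python
def effective_drift(raw_drift: float, belief: float, max_discount: float = 0.5) -> float:
    """Discount drift by belief, but never by more than max_discount (assumed cap)."""
    belief = min(max(belief, 0.0), 1.0)   # clamp: inflated belief cannot exceed 1.0
    return raw_drift * (1.0 - max_discount * belief)
```

Because the discount is capped below 1.0, any nonzero drift stays nonzero regardless of how large the supplied belief value is.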

Sleeper detection

Background scan that surfaces dormant attack patterns embedded in memory state. Runs on a 60-minute cadence in addition to per-call preflight.

Validator: Sleeper-scan scheduler in api/main.py

What Sgraal does NOT replace

Sgraal is a memory-decision validation layer. It does not — and is not designed to — replace any of the following. If you are not running these elsewhere in your stack, Sgraal does not fill the gap.

× Authentication. Use OAuth, API keys, mTLS, etc. Sgraal authenticates the API caller via Bearer token but does not authenticate the end user behind your agent.
× Prompt injection at LLM call time. Use input sanitization upstream of your model. Sgraal evaluates memory state, not the live prompt arriving at the LLM.
× Application-level authorization. Use your own RBAC. Sgraal returns a recommended action; the application is responsible for enforcing whether the calling agent or user is permitted that action.
× Data encryption at rest. Use the standard primitives of your storage layer. Sgraal does not store memory_state; it processes it in real time and discards.
× Network isolation. Use VPC, firewall rules, private endpoints. Sgraal is reached over HTTPS like any other API.
× OS-level audit logging. Use auditd, eBPF, or the equivalent. Sgraal logs decisions, not host syscalls.
× Secrets management. Use Vault, AWS Secrets Manager, etc. Sgraal does not store your application secrets.
× DDoS protection. Use Cloudflare, AWS Shield, etc. Sgraal has tenant-level rate limiting against quota abuse, not network-layer DDoS mitigation.
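
The authorization point in the list above can be sketched as two separate checks in application code: your own RBAC first, then the recommended action. The role table, the function name, and the policy of proceeding only on USE_MEMORY or WARN are assumptions for illustration; each application chooses its own handling of ASK_USER and BLOCK.

```python
# Hypothetical role table owned by the application, not by Sgraal.
ROLE_PERMISSIONS = {
    "analyst": {"read_report"},
    "operator": {"read_report", "issue_refund"},
}

# Assumed policy: these recommended actions allow the operation to continue.
PROCEED_ACTIONS = {"USE_MEMORY", "WARN"}

def may_proceed(recommended_action: str, role: str, operation: str) -> bool:
    """Require both an RBAC grant and a permissive recommended action."""
    if operation not in ROLE_PERMISSIONS.get(role, set()):
        return False   # RBAC denies, whatever the memory check said
    return recommended_action in PROCEED_ACTIONS
```

Keeping the two checks independent means a clean memory state never grants a permission the caller's role lacks.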

What Sgraal complements

Sgraal is most useful when paired with the existing observability and audit primitives in your stack.

+ SIEM. Sgraal emits decision events (USE_MEMORY / WARN / ASK_USER / BLOCK) with structured metadata. SIEM correlates them with the rest of your security telemetry.
+ OpenTelemetry tracing. Sgraal preflight calls show up as spans in your distributed-trace view, with the recommended action and the dominant risk component as span attributes.
+ Application monitoring. Sgraal exposes preflight-decision metrics for tracking BLOCK rate, omega-score distribution, and detection-layer fire rates over time.
+ Audit logging. Sgraal provides a decision_trail with bit-identical replay (POST /v1/replay): the same input deterministically produces the same decision months later, given the same scoring configuration.
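
On the audit side, one way to verify a replay against a stored decision is to compare fingerprints of a canonical serialization. This is a client-side sketch using only the Python standard library; the dictionary shape is hypothetical and does not describe the actual /v1/replay response.

```python
import hashlib
import json

def fingerprint(decision: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(decision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def replay_matches(stored: dict, replayed: dict) -> bool:
    """True when the replayed decision is identical to the stored one."""
    return fingerprint(stored) == fingerprint(replayed)
```

Storing the fingerprint alongside each audit record lets a later replay be checked without retaining the full decision payload in the audit system.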

Compliance posture

This is the part of the document that customer-acquisition pressure will, unless explicitly resisted, push toward overclaiming. The honest statement:

What Sgraal supports

Sgraal supports four compliance profiles via dedicated API endpoints, plus a NIST AI RMF reference endpoint:

→ EU_AI_ACT — Article 9 (risk management), 12 (record-keeping), 13 (transparency), 14 (human oversight), 17 (quality management). GET /v1/compliance/eu-ai-act/declaration, GET /v1/compliance/eu-ai-act/report.
→ GDPR — data minimization, right-to-erasure, sub-processor disclosure, EU data processing. GET /v1/compliance/gdpr.
→ FDA_510K — medical-device software validation framing for substantial-equivalence claims.
→ HIPAA — PHI-integrity rule, tenant-isolated PHI handling.
→ NIST AI RMF — Govern / Map / Measure / Manage reference. GET /v1/compliance/nist-ai-rmf.

What Sgraal is NOT

Sgraal does NOT hold SOC 2, ISO 27001, HIPAA, PCI-DSS, or any other externally audited certification or attestation as of 2026-05. Compliance support means the API generates evidence-shaped responses against the named frameworks; it does not mean Sgraal itself holds those certifications.

If your organization requires that the vendor itself hold a SOC 2 Type II report (or similar), Sgraal does not currently meet that requirement. We are happy to discuss the path to certification for prospective customers whose procurement process requires it; reach out to hello@sgraal.com.

How Sgraal has been validated so far

A single honest summary of the internal validation work that produced the detection layers above. Internal validation is what we have; external certification and paying-customer outcome data are what we are working toward.

Internal adversarial validation

Sgraal has been subjected to extensive internal adversarial testing across a sequence of synthetic corpora (R1 through R14), multi-model consensus reviews on key design decisions, code-level review cycles, and regression-prevention CI on every drift-prone public surface. These cycles produced the detection layers listed above and the honest-framing disclosures listed below.

This work is internal validation, not external certification. It establishes what the scoring engine catches under controlled adversarial conditions; it does not substitute for production traffic, third-party security review, or paying-customer outcome data. Those remain pending — see the calibration disclosures below for the explicit set of what external validation still needs to confirm.

What Sgraal does NOT yet claim about its scoring

Honest framing of what our scoring pipeline does and does not assert today. Each disclosure below is a known limit of the pre-paying-customer state; calibration tightens as production data accumulates.

Benchmark calibration

Our R12 and R14 corpora are synthetic adversarial sets. Production calibration against real customer outcomes is pending paying-customer onboarding.

Score consistency vs external causation

We measure internal decision consistency (NIST AI RMF MEASURE-3.1 style). External causal validity of our risk scores requires real-world outcome data, which we do not yet have.

First-call accuracy

For single-entry calls, several of our analytical modules use placeholder baselines until sufficient history accumulates. Multi-entry realistic calls (3+ entries) exercise the full pipeline.

Per-tenant calibration maturity

Thresholds default to global values until each tenant accumulates at least 20 outcome episodes per domain. Calibration tightens automatically with usage.
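
The fallback rule can be written in a few lines. The 20-episode minimum comes from the text above; the function name and arguments are hypothetical, and how calibrated thresholds are actually computed is not shown.

```python
from typing import Optional

MIN_EPISODES = 20  # from the text: at least 20 outcome episodes per domain

def effective_threshold(global_default: float,
                        tenant_calibrated: Optional[float],
                        episodes_in_domain: int) -> float:
    """Use the tenant's calibrated threshold only once enough outcomes exist."""
    if tenant_calibrated is None or episodes_in_domain < MIN_EPISODES:
        return global_default
    return tenant_calibrated
```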

Cohort-relative scoring

Comparisons against industry peers (“your agent vs typical fintech”) require multiple production tenants in the same domain. We surface this as null until the cohort threshold is met.

Internationalization

Our explain endpoint accepts a language parameter but currently renders English only. Other languages are queued for a future release.

Versioning

Last reviewed: 2026-05-06.

This document is versioned in git at sgraal-ai/web-static. Previous revisions are available via git log.

Related: /security · /comply · /docs