Threat Model
What Sgraal catches, what it doesn't, what it complements.
An honest map of where Sgraal sits in your security posture. Showing what we don't claim is the most useful thing we can publish.
What Sgraal preflight catches
Each item below maps to a validator that ships in the production scoring engine. Source links are to the public /v1/preflight endpoint, which runs all of these in a single synchronous pass.
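As a rough illustration of the single synchronous pass described above, the sketch below builds (but does not send) a POST /v1/preflight request. Only the endpoint path, the Bearer-token scheme, and the memory_state field name come from this page. The base URL, the token value, and the per-entry fields are placeholders assumed for illustration, not the documented schema.

```python
import json
import urllib.request

# Hypothetical host: substitute the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_preflight_request(api_token, memory_state):
    """Build (but do not send) a POST /v1/preflight request."""
    body = json.dumps({"memory_state": memory_state}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + "/v1/preflight",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
    )


# The entry fields below are illustrative assumptions, not the real schema.
request = build_preflight_request(
    "YOUR_API_TOKEN",
    [{"id": "m1", "type": "tool_state", "content": "cache warmed",
      "timestamp": "2026-05-01T12:00:00Z"}],
)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, since the response shape is not described on this page.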
Stale memory entries
Weibull-decayed freshness scoring per memory type. identity entries decay slowest; tool_state entries decay fastest. Stale entries lower s_freshness and contribute toward the recommended action.
Validator: scoring_engine/omega_mem.py via POST /v1/preflight
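One common way to express Weibull decay is the survival function exp(-(t/λ)^k), where t is the entry's age, λ a scale parameter, and k a shape parameter. The sketch below assumes that form. The parameter values and the function name are hypothetical and were chosen only so that identity decays more slowly than tool_state, as stated above. The actual parameters in omega_mem.py are not given on this page.

```python
import math

# Hypothetical (scale_days, shape) pairs per memory type, for illustration only.
WEIBULL_PARAMS = {
    "identity": (365.0, 1.0),    # slow decay
    "tool_state": (1.0, 1.5),    # fast decay
}


def freshness(age_days, memory_type):
    """Weibull survival value in [0, 1]; 1.0 means fully fresh."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    scale, shape = WEIBULL_PARAMS[memory_type]
    return math.exp(-((age_days / scale) ** shape))
```

With these assumed parameters, a week-old identity entry still scores close to 1, while a week-old tool_state entry scores close to 0.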
Source-trust mismatches
Provenance scoring weighted by per-source trust. Low-trust sources cited for high-stakes claims raise s_provenance risk. Aggregated across all entries in memory_state.
Validator: scoring_engine/owa_provenance.py, scoring_engine/omega_mem.py via POST /v1/preflight
Timestamp integrity
Detects timestamp forgery: old decisions disguised as fresh, content-age mismatch, fleet-age collapse, anchor inconsistency. Cannot be overridden by any subsequent reasoning step.
Validator: api/detection.py:_check_timestamp_integrity via POST /v1/preflight
Identity drift
Detects gradual authority escalation across agent hops, subject rebinding, confirmation erosion, permission-lattice violations.
Validator: api/detection.py:_check_identity_drift via POST /v1/preflight
Consensus collapse
Detects self-reinforcing false consensus from a single root source. Signals include hedge-marker decay, confidence recycling, cross-role reinforcement, and diversity collapse.
Validator: api/detection.py:_check_consensus_collapse via POST /v1/preflight
Multi-hop provenance asymmetry
Detects circular references, chain-length mismatches, compromised agents in the memory provenance path.
Validator: api/detection.py:_check_provenance_chain via POST /v1/preflight
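To make the circular-reference case concrete, here is a minimal sketch of cycle detection over a provenance graph using a depth-first search. It is not the logic of _check_provenance_chain, which is not shown on this page. The input shape (a dict mapping each entry id to the ids it was derived from) and the function name are assumptions.

```python
def find_cycle(derived_from):
    """Return one cycle as a list of ids (first id repeated at the end), or None.

    derived_from: dict mapping an entry id to the list of ids it cites.
    Recursive, so intended for short chains only.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in derived_from.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: a cycle closes here
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in derived_from:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For example, an entry a derived from b, b from c, and c from a again would be reported as the cycle a, b, c, a.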
Sync bleed
Detects cross-agent synchronization inconsistencies by comparing timestamps across agents and checking sync-event coherence.
Validator: api/detection.py:_check_sync_bleed via POST /v1/preflight
Confidence-age coherence
Detects mis-calibrated confidence on aged or low-provenance memory entries (high confidence on stale facts, etc.).
Validator: api/detection.py:_check_confidence_calibration via POST /v1/preflight
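The "high confidence on stale facts" case can be pictured with a deliberately simple rule: flag an entry when its stated confidence exceeds its freshness score by more than some margin. The rule, the margin value, and the confidence and freshness field names are all assumptions made for this sketch. The real check in _check_confidence_calibration may work quite differently.

```python
def miscalibrated(entries, margin=0.3):
    """Return ids of entries whose confidence exceeds freshness by more than margin.

    Each entry is a dict with hypothetical keys: id, confidence, freshness
    (both scores assumed to lie in [0, 1]).
    """
    return [
        entry["id"]
        for entry in entries
        if entry["confidence"] - entry["freshness"] > margin
    ]
```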
Belief-injection drift (BWDT)
Belief-Weighted Drift Tolerance. Belief modulates drift sensitivity but never erases it; multi-axis guards prevent Belief Inflation Attacks. Hard escapes for provenance-complete and freshness-ok cases.
Validator: BWDT module integrated in _preflight_internal() via POST /v1/preflight
Sleeper detection
Background scan that surfaces dormant attack patterns embedded in memory state. Runs on a 60-minute cadence in addition to per-call preflight.
Validator: Sleeper-scan scheduler in api/main.py
What Sgraal does NOT replace
Sgraal is a memory-decision validation layer. It does not — and is not designed to — replace any of the following. If you are not running these elsewhere in your stack, Sgraal does not fill the gap.
- ×Authentication. Use OAuth, API keys, mTLS, etc. Sgraal authenticates the API caller via Bearer token but does not authenticate the end user behind your agent.
- ×Prompt injection at LLM call time. Use input sanitization upstream of your model. Sgraal evaluates memory state, not the live prompt arriving at the LLM.
- ×Application-level authorization. Use your own RBAC. Sgraal returns a recommended action; the application is responsible for enforcing whether the calling agent or user is permitted that action.
- ×Data encryption at rest. Use the standard primitives of your storage layer. Sgraal does not store memory_state; it processes it in real time and discards it.
- ×Network isolation. Use VPC, firewall rules, private endpoints. Sgraal is reached over HTTPS like any other API.
- ×OS-level audit logging. Use auditd, eBPF, or the equivalent. Sgraal logs decisions, not host syscalls.
- ×Secrets management. Use Vault, AWS Secrets Manager, etc. Sgraal does not store your application secrets.
- ×DDoS protection. Use Cloudflare, AWS Shield, etc. Sgraal has tenant-level rate limiting against quota abuse, not network-layer DDoS mitigation.
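For the application-level authorization item in the list above, the division of labor might look like the following sketch: the application runs its own permission check first and only then acts on the recommended action. The four action names are the ones used on this page. The returned step names, the function name, and the fail-closed default are assumptions.

```python
def enforce(recommended_action, caller_permitted):
    """Combine the app's own authorization result with the recommended action."""
    if not caller_permitted:
        # The application's RBAC decision wins regardless of the recommendation.
        return "deny"
    if recommended_action == "USE_MEMORY":
        return "proceed"
    if recommended_action == "WARN":
        return "proceed_and_log"
    if recommended_action == "ASK_USER":
        return "request_confirmation"
    # BLOCK, and any unrecognized value, fail closed.
    return "halt"
```

Treating unknown values like BLOCK is a design choice in this sketch, so that a new or misspelled action never silently lets memory through.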
What Sgraal complements
Sgraal is most useful when paired with the existing observability and audit primitives in your stack.
- +SIEM. Sgraal emits decision events (USE_MEMORY / WARN / ASK_USER / BLOCK) with structured metadata. SIEM correlates them with the rest of your security telemetry.
- +OpenTelemetry tracing. Sgraal preflight calls show up as spans in your distributed-trace view, with the recommended action and the dominant risk component as span attributes.
- +Application monitoring. Sgraal exposes preflight-decision metrics for tracking BLOCK rate, omega-score distribution, and detection-layer fire rates over time.
- +Audit logging. Sgraal provides a decision_trail with bit-identical replay (POST /v1/replay): the same input deterministically produces the same decision months later, given the same scoring configuration.
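On the audit side, one way to check that a replayed decision matches the stored one is to compare digests of a canonical JSON serialization. This is a sketch of such a client-side check, not a description of how POST /v1/replay works internally. The decision field names in the usage lines are hypothetical.

```python
import hashlib
import json


def decision_digest(decision):
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(
        decision, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay_matches(stored, replayed):
    """True when both decisions serialize to the same canonical bytes."""
    return decision_digest(stored) == decision_digest(replayed)


# Key order differs, content is identical: the digests still match.
stored = {"action": "BLOCK", "omega": 0.91}
replayed = {"omega": 0.91, "action": "BLOCK"}
```

Sorting keys before hashing keeps the comparison independent of the order in which a JSON object happens to be serialized.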
Compliance posture
This is the part of the document that customer-acquisition pressure will, unless explicitly resisted, push toward overclaiming. The honest statement:
What Sgraal supports
Sgraal supports four compliance profiles via dedicated API endpoints, plus a NIST AI RMF reference endpoint:
- → EU_AI_ACT: Articles 9 (risk management), 12 (record-keeping), 13 (transparency), 14 (human oversight), 17 (quality management). GET /v1/compliance/eu-ai-act/declaration, GET /v1/compliance/eu-ai-act/report.
- → GDPR: data minimization, right-to-erasure, sub-processor disclosure, EU data processing. GET /v1/compliance/gdpr.
- → FDA_510K: medical-device software validation framing for substantial-equivalence claims.
- → HIPAA: PHI-integrity rule, tenant-isolated PHI handling.
- → NIST AI RMF: Govern / Map / Measure / Manage reference. GET /v1/compliance/nist-ai-rmf.
What Sgraal is NOT
Sgraal does NOT hold SOC 2, ISO 27001, HIPAA, PCI-DSS, or any other externally audited certification or attestation as of 2026-05. Compliance support means the API generates evidence-shaped responses against the named frameworks; it does not mean Sgraal itself holds those certifications.
If your organization requires that the vendor itself hold a SOC 2 Type II report (or similar), Sgraal does not currently meet that requirement. We are happy to discuss the path to certification for prospective customers whose procurement process requires it; reach out to hello@sgraal.com.
How Sgraal has been validated so far
A single honest summary of the internal validation work that produced the detection layers above. Internal validation is what we have; external certification and paying-customer outcome data are what we are working toward.
Internal adversarial validation
Sgraal has been subjected to extensive internal adversarial testing across a sequence of synthetic corpora (R1 through R14), multi-model consensus reviews on key design decisions, code-level review cycles, and regression-prevention CI on every drift-prone public surface. These cycles produced the detection layers listed above and the honest-framing disclosures listed below.
This work is internal validation, not external certification. It establishes what the scoring engine catches under controlled adversarial conditions; it does not substitute for production traffic, third-party security review, or paying-customer outcome data. Those remain pending; see the calibration disclosures below for the explicit set of what external validation still needs to confirm.
What Sgraal does NOT yet claim about its scoring
Honest framing of what our scoring pipeline does and does not assert today. Each disclosure below is a known limit of the pre-paying-customer state; calibration tightens as production data accumulates.
Benchmark calibration
Our R12 and R14 corpora are synthetic adversarial sets. Production calibration against real customer outcomes is pending paying-customer onboarding.
Score consistency vs external causation
We measure internal decision consistency (NIST AI RMF MEASURE-3.1 style). External causal validity of our risk scores requires real-world outcome data, which we do not yet have.
First-call accuracy
For single-entry calls, several of our analytical modules use placeholder baselines until sufficient history accumulates. Multi-entry realistic calls (3+ entries) exercise the full pipeline.
Per-tenant calibration maturity
Thresholds default to global values until each tenant accumulates at least 20 outcome episodes per domain. Calibration tightens automatically with usage.
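The fallback rule described here can be sketched as a small selection function. The 20-episode minimum comes from the text above. The function and parameter names are hypothetical, and how a tenant-specific threshold is actually computed is not covered.

```python
MIN_EPISODES = 20  # minimum outcome episodes per domain, per the text above


def effective_threshold(global_threshold, tenant_threshold, episodes):
    """Use the tenant's threshold only once enough outcome episodes exist."""
    if tenant_threshold is None or episodes < MIN_EPISODES:
        return global_threshold
    return tenant_threshold
```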
Cohort-relative scoring
Comparisons against industry peers (“your agent vs typical fintech”) require multiple production tenants in the same domain. We surface this as null until the cohort threshold is met.
Internationalization
Our explain endpoint accepts a language parameter but currently renders English only. Other languages are queued for a future release.
Versioning
Last reviewed: 2026-05-06.
This document is versioned in git at sgraal-ai/web-static. Previous revisions are available via git log.