Engineering · April 2026

Memory Reliability Engineering — The Weibull Model

The same math that predicts equipment failure now predicts memory failure.

What is the Weibull Distribution?

The Weibull distribution is the workhorse of reliability engineering. Introduced by Waloddi Weibull in 1951, it models the probability of failure over time for everything from ball bearings to jet engines. Its key insight: different things fail at different rates, and the shape of that failure curve tells you something fundamental about the failure mode.

The distribution is defined by two parameters: a scale parameter (λ) that controls when failures become likely, and a shape parameter (k) that controls how sharply the failure rate increases. When k < 1, the failure rate decreases over time (infant mortality). When k = 1, the failure rate is constant and the Weibull reduces to the exponential distribution (purely random failures). When k > 1, the failure rate increases over time (wear-out) — this is the regime that matters for memory.

The Formula

Sgraal computes freshness using the Weibull survival function:

s_freshness = exp( -(age / λ)^k )

Where age is the memory's age in days, λ is the scale parameter (type-specific), and k is the shape parameter (type-specific).
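As a minimal sketch, the survival function is a one-liner. The parameter names `scale_lambda` and `shape_k` are placeholders for the type-specific λ and k values, not identifiers from Sgraal itself:

```python
import math

def weibull_freshness(age_days: float, scale_lambda: float, shape_k: float) -> float:
    """Weibull survival function: probability the memory is still fresh at age_days."""
    return math.exp(-((age_days / scale_lambda) ** shape_k))

# At age 0 every memory is perfectly fresh; freshness decays toward 0 with age.
print(weibull_freshness(0.0, 10.0, 1.5))   # 1.0
print(weibull_freshness(10.0, 10.0, 1.5))  # exp(-1) ≈ 0.368
```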

Type-specific decay rates (fastest to slowest):

Memory Type        Decay Rate              Intuition
tool_state         0.15 (fast)             API responses go stale in hours
shared_workflow    0.08                    Collaborative state drifts in days
episodic           0.05                    Event memories fade over weeks
preference         0.02                    User preferences shift over months
semantic           0.01                    Facts change slowly
policy             0.005                   Policies are revised infrequently
identity           0.002 (near-permanent)  Core identity rarely changes
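The article does not say how the listed decay rate maps onto the formula's parameters. One plausible reading — an assumption, not something the text confirms — is that the rate is the reciprocal of the scale parameter, λ = 1/rate, with some nominal shape k. Under that reading the table becomes:

```python
import math

# Decay rates from the table above. Interpreting rate as 1/λ, and the
# shape value k = 1.5, are assumptions for illustration only.
DECAY_RATES = {
    "tool_state": 0.15,
    "shared_workflow": 0.08,
    "episodic": 0.05,
    "preference": 0.02,
    "semantic": 0.01,
    "policy": 0.005,
    "identity": 0.002,
}

def freshness(memory_type: str, age_days: float, shape_k: float = 1.5) -> float:
    """Weibull survival with the hypothetical mapping λ = 1 / decay_rate."""
    scale = 1.0 / DECAY_RATES[memory_type]
    return math.exp(-((age_days / scale) ** shape_k))

# Fast-decaying types lose freshness sooner than near-permanent ones.
assert freshness("tool_state", 7) < freshness("identity", 7)
```

Whatever the exact mapping, the ordering is the point: at any fixed age, a higher decay rate yields a lower survival score.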

Why This Matters

Most AI systems treat all memory as equally fresh or equally stale. A tool output from 2 hours ago is given the same weight as an identity fact stored 6 months ago. This is wrong.

A 2-hour-old API response might already be dangerously stale (stock prices, order statuses, server health). A 6-month-old identity fact ("The user prefers dark mode") is probably fine. Weibull decay captures this difference mathematically: the survival function drops rapidly for tool_state but barely moves for identity.

This means Sgraal can make the right call without human-written rules about "how old is too old." The math handles it.

Connection to Predictive BLOCK

The freshness score feeds directly into the Ω_MEM computation. When s_freshness drops below a critical threshold for enough entries, the overall risk score crosses the BLOCK boundary — and the agent is stopped before it acts on stale data.
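The article does not define the Ω_MEM aggregation, but the gating logic it describes might look something like the following sketch. The threshold values and the counting rule are assumptions, not Sgraal's actual implementation:

```python
# Hypothetical gating sketch: BLOCK when enough entries fall below a
# freshness floor. Both constants and the counting rule are assumptions;
# the real Ω_MEM computation is not specified in the article.
FRESHNESS_FLOOR = 0.5   # assumed critical threshold per entry
BLOCK_FRACTION = 0.3    # assumed fraction of stale entries that trips BLOCK

def should_block(freshness_scores: list[float]) -> bool:
    """Return True when the stale fraction crosses the BLOCK boundary."""
    if not freshness_scores:
        return False
    stale = sum(1 for s in freshness_scores if s < FRESHNESS_FLOOR)
    return stale / len(freshness_scores) >= BLOCK_FRACTION

assert should_block([0.9, 0.4, 0.3, 0.2]) is True   # 3 of 4 entries stale
assert should_block([0.9, 0.8, 0.7, 0.6]) is False  # all entries fresh
```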

This is predictive, not reactive. The Weibull model tells us when memory will fail, not just when it has failed. Combined with Sgraal's Kalman forecasting and CUSUM trend detection, the system can issue warnings hours before a BLOCK becomes necessary.
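CUSUM itself is a standard technique. A minimal one-sided version for detecting a sustained downward drift in a freshness series might look like this — the target, slack, and threshold values are illustrative, not Sgraal's:

```python
def cusum_downward(series, target=0.8, slack=0.02, threshold=0.15):
    """One-sided CUSUM: accumulate shortfall below (target - slack).

    Returns the index at which drift is flagged, or None if the series
    stays healthy. All parameter values are illustrative assumptions.
    """
    s = 0.0
    for i, x in enumerate(series):
        s = max(0.0, s + (target - slack - x))
        if s > threshold:
            return i
    return None

# A steady series stays quiet; a sagging one trips the detector.
assert cusum_downward([0.80, 0.81, 0.79, 0.80]) is None
assert cusum_downward([0.80, 0.70, 0.65, 0.60]) is not None
```

Because the statistic accumulates small shortfalls, CUSUM catches slow drifts that a single-sample threshold would miss — which is what makes early warnings possible.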

For Reliability Engineers

If you come from a reliability engineering background, you will recognize this pattern immediately. Sgraal treats AI agent memory the same way you treat physical assets:

  1. Failure modes. Each memory type has a characteristic failure mode (staleness, drift, conflict) with a known time profile.
  2. Survival analysis. The Weibull survival function gives the probability that a memory entry is still reliable at time t.
  3. Preventive maintenance. The repair_plan with REFETCH actions is exactly preventive maintenance — replacing components before they fail.
  4. MTTR. Sgraal computes Mean Time To Repair via Weibull estimation, just like equipment reliability programs.
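The Weibull mean has a closed form, λ · Γ(1 + 1/k), so once parameters are fitted, an MTTR-style estimate is a one-liner. The parameter values below are illustrative:

```python
import math

def weibull_mean(scale_lambda: float, shape_k: float) -> float:
    """Mean of a Weibull(λ, k) distribution: λ · Γ(1 + 1/k)."""
    return scale_lambda * math.gamma(1.0 + 1.0 / shape_k)

# For k = 1 the Weibull reduces to the exponential, so the mean is just λ.
print(weibull_mean(10.0, 1.0))  # 10.0, since Γ(2) = 1
```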

The math is identical. The domain is new.