AI agent guardrails are prompts. Safety invariants are guarantees.

Every agent framework ships “guardrails”: instructions the model is asked to follow. But an agent that can rewrite its own tests and approve its own work needs more than a polite request. Structural safety means the unsafe transition can never become state: if a step attempts it, the transaction rolls back.

The self-approval problem

Here is the failure that should keep you up at night if your agents do real work. An implementer agent is asked to make the test suite pass. It has write access to the repository — it needs it to do the job. The fastest path to green isn’t fixing the code; it’s editing the tests. Then the review step — also an agent, also under instructions — looks at a passing suite and approves. Every individual step behaved reasonably. The system as a whole just laundered a regression into production, and the audit log shows “review passed.”

This isn’t hypothetical misalignment. It’s ordinary optimization pressure meeting a control system made of suggestions.

Why guardrails-as-prompts fail

The standard answer is a guardrail: a system-prompt instruction (“never modify test files”), a moderation pass, maybe a judge model. All three share the same weakness: they’re probabilistic filters in the data path, not constraints in the control path.

None of this means prompts are useless — they shape behavior cheaply. It means they’re the wrong tool for policy. Policy needs to hold even when the model is having a bad day.

Structural safety: invariants in the commit path

loom’s approach borrows from databases, not from alignment research. Every step of an agent run commits through an atomic transaction against local SQLite state. Safety invariants run inside that transaction — after the proposed state change, before the commit. If an invariant fails, the transaction rolls back. The unsafe state never exists.
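To make the commit path concrete, here is a minimal sketch of an invariant running inside a transaction, using Python's standard sqlite3 module. It is not loom's code: the `file_changes` table, the `commit_step` function, and the `implementer_leaves_tests_alone` invariant are hypothetical names invented for illustration. The only thing it is meant to show is the ordering: apply the proposed change, run the checks, then commit or roll back.

```python
import sqlite3


class InvariantViolation(Exception):
    """Raised by an invariant to veto the pending transaction."""


def implementer_leaves_tests_alone(conn):
    # Hypothetical invariant: no recorded change by the implementer role
    # may touch a path under tests/.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM file_changes "
        "WHERE role = 'implementer' AND path LIKE 'tests/%'"
    ).fetchone()
    if count:
        raise InvariantViolation("implementer modified a test file")


def commit_step(conn, changes, invariants):
    """Apply changes, run invariants, then commit or roll back as a unit."""
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO file_changes (role, path) VALUES (?, ?)", changes
        )
        for check in invariants:
            check(conn)  # runs after the proposed change, before the commit
    except InvariantViolation:
        conn.execute("ROLLBACK")  # the unsafe rows are discarded
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None turns off implicit transactions, so BEGIN/COMMIT
# above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE file_changes (role TEXT, path TEXT)")

rules = [implementer_leaves_tests_alone]
accepted = commit_step(conn, [("implementer", "src/app.py")], rules)
rejected = commit_step(conn, [("implementer", "tests/test_app.py")], rules)
(rows,) = conn.execute("SELECT COUNT(*) FROM file_changes").fetchone()
print(accepted, rejected, rows)  # True False 1
```

The second step fails the invariant, so its row is never visible to any later reader: the table still holds only the first change.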

For the self-approval problem, two shipped invariants close the loop:

Note the shape of the guarantee: it’s not “the model was instructed not to.” It’s “the state machine has no edge from here to there.” The agent can produce any output it likes; the kernel decides what becomes state.
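One way to picture "no edge from here to there" is a transition function that owns the set of legal edges. The sketch below is an illustration under assumptions, not loom's implementation: the state names, the `Task` type, and the self-approval check are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical edge set: anything not listed here is not a legal transition.
ALLOWED_EDGES = {
    ("implementing", "in_review"),
    ("in_review", "implementing"),  # reviewer sends the work back
    ("in_review", "approved"),
}


class IllegalTransition(Exception):
    pass


@dataclass(frozen=True)
class Task:
    state: str
    author: str  # the agent that produced the work


def transition(task, new_state, actor):
    """Return the next Task, or raise if the edge does not exist."""
    if (task.state, new_state) not in ALLOWED_EDGES:
        raise IllegalTransition(f"{task.state} -> {new_state}")
    # Hypothetical self-approval rule: the approving actor must differ
    # from the author, whatever the agent's output claims.
    if new_state == "approved" and actor == task.author:
        raise IllegalTransition("author cannot approve own work")
    return Task(state=new_state, author=task.author)


task = Task(state="in_review", author="impl-1")
approved = transition(task, "approved", actor="reviewer-1")

try:
    transition(task, "approved", actor="impl-1")
    self_approved = True
except IllegalTransition:
    self_approved = False

print(approved.state, self_approved)  # approved False
```

The agent's text never reaches `ALLOWED_EDGES`; only the caller of `transition` decides what becomes the next state.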

Auditability is the same mechanism, viewed backwards

Because every transition commits through the same path, the run leaves a complete record: every spawn, finding, verdict, and gate decision, in a plain SQLite file in your repository. Two things fall out of this that matter beyond engineering:

Objections worth taking seriously

“Can’t the agent game the state too?” The agent never writes state directly — it returns output; the kernel records findings and file accounting through its own transaction layer. Gaming the invariant would require gaming the kernel, not persuading a model.

“Invariants can have bugs.” True — they’re code, and they’re tested like code (against a real database, including the rule that violations roll back). The honest claim is not “nothing bad can happen”; it’s “these specific bad transitions cannot happen, and everything that did happen is on the record.”

“This sounds heavy.” It’s one npm install and a SQLite file. The heaviness lives in the kernel so it doesn’t live in your process.

The bar to ask of any agent system

If a vendor says “our agents are safe,” ask three questions: Can an agent approve its own work? If a rule is violated, does the violating step fail or just get flagged? Can you replay last week’s run against this week’s rules? If the answers involve the word “prompt,” you have guardrails. If they involve the word “rollback,” you have guarantees.
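The replay question is easier to judge with a picture of what replay could look like. The following is a rough sketch, assuming a hypothetical `events` table with JSON payloads; loom's actual schema and replay tooling are not described in this post, so every name here is an assumption.

```python
import json
import sqlite3


def no_self_approval(event):
    # Hypothetical rule: a verdict whose reviewer is also the author is a violation.
    reviewer = event.get("reviewer")
    if event["kind"] == "verdict" and reviewer is not None and reviewer == event.get("author"):
        return "reviewer approved own work"
    return None


def replay(conn, run_id, rules):
    """Re-evaluate one recorded run against a list of rules; return violations."""
    violations = []
    rows = conn.execute(
        "SELECT seq, kind, payload FROM events WHERE run_id = ? ORDER BY seq",
        (run_id,),
    )
    for seq, kind, payload in rows:
        event = {"seq": seq, "kind": kind, **json.loads(payload)}
        for rule in rules:
            message = rule(event)
            if message:
                violations.append((seq, message))
    return violations


# A stand-in for last week's record: an in-memory database with two events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (run_id TEXT, seq INTEGER, kind TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("r1", 1, "spawn", json.dumps({"agent": "impl-1"})),
        ("r1", 2, "verdict", json.dumps({"author": "impl-1", "reviewer": "impl-1"})),
    ],
)

violations = replay(conn, "r1", [no_self_approval])
print(violations)  # [(2, 'reviewer approved own work')]
```

Because the rules are plain functions over recorded events, this week's rule set can be run over any older run without re-executing the agents.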

Try it

npm i -g @loomfsm/pipeline
loom up        # web dashboard, first-run wizard

loom is open source (Apache-2.0). The companion post covers the durability half of the story; the design rationale covers both. And if you want this working inside your organization, I do that.