Why loom

LLM agents are powerful and unreliable. The industry’s answer has been better prompts, bigger graphs — and lately, durable execution everywhere. loom’s answer goes one layer further: make the run a transaction, and make its safety structural.

The problem

Multi-step agent work fails in ways single prompts don’t: a crash halfway leaves half-applied steps; a retry double-spends; an agent quietly skips the review you asked for; and when something goes wrong, there’s no record of which decision produced the damage. For throwaway tasks that’s fine. For work where being wrong is expensive — touching production code, migrations, anything review-gated — it isn’t.

loom’s answer: three mechanisms

1. Replay-determinism

One timestamp token is captured per state-machine tick and threaded through every kernel call, persisted, and replayed verbatim. Combined with atomic SQLite transactions, the same (state, timestamp, ledger) produces the same trajectory. You can replay a recorded run against a changed invariant and ask “would this rule have caught it?”
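As a rough illustration of the idea, not loom's actual kernel, the sketch below captures one timestamp per tick, records it, and feeds the recorded values back in on replay. The names `State`, `tick`, `run_live`, and `replay` are hypothetical, and the in-memory list stands in for the persisted token log.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    step: int
    expired: bool


def tick(state, now, deadline):
    # Pure transition: the only source of time is the `now` token passed in.
    return State(step=state.step + 1, expired=now >= deadline)


def run_live(initial, n_ticks, deadline, clock=time.time):
    """Run live, capturing exactly one timestamp per tick and recording it."""
    state, recorded = initial, []
    for _ in range(n_ticks):
        now = clock()            # captured once per tick
        recorded.append(now)     # stands in for persisting the token
        state = tick(state, now, deadline)
    return state, recorded


def replay(initial, recorded, deadline, invariant=None):
    """Replay recorded timestamps verbatim; optionally test a (new) invariant.

    Returns (state, index): index is the tick where the invariant first
    failed, or None if it never failed.
    """
    state = initial
    for index, now in enumerate(recorded):
        state = tick(state, now, deadline)
        if invariant is not None and not invariant(state):
            return state, index
    return state, None
```

Because `tick` never reads the wall clock itself, replaying the recorded list reproduces the same states, and passing a different `invariant` to `replay` answers the "would this rule have caught it?" question for a past run.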

2. Commit-time invariants

Safety rules run inside the database transaction and roll it back on violation. They’re not prompt suggestions — they’re structural. The code bundle ships rules like “acceptance can’t pass while a blocking finding is open” and “if an agent touched the tests, the final gate must be human-approved”.
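A minimal sketch of what "inside the transaction" can look like with Python's standard `sqlite3` module. The schema (`run`, `finding`), the helper `commit_with_invariants`, and the rule function are assumptions made for illustration; they are not loom's schema or API.

```python
import sqlite3


class InvariantViolation(Exception):
    pass


def open_db():
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE run (id INTEGER PRIMARY KEY, acceptance TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE finding ("
        "id INTEGER PRIMARY KEY, blocking INTEGER NOT NULL, is_open INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO run (id, acceptance) VALUES (1, 'pending')")
    return conn


def no_pass_with_open_blocker(conn):
    # Hypothetical rule: acceptance can't pass while a blocking finding is open.
    (passed,) = conn.execute(
        "SELECT COUNT(*) FROM run WHERE acceptance = 'passed'"
    ).fetchone()
    (blockers,) = conn.execute(
        "SELECT COUNT(*) FROM finding WHERE blocking = 1 AND is_open = 1"
    ).fetchone()
    if passed and blockers:
        raise InvariantViolation("acceptance passed while a blocking finding is open")


def commit_with_invariants(conn, apply_change, invariants):
    """Apply a change and check every invariant before COMMIT; roll back on failure."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        apply_change(conn)
        for check in invariants:
            check(conn)          # sees the uncommitted change
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the violating state never becomes visible
        raise
```

The point of the design is that the check runs against the would-be state and the database refuses to commit it, so a rule cannot be skipped by whatever produced the change.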

3. The idempotency ledger

Every effect is recorded in a ledger row committed in the same transaction as the state change it dedupes. Crash recovery is therefore trivial and exact: restart, and the ledger silently absorbs every step that already happened. No double work, no double spend.
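The same-transaction property can be sketched as follows, again with the standard `sqlite3` module. The `ledger` and `account` tables, the key format, and `apply_once` are hypothetical names chosen for the example, and the sketch only covers state changes that live in the database itself.

```python
import sqlite3


def open_db():
    # isolation_level=None: transactions are managed explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE ledger (effect_key TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    return conn


def apply_once(conn, effect_key, change):
    """Apply `change` at most once per effect_key.

    The ledger row and the state change commit together, so a crash leaves
    either both or neither. Returns True if applied, False if absorbed.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        seen = conn.execute(
            "SELECT 1 FROM ledger WHERE effect_key = ?", (effect_key,)
        ).fetchone()
        if seen is not None:
            conn.execute("ROLLBACK")   # already done: nothing to change
            return False
        change(conn)
        conn.execute("INSERT INTO ledger (effect_key) VALUES (?)", (effect_key,))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

After a restart, re-running every step with its original key is safe: steps whose ledger row already exists return `False` and leave the state untouched.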

Compared to the alternatives

| | Agent frameworks (LangGraph, CrewAI…) | Workflow engines (Temporal, Inngest…) | loom |
| --- | --- | --- | --- |
| Built for | authoring agent graphs | durable service workflows | review-gated agent work |
| Replay-deterministic runs | no | yes (workflow code) | yes (whole run, incl. agent steps) |
| Safety enforced at commit time | prompt-level | n/a | invariants inside the transaction |
| Human gates as a primitive | callbacks you wire up | signals you wire up | first-class, policy-driven dial |
| Infrastructure | your process | a cluster / a cloud service | one SQLite file in your repo |
| Vendor coupling | varies | none | zero-dependency kernel, no vendor names |

The honest comparison: if you’re building a custom agent product, a framework gives you more authoring surface. If you’re orchestrating microservices, Temporal is the right tool. loom sits in between — “Temporal for LLM agents”, local-first, with human-in-the-loop and provable process as the primitives.

A platform: pluggable on three orthogonal axes

loom was designed as a platform from day one, not as a code tool that grew plugins. The kernel is generic: it knows nothing about code review or any other domain, has zero runtime dependencies, and contains no vendor, model, or transport names (enforced by CI greps). Three axes plug into it:

What loom doesn’t claim

Early-stage, in the open. loom is v0.3: used daily by its author, stable at the core, still moving at the edges. Reading the whitepaper is the fastest way to decide if the design philosophy fits how you work.