BUILD LOG

Epic 57 — asking a second model to argue against the first

2026-06-10

The policy gates are good at the things you can write down. A whitelist knows which protocols are allowed; a supply cap knows the most that may be deposited; the kill-switch and the gas ceiling are blunt and reliable. What none of them can judge is whether a move makes sense — whether the thesis behind a deposit is plausible, whether the choice fits the current market regime, whether the agent has talked itself into something that is technically permitted and still a bad idea. Epic 57 adds a stage aimed squarely at that gap: a critic. After the agent proposes an action, a second model call receives that proposal and is asked to argue against it.

A second model arguing against the first catches a different class of error than the deterministic gates do. The gates check shape and limits; the critic checks judgment. Its output carries a severity, and the distinction that matters is ERROR versus WARNING. An ERROR is surfaced at the approval gate, where a human must acknowledge it explicitly before approving: it interrupts. A WARNING is recorded alongside the decision but does not demand a separate acknowledgement. The split keeps the loud signal loud without burying it under every minor caveat the critic might raise.
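A minimal sketch of how that severity split might be routed. Every name here (Severity, Finding, ApprovalPrompt, route_findings, can_approve) is hypothetical and chosen for illustration; it is not the project's actual code. The only behaviour taken from the text is that an ERROR needs explicit acknowledgement and a WARNING is merely recorded.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # The two tiers named in the log; any other tiers are out of scope here.
    ERROR = "ERROR"
    WARNING = "WARNING"


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape for one critic objection.
    severity: Severity
    message: str


@dataclass
class ApprovalPrompt:
    # ERRORs the human must acknowledge one by one before approving.
    must_acknowledge: list = field(default_factory=list)
    # WARNINGs stored with the decision, no separate acknowledgement.
    recorded: list = field(default_factory=list)


def route_findings(findings):
    """Split critic findings into the interrupting and non-interrupting tiers."""
    prompt = ApprovalPrompt()
    for finding in findings:
        if finding.severity is Severity.ERROR:
            prompt.must_acknowledge.append(finding)
        else:
            prompt.recorded.append(finding)
    return prompt


def can_approve(prompt, acknowledged):
    """True once every ERROR message has been acknowledged by the human.

    This only unlocks the approve button; the decision itself stays with
    the human, and nothing here can deny a move.
    """
    return all(f.message in acknowledged for f in prompt.must_acknowledge)
```

Note that nothing in the sketch returns a denial: the critic's findings only change what the human has to read and acknowledge before deciding.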

The most important thing to be clear about is what the critic does not do: it does not veto. It flags. A human still approves or denies every move, and the critic's argument is one more input to that judgment, not a gate that stops the transaction on its own. We chose advise-over-veto deliberately — a model confident enough to block moves autonomously is a model you have to trust more than we are willing to trust one — but it does mean the critic is only as useful as the human reading it. It will miss things. It is not a safety guarantee, and we do not present it as one.

The build had its own small lesson. The critic returns structured JSON, and the Haiku model we used for the stage liked to wrap that JSON in a fenced code block, which broke a parser that expected raw JSON. The fix was unglamorous (tolerate the fence, strip it before parsing), but it is the kind of integration detail that decides whether a stage like this actually runs in production or quietly fails closed on a formatting quirk. The critic stage now ships enabled across the profiles.
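The fence-tolerant parse can be sketched roughly as follows. This is an illustration under assumptions, not the shipped fix: the helper names strip_code_fence and parse_critic_output are hypothetical, and the sketch assumes the payload is a single JSON object or array, optionally wrapped in one triple-backtick fence with an optional language tag.

```python
import json

FENCE = "`" * 3  # the triple-backtick delimiter the model tends to add


def strip_code_fence(raw: str) -> str:
    """Return the payload without a surrounding code fence, if one is present."""
    text = raw.strip()
    wrapped = (
        len(text) >= 2 * len(FENCE)
        and text.startswith(FENCE)
        and text.endswith(FENCE)
    )
    if not wrapped:
        return text  # already raw output, leave it alone

    body = text[len(FENCE):-len(FENCE)]
    # Drop an optional language tag (for example "json") on the opening line,
    # but keep the line if the JSON itself starts there.
    first_line, newline, rest = body.partition("\n")
    if newline and not first_line.strip().startswith(("{", "[")):
        body = rest
    return body.strip()


def parse_critic_output(raw: str):
    """Parse critic JSON whether or not it arrived fenced."""
    return json.loads(strip_code_fence(raw))
```

Unfenced output passes through unchanged, so the same parser handles both shapes; anything that still is not valid JSON raises json.JSONDecodeError and can be handled by the caller.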

Receipts & reasoning