How to Build a Self-Verification Loop in Claude Code (3 Layers, 20 Minutes)
ShipWithAI · Posted on Apr 28 · Originally published at shipwithai.io

#claude #ai #productivity #programming

Harness Engineering (5-part series):

1. Harness Engineering: Why the System Around AI Matters More Than the AI Itself
2. Beyond CLAUDE.md: 5 Layers Your AI Agent Harness Is Missing
3. Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log.
4. The Constraint Paradox: Why Less AI Freedom Produces Better Code
5. How to Build a Self-Verification Loop in Claude Code (3 Layers, 20 Minutes)

Claude Code's Stop hook blocks the agent from finishing until verification passes. Combine it with PostToolUse feedback injection to build a three-layer verification loop (syntax, intent, regression) in 20 minutes. The result: the agent can't say "done" until it actually is.

Two hook setups. Same Claude Code session. Different outcomes:

```shell
# What most devs have: a formatting hook
# PostToolUse: runs prettier after file edits

# What this post builds: a verification loop
# PostToolUse: checks syntax on every file change
# Stop: blocks completion until tests pass + intent verified
# Result: agent can't say "done" until it actually is
```

The first catches formatting. The second catches logic errors, missed requirements, and broken tests before the agent claims it's finished.

LangChain's PreCompletionChecklistMiddleware is the most documented example of this pattern. It contributed to a 13.7-point benchmark gain using harness changes alone. This post builds the Claude Code equivalent using hooks.

## What does "verification" actually mean for an AI coding agent?
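Wiring both hooks together comes down to a settings entry. A minimal sketch of a `.claude/settings.json` fragment is below; the matcher pattern and the two script paths are assumptions for illustration, so check the Claude Code hooks documentation for the exact schema in your version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/check-syntax.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/verify.sh" }
        ]
      }
    ]
  }
}
```

The PostToolUse entry fires on every file edit (Layer 1); the Stop entry gates completion (Layers 2 and 3).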
Verification means checking that the agent's output matches the task's intent, not just that the code compiles. Only 3% of developers report high trust in AI-generated code (Qodo, State of AI Code Quality, 2025). Most developers stop at syntax checks (lint, format, type-check). Production verification needs two more layers.

Three verification layers, each catching a different class of failure:

| Layer | Checks | Catches | Misses | Hook |
| --- | --- | --- | --- | --- |
| 1. Syntax | Code compiles, formats | Typos, type errors | Logic bugs | PostToolUse command |
| 2. Intent | Output matches request | Wrong approach, missing features | Regressions | Stop prompt/agent |
| 3. Regression | Existing tests pass | Broken functionality, side effects | Untested requirements | Stop command |

"Run the tests" only covers Layer 3. Tests verify what you wrote tests for, not what you asked the agent to do. If you asked Claude to add pagination and it added sorting instead, every test still passes. Layer 2 catches that.

Spotify's Honk system demonstrates this at scale: 1,500+ PRs merged through verification loops, handling roughly 50% of all PRs automatically (Spotify Engineering, Dec 2025). Their key design choice: the agent doesn't know how verification works. It just gets pass/fail feedback. That separation keeps the agent focused on the task, not on gaming the verifier.

## How does Claude Code's Stop hook work?

The Stop hook fires every time Claude finishes responding. Exit code 2 blocks Claude from stopping and forces it to continue working. This single mechanism prevents the…
This excerpt is published under fair use for community discussion. Read the full article at DEV Community.