CTRL — Close The Running Loop

A practical guide to Peter Steinberger's "Close the Loop" methodology for autonomous AI agent coding, with a small bug-injection experiment that tests whether it works.

What is "Close the Loop"?

The idea is simple: give your coding agent the ability to verify its own work.

Peter Steinberger (creator of OpenClaw/Clawdbot, founder of PSPDFKit) ships code he doesn't read. He runs 3-8 AI coding agents in parallel and merged 600 commits in a single day. The secret? Every agent writes tests, runs them, fixes failures, and only reports back when everything passes. No human in the loop for verification.
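The run, fix, re-run cycle described above can be sketched as a small control loop. This is a minimal illustration, not OpenClaw's implementation: `runChecks`, `attemptFix`, and the attempt cap are hypothetical names and values introduced here, and the agent is simulated so the loop can run on its own.

```typescript
// Outcome of one verification run (for example build + lint + test).
type CheckResult = { passed: boolean; failures: string[] };

// Hypothetical hooks: `runChecks` would shell out to the project's test
// command, `attemptFix` would hand the failures back to the coding agent.
type LoopHooks = {
  runChecks: () => CheckResult;
  attemptFix: (failures: string[]) => void;
};

// Run checks, fix, and re-run until everything passes or the budget runs out.
// The cap of 5 attempts is an assumption, not something the source specifies.
function closeTheLoop(
  hooks: LoopHooks,
  maxAttempts = 5,
): { passed: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = hooks.runChecks();
    if (result.passed) {
      return { passed: true, attempts: attempt };
    }
    // Only fix when another verification run will follow.
    if (attempt < maxAttempts) {
      hooks.attemptFix(result.failures);
    }
  }
  return { passed: false, attempts: maxAttempts };
}

// Simulated agent: two failing test files, each fix resolves one of them.
let remainingFailures = ["classify.test.ts", "security.test.ts"];
const simulatedOutcome = closeTheLoop({
  runChecks: () => ({
    passed: remainingFailures.length === 0,
    failures: remainingFailures.slice(),
  }),
  attemptFix: () => {
    remainingFailures = remainingFailures.slice(1);
  },
});
// simulatedOutcome is { passed: true, attempts: 3 }
```

The attempt cap is a design choice: without it, an agent that cannot fix a failure would loop forever instead of reporting back.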

"Code works well with AI because it's verifiable. You can compile it, run it, test it. That's the loop. You have to close the loop." — Peter Steinberger

The Method

4 Files That Make It Work

| File | Purpose | Size in OpenClaw |
| --- | --- | --- |
| AGENTS.md | Build, test, and dev commands | 21KB |
| TESTING.md | What to test, what NOT to test, how to test | 19KB |
| copilot-instructions.md | Anti-redundancy rules, colocated test pattern | 2KB |
| package.json | Scripts: `test`, `test:coverage`, `test:e2e` | - |

4-Layer Testing Pyramid

Start at the bottom. Each layer adds realism — and cost.

         ┌──────────┐
         │  Docker   │  Full app in container, cold-start
         │  Tests    │  Slowest, most realistic
         ├──────────┤
         │   Live    │  Real API calls (Stripe, Anthropic, etc.)
         │  Tests    │  Costs money, flaky (network/rate limits)
         ├──────────┤
         │   E2E     │  API routes, webhooks, request/response
         │  Tests    │  Minutes, catches integration bugs
         ├──────────┤
         │   Unit    │  Pure logic, validation, security
         │  Tests    │  Seconds, catches 80% of bugs
         └──────────┘

| Layer | What it tests | Speed | Cost | When to use |
| --- | --- | --- | --- | --- |
| Unit | Pure functions, logic, validation, security | <1s | Free | Every commit |
| E2E | API routes, webhooks, auth flows, DB queries | 1-5 min | Free | Before push |
| Live | Real third-party APIs (Stripe, OpenAI, etc.) | 5-30 min | $ (API calls) | Before release |
| Docker | Full app cold-start in clean container | 5-10 min | Free | CI/CD pipeline |

Where to start: Unit tests. As a rule of thumb, they give you about 80% of the value at <1% of the cost. Peter's OpenClaw has 1,376 test files, and unit tests are the foundation.

When to add E2E: When you have API routes that handle auth, data mutation, or external integrations. Tests like "does POST /api/tasks return 201?" catch a different class of bugs than unit tests.

When to add Live: Only when you integrate with third-party APIs and need to verify they haven't changed their format/behavior. These are expensive and flaky — run them manually or before releases, not on every commit.

When to add Docker: When you deploy to production and want to verify the entire app starts cleanly from scratch. Good for CI/CD pipelines.
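The "when to use" guidance above can be written down as a lookup from pipeline stage to test layers. This is a sketch under assumptions: the stage names are invented here, and the idea that later stages also re-run the cheaper layers is our reading, not something the table states.

```typescript
type TestLayer = "unit" | "e2e" | "live" | "docker";
type PipelineStage = "commit" | "push" | "release" | "ci";

// Derived from the "When to use" column. Re-running cheaper layers at later
// stages is an assumption; live tests stay out of CI because they cost money
// and are flaky.
const layersByStage: Record<PipelineStage, TestLayer[]> = {
  commit: ["unit"],
  push: ["unit", "e2e"],
  release: ["unit", "e2e", "live"],
  ci: ["unit", "e2e", "docker"],
};

// True when the given layer should run at the given stage.
function shouldRunLayer(layer: TestLayer, stage: PipelineStage): boolean {
  return layersByStage[stage].indexOf(layer) !== -1;
}

const liveOnCommit = shouldRunLayer("live", "commit");   // false
const liveOnRelease = shouldRunLayer("live", "release"); // true
```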

The Rules

1. Colocated tests: every source file gets a .test.ts right next to it.
2. Close the loop: after writing code, run the tests. If they fail, fix the code. Don't ask the human.
3. Full gate before push: build + lint + test must all pass.
4. Anti-redundancy: search for existing helpers before creating new ones.
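The colocation rule is easy to check mechanically. The sketch below assumes plain `.ts` sources (no `.tsx`) and a flat list of repository paths; `colocatedTestPath` and `sourcesMissingTests` are hypothetical helper names, not part of OpenClaw.

```typescript
// Map a source path to its colocated test path:
// src/classify.ts -> src/classify.test.ts
function colocatedTestPath(sourcePath: string): string {
  return sourcePath.replace(/\.ts$/, ".test.ts");
}

// A plain source file is a .ts file that is neither a test nor a declaration.
function isPlainSource(filePath: string): boolean {
  return /\.ts$/.test(filePath) && !/\.(test|d)\.ts$/.test(filePath);
}

// Return every source file that has no test file sitting next to it.
function sourcesMissingTests(repoFiles: string[]): string[] {
  return repoFiles
    .filter(isPlainSource)
    .filter((sourcePath) => repoFiles.indexOf(colocatedTestPath(sourcePath)) === -1);
}

const exampleFiles = ["src/classify.ts", "src/classify.test.ts", "src/security.ts"];
const missingTests = sourcesMissingTests(exampleFiles);
// missingTests is ["src/security.ts"]
```

A check like this could run as part of the gate so that an agent cannot add a source file without its test.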

Evidence From OpenClaw's Codebase

We analyzed the OpenClaw GitHub repo:

1,376 test files across the codebase
563 source files with colocated .test.ts files
Largest test files are 45-65KB — real tests, not stubs
Recent commits consistently pair source changes with test changes

The Prompt

Here's a ready-to-use prompt for your coding agent (adapt the stack):

You are working on my project, which uses [YOUR STACK].
We currently have zero tests.

I want to implement Peter Steinberger's "Close the Loop" methodology, where the agent
(you) can test and verify your own work autonomously via CLI commands, without needing
a human to check every change.

TESTING STRUCTURE
- Write a test file for every file you create or modify
- Colocate tests next to the source file they test
- Name them: source.test.ts next to source.ts
- Start with unit tests only

CLOSE THE LOOP
- After writing any code, run the tests via CLI before considering the task done
- If tests fail, fix the code and run again — do not ask me to check
- Only report back when tests are passing

COMMANDS TO RUN
- After writing code: run tests, linter, type checker
- Before any PR: run full gate (build + lint + test)
- Never push failing code

ANTI-REDUNDANCY
- Before creating any helper or utility, search for existing ones first
- If a function already exists, import it — do not duplicate it
- Extract shared test fixtures into test-helpers files when used in multiple tests
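The "full gate" from the prompt (build + lint + test) amounts to running the steps in order and stopping at the first failure. The sketch below is illustrative: the `npm run build` and `npm run lint` script names are assumptions about your project, and the command runner is injected so the logic can be exercised without a real project.

```typescript
type GateStep = { label: string; command: string };

const fullGate: GateStep[] = [
  { label: "build", command: "npm run build" }, // assumed script name
  { label: "lint", command: "npm run lint" },   // assumed script name
  { label: "test", command: "npm test" },
];

// `run` executes one command and returns its exit code (0 means success).
// The gate stops at the first failing step and reports which one it was.
function runGate(
  steps: GateStep[],
  run: (command: string) => number,
): { ok: boolean; failedAt?: string } {
  for (const step of steps) {
    if (run(step.command) !== 0) {
      return { ok: false, failedAt: step.label };
    }
  }
  return { ok: true };
}

// Fake runner for illustration: every command succeeds except lint.
const gateOutcome = runGate(fullGate, (command) => (/lint/.test(command) ? 1 : 0));
// gateOutcome is { ok: false, failedAt: "lint" }
```

In a real project, `run` could wrap Node's `child_process.spawnSync` and return its `status`; that wiring is left out here to keep the sketch self-contained.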

Experiment 1: Does It Actually Work?

We didn't just implement this — we tested whether the tests actually catch bugs.

📄 Full Experiment Report →

Setup

Project: Entity (Next.js + TypeScript monorepo)
Test runner: Vitest
Modules tested: classify.ts (file classification) and security.ts (path security)
Total tests written: 64 across 3 files
Test execution time: <500ms

Method

1. Wrote comprehensive tests for 2 modules (64 tests total).
2. Verified that all tests pass (baseline).
3. Introduced 6 deliberate bugs: a mix of logic inversions, missing cases, an off-by-one, and a critical security bypass.
4. Ran the tests to see how many bugs they catch.
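The procedure above is a hand-rolled form of bug injection: confirm the baseline passes, swap in a broken variant, and see whether the suite notices. A minimal sketch follows. The classifier is a toy stand-in invented for this example, not the real classify.ts, and `injectBugs` is a hypothetical helper.

```typescript
// One named check, run against whichever implementation is under test.
type Check<T> = { name: string; passes: (impl: T) => boolean };
// A deliberately broken variant of the baseline implementation.
type Mutant<T> = { description: string; impl: T };

function suitePasses<T>(checks: Check<T>[], impl: T): boolean {
  return checks.every((check) => check.passes(impl));
}

// Run the suite against each mutant; a mutant is "caught" if any check fails.
function injectBugs<T>(checks: Check<T>[], baseline: T, mutants: Mutant<T>[]) {
  if (!suitePasses(checks, baseline)) {
    throw new Error("Baseline must pass before bugs are injected");
  }
  const caught: string[] = [];
  const missed: string[] = [];
  for (const mutant of mutants) {
    (suitePasses(checks, mutant.impl) ? missed : caught).push(mutant.description);
  }
  return { caught, missed };
}

// Toy classifier (hypothetical): paths under blog/ are "blog", the rest "prd".
type FileKind = "blog" | "prd";
type Classifier = (filePath: string) => FileKind;

const baselineClassifier: Classifier = (filePath) =>
  /^blog\//.test(filePath) ? "blog" : "prd";

const classifierChecks: Check<Classifier>[] = [
  { name: "blog path is blog", passes: (impl) => impl("blog/launch.md") === "blog" },
  { name: "spec path is prd", passes: (impl) => impl("specs/auth.md") === "prd" },
];

const classifierMutants: Mutant<Classifier>[] = [
  { description: "blog returns 'prd'", impl: () => "prd" },
  {
    description: "matches 'blog' anywhere in the path",
    impl: (filePath) => (/blog/.test(filePath) ? "blog" : "prd"),
  },
];

const injectionReport = injectBugs(classifierChecks, baselineClassifier, classifierMutants);
// injectionReport.caught is ["blog returns 'prd'"]
// injectionReport.missed is ["matches 'blog' anywhere in the path"]
```

The missed mutant shows what a gap looks like: no check uses a path such as `docs/blog-ideas.md`, so the looser match goes unnoticed.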

Results

| # | Bug | Type | Caught? |
| --- | --- | --- | --- |
| 1 | `blog` type returns `'prd'` | Logic inversion | ✅ |
| 2 | Henry agent detection removed | Missing case | ✅ |
| 3 | Tag filter `>2` → `>3` | Off-by-one | ❌ |
| 4 | Path traversal bypass | Security critical | ✅ |
| 5 | `'secret'` removed from redaction list | Missing config | ✅ |
| 6 | `assertSourceEnabled` inverted | Logic inversion | ✅ |

Verdict: 5/6 bugs caught (83%) — STRONG PASS ✅

The tests caught:

Both logic inversions
A missing code path
A critical security vulnerability (path traversal bypass)
A missing configuration entry

The one they missed: a subtle off-by-one, because no test exercised the boundary value. Lesson: test boundary values, not just happy paths.
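To see why the `>2` → `>3` change slipped through, compare the two versions on happy-path inputs and on the boundary. The `keepTag` function is a hypothetical stand-in; the source does not say what the real tag filter compares.

```typescript
// Hypothetical tag filter: keep a tag when its count exceeds the threshold.
const keepTag = (count: number): boolean => count > 2;      // intended behavior
const keepTagBuggy = (count: number): boolean => count > 3; // injected off-by-one

// Return the inputs on which two implementations disagree. A test can only
// catch the bug if it uses at least one of these inputs.
function distinguishingInputs(
  original: (n: number) => boolean,
  mutated: (n: number) => boolean,
  inputs: number[],
): number[] {
  return inputs.filter((n) => original(n) !== mutated(n));
}

const happyPathHits = distinguishingInputs(keepTag, keepTagBuggy, [0, 10]);
// happyPathHits is []: both versions agree far from the threshold
const boundaryHits = distinguishingInputs(keepTag, keepTagBuggy, [2, 3]);
// boundaryHits is [3]: only the value just above the threshold exposes the bug
```

A practical habit that follows: for every comparison against a threshold `t`, test `t` and `t + 1` (and `t - 1` where it matters).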

Key Insights

Speed matters. If tests take >1 minute, the loop is too slow. Our tests run in <500ms.
Quality > Quantity. 64 tests missed one bug because we didn't test the boundary value. Edge cases are the critical differentiator.
Colocated = discovered. When tests sit next to source files, agents naturally find and run them.
Security bugs can be caught. The most important bug in our experiment (path traversal bypass) was caught immediately.
It's not magic. Close the loop works because code is verifiable. The agent can objectively check if its work is correct.
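To make the path traversal case from the experiment concrete, here is a minimal sketch of the kind of check a fast unit test can pin down. `resolveInsideRoot` is a hypothetical helper written for this example, not the code in security.ts. It assumes POSIX-style `/` separators and a root without a trailing slash, and it does not handle URL-encoded segments or symlinks.

```typescript
// Resolve a requested path against a root directory, processing "." and ".."
// segments by hand. Returns null when the path would climb above the root.
function resolveInsideRoot(rootDir: string, requestedPath: string): string | null {
  const resolved: string[] = [];
  for (const segment of requestedPath.split("/")) {
    if (segment === "" || segment === ".") {
      continue; // empty and current-directory segments change nothing
    }
    if (segment === "..") {
      if (resolved.length === 0) {
        return null; // would escape the root: reject
      }
      resolved.pop();
    } else {
      resolved.push(segment);
    }
  }
  return [rootDir].concat(resolved).join("/");
}

const safePath = resolveInsideRoot("/data", "docs/../notes/a.md");
// safePath is "/data/notes/a.md"
const escapedPath = resolveInsideRoot("/data", "docs/../../etc/passwd");
// escapedPath is null
```

A bypass bug would typically delete or weaken the `null` branch, and a test asserting that `docs/../../etc/passwd` is rejected would fail at once.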

Origin

This methodology was researched after a conversation between Henry Mascot and Kinan Zayat, who reverse-engineered Peter Steinberger's approach by studying the OpenClaw codebase. Kinan identified the 4-file pattern and adapted it for his own Next.js stack.

License

MIT — use this however you want.
