Skip to content

henrino3/ctrl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CTRL — Close The Running Loop

Close The Running Loop

A practical guide to Peter Steinberger's "Close the Loop" methodology for autonomous AI agent coding — with a real experiment proving it works.

Peter explaining Close the Loop

What is "Close the Loop"?

The idea is simple: give your coding agent the ability to verify its own work.

Peter Steinberger (creator of OpenClaw/Clawdbot, founder of PSPDFKit) ships code he doesn't read. He runs 3-8 AI coding agents in parallel and merged 600 commits in a single day. The secret? Every agent writes tests, runs them, fixes failures, and only reports back when everything passes. No human in the loop for verification.

"Code works well with AI because it's verifiable. You can compile it, run it, test it. That's the loop. You have to close the loop." — Peter Steinberger

The Method

4 Files That Make It Work

File Purpose Size in OpenClaw
AGENTS.md Build, test, and dev commands 21KB
TESTING.md What to test, what NOT to test, how to test 19KB
copilot-instructions.md Anti-redundancy rules, colocated test pattern 2KB
package.json Scripts: test, test:coverage, test:e2e -

4-Layer Testing Pyramid

Start at the bottom. Each layer adds realism — and cost.

         ┌──────────┐
         │  Docker   │  Full app in container, cold-start
         │  Tests    │  Slowest, most realistic
         ├──────────┤
         │   Live    │  Real API calls (Stripe, Anthropic, etc.)
         │  Tests    │  Costs money, flaky (network/rate limits)
         ├──────────┤
         │   E2E     │  API routes, webhooks, request/response
         │  Tests    │  Minutes, catches integration bugs
         ├──────────┤
         │   Unit    │  Pure logic, validation, security
         │  Tests    │  Seconds, catches 80% of bugs
         └──────────┘
Layer What it tests Speed Cost When to use
Unit Pure functions, logic, validation, security <1s Free Every commit
E2E API routes, webhooks, auth flows, DB queries 1-5 min Free Before push
Live Real third-party APIs (Stripe, OpenAI, etc.) 5-30 min $ (API calls) Before release
Docker Full app cold-start in clean container 5-10 min Free CI/CD pipeline

Where to start: Unit tests. They give you 80% of the value at <1% of the cost. Peter's OpenClaw has 1,376 test files and unit tests are the foundation.

When to add E2E: When you have API routes that handle auth, data mutation, or external integrations. Tests like "does POST /api/tasks return 201?" catch a different class of bugs than unit tests.

When to add Live: Only when you integrate with third-party APIs and need to verify they haven't changed their format/behavior. These are expensive and flaky — run them manually or before releases, not on every commit.

When to add Docker: When you deploy to production and want to verify the entire app starts cleanly from scratch. Good for CI/CD pipelines.

The Rules

  1. Colocated tests — Every source file gets a .test.ts right next to it
  2. Close the loop — After writing code, run tests. If they fail, fix. Don't ask the human.
  3. Full gate before pushbuild + lint + test must all pass
  4. Anti-redundancy — Search for existing helpers before creating new ones

Evidence From OpenClaw's Codebase

We analyzed the OpenClaw GitHub repo:

  • 1,376 test files across the codebase
  • 563 source files with colocated .test.ts files
  • Largest test files are 45-65KB — real tests, not stubs
  • Recent commits consistently pair source changes with test changes

The Prompt

Here's a ready-to-use prompt for your coding agent (adapt the stack):

You are working on my project which has a [YOUR STACK].
We currently have zero tests.

I want to implement Peter Steinberger's "Close the Loop" methodology, where the agent
(you) can test and verify your own work autonomously via CLI commands, without needing
a human to check every change.

TESTING STRUCTURE
- Write a test file for every file you create or modify
- Colocate tests next to the source file they test
- Name them: source.test.ts next to source.ts
- Start with unit tests only

CLOSE THE LOOP
- After writing any code, run the tests via CLI before considering the task done
- If tests fail, fix the code and run again — do not ask me to check
- Only report back when tests are passing

COMMANDS TO RUN
- After writing code: run tests, linter, type checker
- Before any PR: run full gate (build + lint + test)
- Never push failing code

ANTI-REDUNDANCY
- Before creating any helper or utility, search for existing ones first
- If a function already exists, import it — do not duplicate it
- Extract shared test fixtures into test-helpers files when used in multiple tests

Experiment 1: Does It Actually Work?

We didn't just implement this — we tested whether the tests actually catch bugs.

📄 Full Experiment Report →

Setup

  • Project: Entity (Next.js + TypeScript monorepo)
  • Test runner: Vitest
  • Modules tested: classify.ts (file classification) and security.ts (path security)
  • Total tests written: 64 across 3 files
  • Test execution time: <500ms

Method

  1. Wrote comprehensive tests for 2 modules (64 tests total)
  2. Verified all tests pass (baseline)
  3. Introduced 6 deliberate bugs — mix of logic inversions, missing cases, off-by-ones, and a critical security bypass
  4. Ran tests to see how many bugs get caught

Results

# Bug Type Caught?
1 blog type returns 'prd' Logic inversion
2 Henry agent detection removed Missing case
3 Tag filter >2>3 Off-by-one
4 Path traversal bypass Security critical
5 'secret' removed from redaction list Missing config
6 assertSourceEnabled inverted Logic inversion

Verdict: 5/6 bugs caught (83%) — STRONG PASS

The tests caught:

  • Both logic inversions
  • A missing code path
  • A critical security vulnerability (path traversal bypass)
  • A missing configuration entry

The one it missed: a subtle off-by-one where the test wasn't testing the boundary value. Lesson: test boundary values, not just happy paths.

Key Insights

  1. Speed matters. If tests take >1 minute, the loop is too slow. Our tests run in <500ms.
  2. Quality > Quantity. 64 tests missed one bug because we didn't test the boundary value. Edge cases are the critical differentiator.
  3. Colocated = discovered. When tests sit next to source files, agents naturally find and run them.
  4. Security bugs get caught. The most important bug in our experiment (path traversal bypass) was caught immediately.
  5. It's not magic. Close the loop works because code is verifiable. The agent can objectively check if its work is correct.

Origin

This methodology was researched after a conversation between Henry Mascot and Kinan Zayat, who reverse-engineered Peter Steinberger's approach by studying the OpenClaw codebase. Kinan identified the 4-file pattern and adapted it for his own Next.js stack.

References

License

MIT — use this however you want.

About

CTRL — Close The Running Loop: AI agent testing methodology with experiment proving 83% bug detection

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors