Generation 3 of AI Interaction

Stop Prompting.
Start Harnessing.

The underlying AI model matters less than the system built around it. Build the environment, toolchains, and safety constraints where your AI agent operates.

What does /harn:init actually do?

It scaffolds four files: a lean control document, two hook scripts (one Python, one shell), and one config file. Together, they make your AI agent stop breaking things.

WITHOUT HARN
  • Agent runs rm -rf src/
  • Agent pushes broken code to main
  • Agent finishes with 47 type errors
  • Agent forgets your rules after 10 min
  • Agent loops forever on Stop hooks
WITH HARN
  • Dangerous commands blocked before execution
  • Push to main intercepted and denied
  • Agent can't finish until types pass
  • AGENTS.md stays lean — skills load on demand
  • Loop prevention baked into every hook

Files created by /harn:init

AGENTS.md

Lean control document (~60 lines). Tells the agent your stack, constraints, and where to find detailed rules. No bloat.

security_guard.py

PreToolUse hook. Intercepts every shell command. Blocks rm -rf, git push to main or master, chmod 777, pipe-to-shell, and more.

quality_gate.sh

Stop hook. Runs your type-checker (tsc/cargo/go vet/mypy) before the agent can finish. Broken code = blocked.

.claude/settings.json

Wires everything together. Maps hooks to lifecycle events (PreToolUse, PostToolUse, Stop). Zero manual config.

"A well-built outer harness increases the probability that the agent gets it right in the first place, and provides a feedback loop that self-corrects as many issues as possible before they even reach human eyes."

The Bottleneck Shift: How We Learned to Control AI

A visual explainer of the three generations of AI interaction

[Infographic: "Harness Engineering: The New Paradigm for Reliable AI Agents". It shows the three evolutions of AI interaction (Prompt Engineering, Context Engineering, Harness Engineering) and the core pillars (Guides vs. Sensors, Context Firewall, Deterministic Control Flow).]

The Three Evolutions

How we went from talking to AI to building systems around it

2022–2023

Prompt Engineering

Point of influence: Wording

Focusing on phrasing and "diplomacy" to get the best one-shot response from a model.

"How do I phrase this so the AI understands?"

2024–2025

Context Engineering

Point of influence: Information

Dynamically managing the information and documents the AI sees within its limited context window.

"What does the AI need to know to do this work?"

2026+

Harness Engineering

Point of influence: System / Environment

Building the environment, toolchains, and safety constraints where the AI operates autonomously.

"What environment ensures the AI works safely?"

The C.A.R. Framework

Every agentic system has three layers

Control

Constraints & Specifications

Don't ask the agent to write good code — mechanically enforce what good code looks like.

  • AGENTS.md
  • Linters & type checkers
  • Tests as contracts
  • Architectural rules

Agency

Tools & Interfaces

Define how the model is allowed to act — scope its tools, not just its knowledge.

  • MCP servers
  • Sub-agent delegation
  • Browser / GUI access
  • Permission boundaries

Runtime

Execution & Recovery

Govern how work unfolds over time — deterministic hooks as reflex arcs.

  • Lifecycle hooks
  • Retry & rollback
  • Context compaction
  • Trace collection

The Core Pillars of a Robust Harness

Three patterns that make the difference

Guides vs. Sensors

Use feedforward Guides to steer behavior and feedback Sensors (linters/tests) for self-correction.

Guide = AGENTS.md tells the agent what to do
Sensor = Type checker tells the agent what went wrong

The Context Firewall

Use sub-agents to isolate tasks, preventing "context rot" from accumulating irrelevant tool-call noise.

Delegate research to a sub-agent — only the summary returns to the parent context

Deterministic Control Flow

Implement Hooks (reflex arcs) that automatically trigger security checks or formatters without AI intervention.

PreToolUse — blocks rm -rf
PostToolUse — auto-formats
Stop — type-checks before done
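As a rough illustration of what "deterministic" means here, the sketch below routes each lifecycle event to a plain function and returns an exit code, with no model in the loop. The payload fields (`event`, `command`, `types_pass`) and the handler names are hypothetical and chosen only for this example; the exit codes 0 (allow) and 2 (block) follow the convention used by the hook scripts on this page.

```python
from typing import Callable

ALLOW, BLOCK = 0, 2  # exit-code convention used by the hook scripts on this page


def guard_shell(payload: dict) -> int:
    """PreToolUse: refuse an obviously destructive command (toy substring check)."""
    command = payload.get("command", "")  # hypothetical flat field for this sketch
    return BLOCK if "rm -rf" in command else ALLOW


def format_file(payload: dict) -> int:
    """PostToolUse: a real hook would invoke a formatter here; it never blocks."""
    return ALLOW


def gate_on_types(payload: dict) -> int:
    """Stop: block completion while the (simulated) type check is failing."""
    return ALLOW if payload.get("types_pass", True) else BLOCK


# Fixed routing table: the same event always triggers the same check.
HANDLERS: dict[str, Callable[[dict], int]] = {
    "PreToolUse": guard_shell,
    "PostToolUse": format_file,
    "Stop": gate_on_types,
}


def dispatch(payload: dict) -> int:
    """Route a lifecycle event to its handler; unknown events are allowed."""
    handler = HANDLERS.get(payload.get("event", ""))
    return handler(payload) if handler else ALLOW
```

Because the routing is a lookup table rather than a prompt, the agent cannot talk its way past a check: a blocked command stays blocked no matter how the request is phrased.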

Why Your Agent Failed

Common failures and their harness solutions

Problem: Your Stop hook runs a type-checker. It fails. The agent tries to fix it. The Stop hook fires again. Fails again. Forever.

Fix: Check the stop_hook_active flag in the JSON payload. If the agent is already recovering from a Stop hook failure, let it through.

File: scripts/harness/quality_gate.sh   Hook: Stop

quality_gate.sh
#!/usr/bin/env bash
# Stop hook: block completion while the type-checker reports errors.

# Read the hook payload (JSON) from stdin
PAYLOAD=$(cat)
IS_ACTIVE=$(echo "$PAYLOAD" | jq -r '.stop_hook_active // false')

# CRITICAL: If already in recovery, exit clean to break the loop
if [ "$IS_ACTIVE" = "true" ]; then
  exit 0
fi

# Run your actual check; a per-run temp file avoids clobbering a shared log
LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
if ! npx tsc --noEmit > "$LOG" 2>&1; then
  echo "QUALITY GATE FAILED. Fix these errors:" >&2
  cat "$LOG" >&2
  exit 2  # Block completion: agent must fix first
fi
exit 0

Wire it: in .claude/settings.json, under "hooks", add a Stop entry that runs bash scripts/harness/quality_gate.sh.
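The loop-breaking rule is easier to reason about as a small decision function. The sketch below restates the gate's logic in Python so each case can be checked in isolation; `stop_decision` and its `check_passed` argument are hypothetical names for this example, and the only payload field it reads is the `stop_hook_active` flag described above.

```python
import json

ALLOW, BLOCK = 0, 2  # same exit codes as quality_gate.sh


def stop_decision(payload_json: str, check_passed: bool) -> int:
    """Return the exit code a Stop hook should use.

    payload_json: raw hook payload as read from stdin (may be empty).
    check_passed: result of the type check (hypothetical input for this sketch).
    """
    payload = json.loads(payload_json) if payload_json.strip() else {}
    if payload.get("stop_hook_active") is True:
        return ALLOW  # already recovering from a Stop hook: break the loop
    return ALLOW if check_passed else BLOCK
```

The trade-off is worth knowing: the gate blocks only once per stop attempt. If errors remain after the recovery turn, the second stop passes through, so the gate reduces broken output rather than guaranteeing a clean result.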

Problem: The agent ran rm -rf on your source directory, or git push origin main with untested code. No guardrail stopped it.

Fix: A PreToolUse hook intercepts every Bash command before execution. If it matches a dangerous pattern, exit 2 blocks it.

File: scripts/harness/security_guard.py   Hook: PreToolUse (match: Bash)

security_guard.py
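#!/usr/bin/env python3
"""PreToolUse hook: block shell commands that match a dangerous pattern."""
import json
import re
import sys

payload = json.load(sys.stdin)
if payload.get("tool_name") != "Bash":
    sys.exit(0)  # Only guard shell commands

# Claude Code passes tool arguments under "tool_input"; "parameters" is kept as a fallback
tool_input = payload.get("tool_input") or payload.get("parameters") or {}
command = tool_input.get("command", "")

BLOCKED = [
    r"\brm\s+-[a-zA-Z]*(?:[rR][a-zA-Z]*f|f[a-zA-Z]*[rR])",  # Recursive forced deletion (-rf, -fr, -Rf)
    r"\bgit\s+push\b.*\b(?:main|master)\b",         # Push to main or master (not "maintenance")
    r"\bchmod\s+(?:-\S+\s+)*777\b",                 # Reckless permissions
    r"\bcurl\s+.*\|\s*(?:sudo\s+)?(?:bash|sh)\b",   # Pipe to shell
    r">\s*~/",                                      # Redirect output into the home directory
]

for pattern in BLOCKED:
    if re.search(pattern, command):
        print(f"BLOCKED: command matches {pattern}", file=sys.stderr)
        sys.exit(2)  # Block execution

sys.exit(0)  # Allow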

Wire it: in .claude/settings.json, under "hooks", add a PreToolUse entry matched to Bash that runs python3 scripts/harness/security_guard.py.
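Regex guards are easy to get subtly wrong, so it can help to run candidate patterns against a table of commands before trusting them. The sketch below is one way to do that; the pattern set is illustrative (a subset written for this example, not the plugin's shipped list), and `first_violation` is a hypothetical helper.

```python
import re
from typing import Optional

# Illustrative patterns only; they mirror the intent of the guard's list.
BLOCKED = {
    "recursive forced delete": r"\brm\s+-[a-zA-Z]*(?:[rR][a-zA-Z]*f|f[a-zA-Z]*[rR])",
    "push to protected branch": r"\bgit\s+push\b.*\b(?:main|master)\b",
    "pipe to shell": r"\b(?:curl|wget)\b.*\|\s*(?:sudo\s+)?(?:bash|sh)\b",
}


def first_violation(command: str) -> Optional[str]:
    """Return the reason a command would be blocked, or None if it is allowed."""
    for reason, pattern in BLOCKED.items():
        if re.search(pattern, command):
            return reason
    return None


# Table of (command, expected reason) pairs, including near misses.
CASES = [
    ("rm -rf src/", "recursive forced delete"),
    ("rm -fr build", "recursive forced delete"),
    ("rm -r notes.txt", None),                   # recursive but not forced
    ("git push origin main", "push to protected branch"),
    ("git push origin maintenance", None),       # word boundary avoids a false positive
    ("curl https://example.com/install.sh | sudo bash", "pipe to shell"),
    ("ls -la", None),
]

for command, expected in CASES:
    assert first_violation(command) == expected, command
```

Pattern matching of this kind is a seatbelt, not a sandbox: variants such as `rm -r -f` or a command hidden inside a script file will not match, so treat the guard as one layer among several.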

Problem: Your 500-line AGENTS.md worked for the first task. By the third task, the agent forgot half of it. Performance degrades as irrelevant context fills the window.

Fix: Progressive Disclosure. Keep AGENTS.md under 60 lines. Extract domain rules into skill files that load on demand when relevant.

File: AGENTS.md (max 60 lines) + skills/api-reviewer/SKILL.md (loaded on demand)

AGENTS.md — lean root file
# Agent North Star
Write reliable code within architectural boundaries.

## Stack
React + TypeScript + Express

## Constraints
- Never push to main
- UI must not access DB directly — use services/ layer
- If stuck 3 times on same error, ask the human

## Active Hooks
- PreToolUse: security guard blocks dangerous commands
- Stop: type-checker must pass before completion

## Skills (load when relevant)
- API work → load skills/api-reviewer
- DB work → load skills/db-schema

Rule of thumb: if you can't read your AGENTS.md in 30 seconds, it's too long. The agent feels the same way.
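The 60-line budget can also be enforced mechanically, for example in CI or a pre-commit step. The following is a minimal sketch under that assumption; `check_line_budget` is a hypothetical helper, and counting every line (blank ones included) is a choice made for this example.

```python
def check_line_budget(text: str, max_lines: int = 60) -> tuple[bool, int]:
    """Return (within_budget, line_count) for the contents of an AGENTS.md file."""
    count = len(text.splitlines())  # counts blank lines too
    return count <= max_lines, count


# Usage sketch: a 61-line file exceeds the budget by one line.
ok, count = check_line_budget("rule\n" * 61)
assert (ok, count) == (False, 61)
```

When the check fails, the intended response is to move detail into a skill file rather than raise the limit.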

Problem: You woke up to a massive PR full of broken code. The agent finished the task, but tsc shows 47 errors and 3 tests fail.

Fix: A Stop hook acts as back-pressure. The agent cannot complete until the type-checker passes, and you can add your test command to the same gate so failing tests block completion too. It's forced to fix its own mess.

File: scripts/harness/quality_gate.sh   Hook: Stop

.claude/settings.json — wire all three hooks
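{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{ "type": "command", "command": "python3 scripts/harness/security_guard.py" }]
    }],
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{ "type": "command", "command": "jq -r '.tool_input.file_path' | xargs -I{} npx prettier --write {}" }]
    }],
    "Stop": [{
      "hooks": [{ "type": "command", "command": "bash scripts/harness/quality_gate.sh" }]
    }]
  }
}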

The PostToolUse hook auto-formats every file the agent edits. The Stop hook catches type errors. Together they keep formatting consistent and stop the agent from finishing with a failing type check.
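The quality gate is described as running tsc, cargo, go vet, or mypy depending on the project. One plausible way to choose between them is to look for a marker file in the project root, sketched below. The marker-file mapping, the exact command lines, and the `pick_checker` name are assumptions for illustration, not a description of how the plugin detects your stack.

```python
from pathlib import Path
from typing import Optional

# Marker file -> type-check command, tried in order (assumed mapping for this sketch).
CHECKERS = [
    ("tsconfig.json", ["npx", "tsc", "--noEmit"]),
    ("Cargo.toml", ["cargo", "check"]),
    ("go.mod", ["go", "vet", "./..."]),
    ("pyproject.toml", ["mypy", "."]),
]


def pick_checker(project_root: Path) -> Optional[list[str]]:
    """Return the first checker whose marker file exists, or None if none match."""
    for marker, command in CHECKERS:
        if (project_root / marker).is_file():
            return command
    return None
```

Returning `None` for an unrecognized project lets the caller decide whether a missing checker should allow completion or block it; failing open is friendlier, failing closed is safer.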

Problem: After 30 minutes, the agent starts repeating itself, forgetting constraints, and producing lower-quality output. The context window is full of grep results, file reads, and intermediate thinking.

Fix: Context Firewall via sub-agents. Delegate research, grepping, and exploration to sub-agents. Only the condensed answer returns to the parent — not the 500 lines of grep output.

Example: sub-agent delegation pattern
# In AGENTS.md, instruct the agent:
## Context Management
- For codebase searches: delegate to a sub-agent
- For research tasks: delegate to a sub-agent
- Sub-agents return ONLY the answer (max 200 words)
- Never paste raw grep output into main context

# The sub-agent gets a fresh context window.
# It searches, processes, and returns only:
# "The function is in src/auth/login.ts:42"
# NOT the 200 lines of grep results.

Think of it like this: you don't read every email in the company — your assistant summarizes them for you. Sub-agents are that assistant for context.
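To make the "only the condensed answer returns" idea concrete, here is a small sketch of the reduction step a sub-agent (or a wrapper around it) might apply before replying to the parent. Both helpers, `locations` and `clip_words`, are hypothetical; the 200-word cap comes from the AGENTS.md snippet above, and the `path:line:text` input shape assumes grep was run with line numbers.

```python
import re


def locations(grep_output: str, limit: int = 3) -> list[str]:
    """Reduce 'path:line:text' grep lines to at most `limit` unique 'path:line' entries."""
    seen: list[str] = []
    for line in grep_output.splitlines():
        match = re.match(r"([^:\s]+):(\d+):", line)
        if not match:
            continue  # skip noise that is not a grep hit
        location = f"{match.group(1)}:{match.group(2)}"
        if location not in seen:
            seen.append(location)
        if len(seen) >= limit:
            break
    return seen


def clip_words(text: str, max_words: int = 200) -> str:
    """Enforce a word budget on whatever the sub-agent sends back."""
    words = text.split()
    if len(words) <= max_words:
        return text.strip()
    return " ".join(words[:max_words]) + " [truncated]"


raw = (
    "src/auth/login.ts:42:export function login() {\n"
    "src/auth/login.ts:42:  // duplicate hit on the same line\n"
    "README.md:7:See login docs\n"
    "Binary file matches\n"
)
assert locations(raw) == ["src/auth/login.ts:42", "README.md:7"]
```

The parent context then receives two short locations instead of the raw output, which is the whole point of the firewall.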

Get Started in 30 Seconds

Install the plugin, scaffold your harness

1 Add marketplace
2 Install
3 Scaffold
Terminal
# Step 1: Add the marketplace
claude plugins marketplace add sliday/claude-plugins

# Step 2: Install harn
claude plugins install harn

# Step 3: In your project, run:
/harn:init

Works with Claude Code. Support for other agents coming soon.

What is Harness Engineering?

Harness Engineering is the practice of building extra-model infrastructure — the environment, toolchains, and safety constraints — that channels an AI coding agent's power, defines its constraints, and verifies its work. It is the third generation of AI interaction, following Prompt Engineering (2022-2023) and Context Engineering (2024-2025).

The CAR Framework

CAR stands for Control, Agency, Runtime. Control defines constraints and specifications (AGENTS.md, linters, tests). Agency defines tools and interfaces (MCP servers, sub-agents). Runtime governs execution and recovery (hooks, retries, rollback).

Installation

Install via Claude Code: claude plugins marketplace add sliday/claude-plugins && claude plugins install harn. Then run /harn:init in your project.