Foundations of Harness Engineering

The core articles that define the discipline. These pieces from OpenAI, Anthropic, Thoughtworks, and the practitioner community establish what harness engineering is, why it emerged, and how it changes the way we build with AI agents.

Harness Engineering: Leveraging Codex in an Agent-First World

OpenAI

OpenAI's field report on building a large-scale application with Codex. The key insight: when you move from one-shot code generation to a persistent agent working inside your codebase, the engineering shifts from "how to prompt" to "how to constrain." They describe using architectural constraints (enforced directory structure, forbidden patterns), repo-local instructions that persist across sessions, browser-based validation loops, and telemetry to understand where the agent struggles. The article demonstrates that building reliable agent systems is fundamentally an infrastructure problem -- the model is the easy part.

Key Takeaways

Read Original

Effective Harnesses for Long-Running Agents

Anthropic

Anthropic's definitive guide to making agents work across multiple context windows. Introduces the concept of "initializer agents" that set up the working environment before the main agent begins. Covers feature lists as a structured decomposition format, init.sh scripts that establish the build/test/lint cycle, self-verification patterns where the agent checks its own work, and handoff artifacts that preserve critical state across context window boundaries. The article argues that the harness is what makes the difference between an agent that produces a demo and one that builds production software.

Key Takeaways

Read Original

Harness Design for Long-Running Application Development

Anthropic

A follow-up focused on generating complete applications autonomously. Introduces a GAN-inspired generator/evaluator architecture where one agent builds and another grades. The evaluator applies concrete criteria to turn subjective judgments ("is this design good?") into gradable terms. Covers task state management across long sessions and why decomposing builds into tractable chunks with structured handoff artifacts dramatically improves completion rates.

Key Takeaways

Read Original

The Anatomy of an Agent Harness

LangChain

LangChain's concise decomposition of what constitutes an agent harness. Defines an agent as "model + harness" where the harness includes prompts, tools, middleware, orchestration logic, and runtime infrastructure. Distinguishes between the framework (reusable components), the runtime (execution environment), and the harness (the application-specific configuration that ties everything together). This framing helps practitioners understand that most agent failures are harness failures, not model failures.

Key Takeaways

Read Original

Harness Engineering

Thoughtworks

Thoughtworks frames harness engineering into three complementary activities: context engineering (what the agent knows), architectural constraints (what the agent is allowed to do), and "garbage collection" against entropy (cleaning up the mess that accumulates over long sessions). The article positions harness engineering as a new discipline that sits between traditional software engineering and AI/ML -- requiring both infrastructure skills and an understanding of model behavior.

Key Takeaways

Read Original

Building Effective Agents

Anthropic

Anthropic's broader guide covering the full spectrum from simple workflows to autonomous agents. Argues that structured workflows (chaining, routing, parallelization) should be preferred over unconstrained agents when the task is well-defined. Introduces patterns for tool use, handoff between specialized agents, and evaluation. The key message: start simple, add complexity only when needed, and always prefer deterministic control where possible.

Key Takeaways

Read Original

Skill Issue: Harness Engineering for Coding Agents

HumanLayer

A provocative argument that when coding agents produce weak results, the problem is almost always the harness, not the model. Reviews common failure modes -- context overflow, missing guardrails, no validation loop -- and shows how each is solved by harness infrastructure rather than model upgrades. Makes the case that investing in harness engineering gives better ROI than waiting for the next model release.

Key Takeaways

Read Original

Your Agent Needs a Harness, Not a Framework

Inngest

Inngest argues that agent frameworks often abstract away the wrong things. What agents actually need is infrastructure for state management, automatic retries, trace collection, and concurrency control. Compares the framework approach (hiding complexity) with the harness approach (making infrastructure visible and controllable). The conclusion: frameworks help you start; harnesses help you ship.

Key Takeaways

Read Original

Explore the Knowledge Base

Context Engineering Safety & Guardrails Specs & Workflows Evals & Observability Benchmarks Tools & Runtimes

Foundations of Harness Engineering Knowledge Base Summary

This page curates 8 foundational resources on harness engineering for AI coding agents. Topics covered:

Key principles: the harness determines production readiness more than the model, architectural constraints beat prompt instructions, prefer structured workflows and deterministic control, invest in infrastructure rather than waiting for better models, decompose work with handoff artifacts for long-running sessions.