Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems
UIUC / Meta / Stanford
The field-defining 2026 survey. Ning, Tieu, Fu, and co-authors reframe code from a generated artifact into the operational substrate of agentic AI: the medium through which agents reason, act, observe, and verify. The survey organizes the literature into three connected layers:
- Harness interface: code for reasoning, acting, and environment modeling
- Harness mechanisms: planning, memory, tool use, control through the Plan-Execute-Verify loop, and harness optimization
- Scaling the harness: multi-agent orchestration over shared, code-centric substrates
It closes with a research agenda: harness-level evaluation beyond task success, self-evolving harnesses that do not regress, transactional shared state, human-in-the-loop input treated as durable harness state, and multimodal code-harness systems. This is the academic spine of everything else on this page.
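The Plan-Execute-Verify loop can be pictured as a small control loop. The sketch below is illustrative only and is not taken from the survey. It assumes a toy task (implement "square x"), a hypothetical stub planner that returns hard-coded programs from `CANDIDATES` in place of a model, Python's built-in `exec` as the execution step, and a hypothetical test list `TESTS` as the verifier. Each failure message is recorded in an inspectable `Trace` and passed back to the planner, which is where a real harness would turn verification output into debugging feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical verifier: executable tests as (input, expected output) pairs for "square x".
TESTS = [(0, 0), (3, 9), (-4, 16)]

# Hypothetical stand-ins for successive model proposals; a real planner would generate these.
CANDIDATES = [
    "def f(x):\n    return x * 2",                   # wrong: doubles instead of squaring
    "def f(x):\n    return x ** 2 if x > 0 else x",  # wrong for negative inputs
    "def f(x):\n    return x * x",                   # correct
]


@dataclass
class Trace:
    """Inspectable harness state: every plan and the feedback it produced."""
    attempts: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def plan(step: int, feedback: Optional[str]) -> str:
    # Stub planner (hypothetical): a real one would condition on `feedback`;
    # this one simply returns the next hard-coded candidate.
    return CANDIDATES[step]


def execute(source: str) -> Callable[[int], int]:
    # Run the generated code to obtain an executable artifact.
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"]


def verify(fn: Callable[[int], int]) -> Optional[str]:
    # Return a message for the first failing test, or None if every test passes.
    for x, expected in TESTS:
        got = fn(x)
        if got != expected:
            return f"f({x}) returned {got}, expected {expected}"
    return None


def plan_execute_verify(max_steps: int = 3) -> Tuple[Optional[str], Trace]:
    # max_steps must not exceed len(CANDIDATES) because the stub planner indexes into it.
    trace, feedback = Trace(), None
    for step in range(max_steps):
        source = plan(step, feedback)
        feedback = verify(execute(source))
        trace.attempts.append((source, feedback))
        if feedback is None:
            return source, trace  # verified: stop the loop
    return None, trace  # budget exhausted without a verified result


if __name__ == "__main__":
    result, trace = plan_execute_verify()
    for _, fb in trace.attempts:
        print(fb or "passed")
```

Run as a script, it prints the verifier feedback for each attempt: the first candidate fails on `f(3)`, the second on `f(-4)`, and the third passes. Calling bare `exec` on generated code is unsafe outside a toy; a real harness would sandbox the execution step.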
Key Takeaways
- Reliable harnesses share four properties: executable, inspectable, stateful, governed
- Plan-Execute-Verify unifies planning, execution, and debugging as one cybernetic loop
- Agentic Harness Engineering (AHE) treats the harness itself as an object of optimization, edited by an Evolution Agent under governed mutation
- Multi-agent systems converge faster when the shared substrate is executable (tests, repos, traces) rather than implicit conversation history
- Open problem: evaluators that capture the intended task rather than only executable proxies (oracle adequacy)
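Governed mutation, in the sense of self-evolving harnesses that do not regress, can be sketched as an acceptance gate: a proposed harness edit is kept only if it raises the benchmark score and breaks no task that previously passed. This is a minimal sketch under stated assumptions, not the survey's AHE method. `HarnessConfig`, the `TASKS` regression suite, the toy `evaluate` function, and the list of proposed edits (standing in for an Evolution Agent's output) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HarnessConfig:
    """Hypothetical harness knobs an Evolution Agent might edit."""
    max_retries: int
    run_tests: bool


# Hypothetical regression suite: what each task needs from the harness to succeed.
TASKS = {
    "fix_typo": {"retries_needed": 0, "needs_tests": False},
    "fix_off_by_one": {"retries_needed": 2, "needs_tests": True},
    "add_feature": {"retries_needed": 1, "needs_tests": True},
}


def evaluate(cfg: HarnessConfig) -> Dict[str, bool]:
    """Toy evaluator: a task passes if the harness grants enough retries and runs tests when required."""
    return {
        name: cfg.max_retries >= req["retries_needed"]
        and (cfg.run_tests or not req["needs_tests"])
        for name, req in TASKS.items()
    }


def governed_update(current: HarnessConfig, proposal: HarnessConfig) -> Tuple[bool, List[str]]:
    """Accept only if the proposal passes strictly more tasks and breaks none that passed before."""
    before, after = evaluate(current), evaluate(proposal)
    regressions = [name for name in TASKS if before[name] and not after[name]]
    improved = sum(after.values()) > sum(before.values())
    return improved and not regressions, regressions


def evolve(start: HarnessConfig, edits: List[dict]) -> Tuple[HarnessConfig, List[str]]:
    """Apply proposed edits one at a time, logging each decision so the history stays inspectable."""
    current, log = start, []
    for changes in edits:
        proposal = replace(current, **changes)
        accepted, regressions = governed_update(current, proposal)
        entry = f"{changes}: {'accepted' if accepted else 'rejected'}"
        if regressions:
            entry += f" (regresses {', '.join(regressions)})"
        log.append(entry)
        if accepted:
            current = proposal
    return current, log


if __name__ == "__main__":
    start = HarnessConfig(max_retries=1, run_tests=False)
    final, log = evolve(start, [{"run_tests": True}, {"max_retries": 0}, {"max_retries": 3}])
    print("\n".join(log))
    print(final)
```

With the three edits in the usage block, the gate accepts the first and third and rejects the second, because lowering `max_retries` to 0 would break `add_feature`. A strict "no task may regress" rule is one simple choice of governance policy; it is conservative and can reject edits that trade a small loss for a large gain.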