Specs & Workflows — Harness Engineering Knowledge Base

AGENTS.md

github.com/agentsmd

AGENTS.md is a lightweight, open format that gives AI coding agents a dedicated, predictable place to learn about a project. Just as README.md orients human contributors, an AGENTS.md file orients machine contributors -- describing development environment setup, testing protocols, naming conventions, and contribution standards in plain Markdown. The format is intentionally minimal: drop one file in your repo root, write the sections that matter, and every agent that opens the project knows where to look. Because it uses the same Markdown developers already write, adoption requires no new tooling and the file stays maintainable alongside the rest of the documentation. The design philosophy is pragmatic over prescriptive: there is no rigid schema, no required toolchain, and no vendor lock-in. Projects are free to add monorepo navigation tips, linting commands, CI quirks, or domain-specific workflows. The repository includes a companion Next.js site and practical examples showing how different project types structure their files, from single-package libraries to large monorepos.

Key Takeaways

One Markdown file per repo -- no extra tooling, schema, or build step required
Covers dev setup, test commands, PR conventions, and project-specific quirks
Vendor-neutral: works with any AI coding agent that reads repo files
Flexible structure lets each project define only the sections it needs

agent.md

github.com/agentmd

Where different AI coding tools each invented their own rules file -- .cursorrules, .windsurfrules, .clauderules -- the agent.md project proposes a single, universal configuration standard. "One file, any agent" is the rallying cry: place an AGENT.md in your project root and every tool can parse the same instructions. The specification covers project structure, build and test commands, code style conventions, architecture decisions, security considerations, and testing guidelines. Crucially, it supports a hierarchical model: root-level files set global rules, subdirectory files override them for subsystems, and a user-global config at ~/.config/AGENT.md carries personal defaults everywhere. File references via @-mentions let you pull in additional context without duplicating content. For teams already maintaining tool-specific configs, migration commands create symlinks that keep legacy files working while consolidating to a single source of truth. The design prioritizes vendor neutrality, community governance, and the simplicity of Markdown -- making it a realistic candidate for cross-tool standardization rather than another proprietary format.

Key Takeaways

Replaces fragmented tool-specific config files with one universal Markdown format
Hierarchical: root, subdirectory, and user-global levels with override semantics
@-mentions for composable file references; symlinks for backward compatibility
Community-driven, vendor-neutral spec aiming to become the cross-tool default
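
The hierarchical model described above can be sketched as a small lookup routine. This is an illustration only, not code from the agent.md project: the function name `resolve_agent_files`, its parameters, and the precedence order (user-global lowest, nearest subdirectory highest) are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Optional


def resolve_agent_files(target_dir: Path, repo_root: Path,
                        user_global: Optional[Path] = None,
                        filename: str = "AGENT.md") -> List[Path]:
    """Return existing config files, lowest precedence first.

    Assumed order (hypothetical): user-global file, then the repo root,
    then each directory on the way down to target_dir.
    """
    target_dir = target_dir.resolve()
    repo_root = repo_root.resolve()
    found: List[Path] = []

    # Personal defaults apply everywhere but are overridden by project files.
    if user_global is not None and user_global.is_file():
        found.append(user_global)

    # relative_to raises ValueError if target_dir is outside the repo.
    relative = target_dir.relative_to(repo_root)

    # Visit the root first, then descend one directory at a time.
    current = repo_root
    for part in ("",) + relative.parts:
        if part:
            current = current / part
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
    return found
```

A caller that wants override semantics would read the returned files in order and let later entries win wherever they conflict with earlier ones.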

GitHub Spec Kit

github.com/github

Spec Kit is GitHub's open-source implementation of spec-driven development (SDD) -- a methodology where specifications are written first and directly drive code generation, rather than serving as afterthought documentation. The toolkit provides a CLI that scaffolds workspace configurations for multiple coding assistants and introduces a six-phase workflow: establish a project constitution of immutable principles, create a specification, produce a technical plan, break the plan into tasks, implement, and review. Each phase materializes as Markdown files enriched with checklists that track requirement completeness, constitution compliance, and research status. A template system and 40+ community extensions cover cases ranging from greenfield React apps to brownfield Java migrations. At its core, Spec Kit embeds the conviction that "vibes-based coding" produces unreliable results and that explicit specifications -- reviewed and refined by humans at every gate -- yield more predictable AI-generated code. Integration spans Claude, Copilot, Gemini, and OpenAI agents, plus project-management tools like Jira, Linear, and Azure DevOps.

Key Takeaways

Six-phase workflow: Constitution, Specify, Plan, Tasks, Implement, Review
Specs become the source of truth -- code is the "last-mile" artifact
40+ extensions, custom presets, multi-agent and multi-language support
Heavy use of checklists as machine-enforceable "definition of done"
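
The idea of phases gated by checklists can be sketched in a few lines. This is a hypothetical model, not Spec Kit's CLI or file layout: the `PhaseGate` class, its methods, and the rule that an empty checklist counts as incomplete are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Phase names taken from the six-phase workflow summarized above.
PHASES = ["constitution", "specify", "plan", "tasks", "implement", "review"]


@dataclass
class PhaseGate:
    # Maps phase name -> {checklist item: done?}. Hypothetical structure.
    checklists: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def is_complete(self, phase: str) -> bool:
        """A phase counts as done only if it has items and all are checked."""
        items = self.checklists.get(phase, {})
        return bool(items) and all(items.values())

    def can_start(self, phase: str) -> bool:
        """A phase may start only when every earlier phase is complete."""
        index = PHASES.index(phase)
        return all(self.is_complete(p) for p in PHASES[:index])

    def next_phase(self) -> Optional[str]:
        """Return the first incomplete phase, or None when all are done."""
        for phase in PHASES:
            if not self.is_complete(phase):
                return phase
        return None
```

Treating the checklist as data is what makes the "definition of done" checkable by a script rather than by convention alone.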

Understanding Spec-Driven Development: Kiro, spec-kit, and Tessl

martinfowler.com

This Thoughtworks article offers a critical, hands-on evaluation of three tools claiming the spec-driven development label -- Kiro, spec-kit, and the Tessl Framework -- and finds they implement quite different things. The author identifies three ascending levels of SDD: spec-first (write a spec, then code, then discard the spec), spec-anchored (keep the spec as a living artifact), and spec-as-source (only humans edit the spec, code is always generated). Kiro provides a lightweight three-step flow -- requirements, design, tasks -- that works well for tutorials but proved over-engineered for small bug fixes. Spec-kit produces extensive Markdown artifacts with research steps and checklists, but the volume of generated files can make review harder than reviewing code directly. Tessl goes furthest, aspiring to a per-file spec-as-source model reminiscent of model-driven development, trading the old rigid DSL for natural-language specs processed by LLMs. The article raises tough questions: Can one workflow fit all problem sizes? Does reviewing Markdown instead of code create a false sense of control? Are we making things worse in the attempt to make them better -- a case of "Verschlimmbesserung"?

Key Takeaways

Three levels of SDD: spec-first, spec-anchored, spec-as-source
Bigger specs do not automatically mean better control -- agents still ignore instructions
SDD tools must handle varying problem sizes; one-size-fits-all workflows are premature
Parallels to model-driven development warn against repeating historical mistakes

12 Factor Agents

humanlayer.dev

Inspired by the original 12 Factor App manifesto, this post by Dex at HumanLayer distills hard-won lessons from building production-grade LLM-powered software into twelve operating principles. The central argument: most successful "AI agents" are not autonomous loop-until-solved systems but mostly deterministic code, with LLM steps inserted at the right points. The factors advocate owning your prompts, context window, and control flow rather than outsourcing them to a framework. Tools are reframed as structured JSON outputs that trigger deterministic code -- not magic black-box function calls. Execution state and business state should be unified into a single serializable thread so agents can launch, pause, and resume with simple APIs. Human contact is modeled as just another tool call, enabling durable human-in-the-loop workflows across Slack, email, or SMS. Agents should be small and focused (under 20 steps), triggered from anywhere, and designed as stateless reducers where the entire behavior is a pure function of the accumulated context window. The manifesto has become a touchstone for teams building customer-facing agents that need to work reliably, not just demo well.

The 12 Factors

1. Natural language to tool calls
2. Own your prompts
3. Own your context window
4. Tools are just structured outputs
5. Unify execution state and business state
6. Launch / pause / resume with simple APIs
7. Contact humans with tool calls
8. Own your control flow
9. Compact errors into context window
10. Small, focused agents
11. Trigger from anywhere
12. Make your agent a stateless reducer
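
Several of these factors (tools as structured outputs, a single serializable thread, compacted errors, and the stateless reducer) fit together in a short sketch. Everything here is illustrative rather than HumanLayer's code: `scripted_llm` is a stand-in for a real model call, and the event shapes, the `TOOLS` table, and the `ask_human` and `done` names are assumptions.

```python
import json
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
LLM = Callable[[List[Event]], str]

# Deterministic code that structured outputs are allowed to trigger.
TOOLS: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}

# Steps that hand control back to the caller (hypothetical names).
PAUSE_STEPS = ("done", "ask_human")


def step(thread: List[Event], llm: LLM) -> List[Event]:
    """Reducer: (thread so far) -> new thread. The input is never mutated."""
    call = json.loads(llm(thread))  # the model's output is just JSON
    new_thread = thread + [{"type": "tool_call", "name": call["name"],
                            "args": call["args"]}]
    if call["name"] in PAUSE_STEPS:
        return new_thread
    try:
        result = TOOLS[call["name"]](**call["args"])
        return new_thread + [{"type": "tool_result", "value": result}]
    except Exception as exc:
        # Compact the failure into the thread so the next step can see it.
        return new_thread + [{"type": "error", "message": repr(exc)}]


def run(thread: List[Event], llm: LLM, max_steps: int = 20) -> List[Event]:
    """Own the control flow: a plain loop with an explicit step budget."""
    for _ in range(max_steps):
        thread = step(thread, llm)
        last = thread[-1]
        if last["type"] == "tool_call" and last["name"] in PAUSE_STEPS:
            break
    return thread


def scripted_llm(thread: List[Event]) -> str:
    """Stand-in model: call `add` once, then report the result."""
    results = [e for e in thread if e["type"] == "tool_result"]
    if not results:
        return json.dumps({"name": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"name": "done", "args": {"answer": results[-1]["value"]}})
```

Because the thread is plain JSON-compatible data, pausing on `ask_human` amounts to storing `json.dumps(thread)` and later calling `run` again with the stored thread plus the human's reply.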

12-Factor AgentOps

12factoragentops.com

Created by Boden Fuller as an operations-focused companion to the 12 Factor Agents manifesto, 12-Factor AgentOps shifts the lens from building agents to running them reliably in production. Where the original twelve factors address architecture and code design, AgentOps tackles the operational discipline required once agents hit real workloads: context management hygiene, input and output validation, reproducible workflow execution, structured logging and tracing, graceful degradation under failure, cost tracking per invocation, and deterministic replay for debugging. The framework insists that every agent invocation should be reproducible -- given the same context and tools, the same outcome should follow -- and that operators need first-class primitives for versioning prompts, pinning model snapshots, and auditing tool-call chains. Validation is treated as a continuous concern: inputs are sanitized before reaching the LLM, outputs are schema-checked before execution, and drift between expected and actual behavior triggers alerts. The site organizes its guidance into discrete, numbered factors with concrete checklists, making it practical for platform teams building internal agent infrastructure. Together with its sibling manifesto, it forms a complete build-and-operate playbook for production agent systems.

Key Takeaways

Operations companion to 12 Factor Agents -- focused on running, not just building
Every invocation should be reproducible with versioned prompts and pinned models
Continuous validation: sanitize inputs, schema-check outputs, alert on drift
Structured logging, cost tracking, and deterministic replay as first-class concerns
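
The "schema-check outputs before execution" idea can be illustrated with a minimal validator. This is a sketch under stated assumptions, not code from 12-Factor AgentOps: the `refund` tool, its fields, and the `SCHEMAS` layout are hypothetical, and a production system would more likely use a full JSON Schema validator.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical schema table: tool name -> {argument name: required type}.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "refund": {"order_id": str, "amount_cents": int},
}


def validate_tool_call(raw: str) -> Tuple[bool, List[str]]:
    """Check a model-proposed tool call before any code acts on it."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return False, ["top-level value must be an object"]

    name = call.get("name")
    if not isinstance(name, str) or name not in SCHEMAS:
        return False, [f"unknown tool: {name!r}"]

    args = call.get("args")
    if not isinstance(args, dict):
        return False, ["args must be an object"]

    errors: List[str] = []
    schema = SCHEMAS[name]
    for key, expected in schema.items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif type(args[key]) is not expected:  # strict: True is not an int
            errors.append(f"wrong type for {key}: expected {expected.__name__}")
    for key in args:
        if key not in schema:
            errors.append(f"unexpected argument: {key}")
    return (not errors), errors
```

Rejected calls can be logged with their error list, which gives operators a concrete signal for the drift alerts the framework describes.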

Dynamic Workflows in Claude Code

anthropic.com

Written by Thariq Shihipar and Sid Bidasaria of Anthropic, this piece introduces dynamic workflows -- a Claude Code feature where the agent writes its own harness on the fly. Instead of the default coding harness, Claude (Opus 4.8+) generates a JavaScript file that spawns and coordinates fresh Claude instances through three primitives: agent() runs one subagent and can force schema-validated JSON output, parallel() fans tasks out behind a barrier, and pipeline() streams each item through every stage with no barrier. The keyword "ultracode" guarantees a workflow is built. The motivation is three failure modes that a single context window can't escape: agentic laziness (stopping at partial progress), self-preferential bias (an agent grading its own work too kindly), and goal drift (constraints eroding across turns and compaction). Orchestrating separate Claudes -- each with a clean context and one isolated goal -- removes the conflict of interest. The article names six composable patterns and a security-critical quarantine pattern for triage: read-only reader agents parse untrusted content into structured summaries, and only a high-privilege actor agent acts on those summaries, never the raw input. Workflows can route each subagent to a specific model (Haiku for simple steps, Opus for hard ones), run in isolated git worktrees, resume after interruption, and cap token spend directly in the prompt.
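
The difference between a barrier and a barrier-free pipeline can be shown with a small asyncio sketch. The article describes a generated JavaScript file; the Python below only borrows the three primitive names, and the signatures, the echoing `agent` stub, and the use of `asyncio.gather` are assumptions, not Claude Code's actual API.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Stage = Callable[[str], Awaitable[str]]


async def agent(prompt: str) -> str:
    """Stand-in for one subagent with a fresh context; it only echoes."""
    await asyncio.sleep(0)  # yield control, as a real call would while waiting
    return f"done({prompt})"


async def parallel(prompts: Sequence[str]) -> List[str]:
    """Fan out, then wait at a barrier: nothing returns until all finish."""
    return list(await asyncio.gather(*(agent(p) for p in prompts)))


async def pipeline(items: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Each item moves through every stage on its own schedule.

    There is no barrier between stages: a fast item can reach the last
    stage while a slow item is still in the first. Results are only
    collected together at the very end.
    """
    async def run_one(item: str) -> str:
        for stage in stages:
            item = await stage(item)
        return item

    return list(await asyncio.gather(*(run_one(i) for i in items)))
```

For example, `asyncio.run(parallel(["lint", "test"]))` runs both stub agents and returns their results in input order, while `asyncio.run(pipeline(["a", "b"], [agent, agent]))` passes each item through two stages independently.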

The Six Patterns

Classify-and-Act -- a classifier routes the task to the right agent or output format
Fan-out-and-Synthesize -- parallel agents per step, then a barrier merges structured outputs
Adversarial Verification -- a separate verifier checks each worker against a rubric
Generate-and-Filter -- generate many options, then keep only those that pass the rubric and survive deduplication
Tournament -- agents compete; pairwise judges compare until a winner emerges
Loop Until Done -- spawn until a stop condition is met, not a fixed number of passes
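
As one concrete instance of these patterns, the Tournament pattern can be sketched as a single-elimination bracket. The summary above does not say how the bracket is organized, so the single-elimination format, the bye rule for odd counts, and the `judge` callback are assumptions; in a real workflow `judge` would be a separate judging agent rather than a plain function.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tournament(candidates: Sequence[T], judge: Callable[[T, T], T]) -> T:
    """Run pairwise comparisons until one candidate remains.

    `judge(a, b)` must return whichever of the two it prefers.
    """
    if not candidates:
        raise ValueError("tournament needs at least one candidate")
    current: List[T] = list(candidates)
    while len(current) > 1:
        winners: List[T] = []
        # Pair up neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(current) - 1, 2):
            winners.append(judge(current[i], current[i + 1]))
        # With an odd count, the last candidate advances without a match.
        if len(current) % 2 == 1:
            winners.append(current[-1])
        current = winners
    return current[0]
```

With a toy judge that prefers the longer string, `tournament(["a", "abc", "ab", "abcd", "x"], lambda a, b: a if len(a) >= len(b) else b)` returns `"abcd"`.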

Standard Comparison

What each specification format covers

Capability	AGENTS.md	agent.md	Spec Kit
Format	Single Markdown file	Markdown file(s)	CLI + Markdown artifacts
Dev environment setup	Yes	Yes	Via constitution
Testing instructions	Yes	Yes	Yes
Code style / conventions	Yes	Yes	Yes
Architecture / design	Optional	Yes	Yes
Hierarchical overrides	--	Root + subdir + global	Constitution + specs
File references / composition	--	@-mentions	Templates + scripts
Task generation / breakdown	--	--	Yes (phased workflow)
Code generation from specs	--	--	Yes (implement phase)
Multi-agent support	Any agent reads it	Cross-tool by design	Claude, Copilot, Gemini, OpenAI
Vendor lock-in	None	None	GitHub ecosystem
Tooling required	None (plain file)	Optional CLI for migration	CLI + workspace setup
Best for	Lightweight project guidance	Cross-tool config unification	Full spec-driven development pipeline

Curated from awesome-harness-engineering and original research.

Canonical URL: harn.app/kb/specs
