Context Engineering

Context management is the single most important factor in agent reliability. An AI agent's context window is its working memory -- every token counts. These resources cover the emerging discipline of curating, compressing, and structuring the information that flows through an agent's limited attention budget.

Effective Context Engineering for AI Agents

Anthropic

Anthropic's Applied AI team frames context engineering as the natural evolution of prompt engineering. Where prompt engineering focused on crafting the right words for single-shot tasks, context engineering addresses the broader challenge of curating the entire information state available to an LLM at each inference step -- system prompts, tools, MCP servers, message history, and retrieved data.

The article introduces the concept of an "attention budget": as context length grows, the model's ability to capture pairwise token relationships degrades due to the quadratic nature of transformer attention. This creates a performance gradient rather than a hard cliff. The practical implication is that good context engineering means finding the smallest possible set of high-signal tokens that maximize the desired outcome.

For long-horizon tasks, the team details three complementary techniques: compaction (summarizing conversation history and reinitializing with compressed context), structured note-taking (persisting progress outside the context window, as Claude Code does with to-do lists), and sub-agent architectures (delegating deep exploration to focused child agents that return condensed results). The article also advocates a "just in time" context strategy -- maintaining lightweight references like file paths and URLs rather than pre-loading all data, letting the agent retrieve information on demand through tools.
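
The "just in time" strategy can be sketched as lightweight references that are dereferenced only when needed. This is a minimal Python illustration, not Anthropic's implementation -- the `Reference` class and file contents are invented for the example:

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile

@dataclass
class Reference:
    """Lightweight pointer kept in context instead of the full content."""
    path: str

    def resolve(self) -> str:
        # Tokens are spent only when the agent dereferences the pointer.
        return Path(self.path).read_text()

# Demo: a file on disk that the agent knows about but has not yet loaded.
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
    f.write("architecture notes")
    ref = Reference(f.name)

context = [ref]                 # the context carries only the cheap pointer
content = context[0].resolve()  # ...and retrieves content on demand via a tool
```

The point of the pattern is that a path or URL costs a handful of tokens, while its content may cost thousands -- so the expensive part is deferred until the agent decides it is actually relevant.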

Context Engineering for AI Agents: Lessons from Building Manus

Manus

Yichao "Peak" Ji shares hard-won lessons from four complete rebuilds of the Manus agent framework -- a process the team affectionately calls "Stochastic Graduate Descent." The article is grounded in production metrics: with a 100:1 input-to-output token ratio, the KV-cache hit rate becomes the single most important optimization lever, affecting both latency and cost (cached input tokens on Claude Sonnet cost a tenth as much as uncached ones).

Three architectural principles stand out. First, keep prompt prefixes stable and context append-only -- even a single-token difference (like a timestamp at the start of a system prompt) invalidates the entire cache from that point forward. Second, mask rather than remove tools: instead of dynamically adding/removing tool definitions (which breaks KV-cache and confuses the model), Manus uses a context-aware state machine that constrains action selection via token logit masking during decoding. Tool names are deliberately designed with consistent prefixes (e.g., browser_*, shell_*) to enable group-level constraints.
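
A toy sketch of the masking idea, operating at the tool-name level for clarity (Manus applies the constraint to token logits during decoding; the tool inventory and function names here are hypothetical):

```python
import math

# Hypothetical tool inventory with the consistent prefixes the article describes.
TOOLS = ["browser_open", "browser_click", "shell_exec", "shell_read", "file_write"]

def mask_logits(allowed_prefixes: list[str],
                logits: dict[str, float]) -> dict[str, float]:
    """Constrain action selection by masking disallowed tools to -inf,
    rather than removing their definitions (which would break the KV-cache)."""
    return {
        tool: (score if tool.startswith(tuple(allowed_prefixes)) else -math.inf)
        for tool, score in logits.items()
    }

# In a "browsing" state, the state machine permits only browser_* actions.
raw = {tool: 0.0 for tool in TOOLS}
masked = mask_logits(["browser_"], raw)
allowed = [t for t, s in masked.items() if s > -math.inf]
```

Because the tool definitions themselves never change, the prompt prefix stays byte-identical across turns and the KV-cache remains valid.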

Third, use the filesystem as unbounded context. When observation data blows past window limits, Manus writes to and reads from files on demand -- treating the filesystem as structured, externalized memory. Context compression is designed to be restorable: a URL or file path is preserved even when the content itself is dropped. The article also explains Manus's todo.md technique -- by rewriting a to-do list at each step, the agent recites its objectives into the tail of context, exploiting recency bias in attention to prevent goal drift. Finally, failed actions are deliberately kept in context rather than cleaned up, because error traces shift the model's posterior away from repeating mistakes.
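
Restorable compression can be sketched in a few lines -- the budget and message shapes below are illustrative, not Manus internals:

```python
MAX_OBS_CHARS = 2_000  # illustrative per-observation budget

def compress_observation(obs: dict) -> dict:
    """Restorable compression: drop bulky content, keep the pointer to it."""
    if len(obs.get("content", "")) <= MAX_OBS_CHARS:
        return obs
    return {
        "source": obs["source"],  # the URL or file path survives compression
        "content": "[content dropped; re-read from source if needed]",
    }

obs = {"source": "/tmp/build.log", "content": "x" * 50_000}
small = compress_observation(obs)
```

Because the `source` pointer is preserved, nothing is irreversibly lost: if the agent later needs the build log, it re-reads the file rather than carrying 50k characters in every subsequent turn.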

Context Engineering for Coding Agents

Thoughtworks

Published on martinfowler.com, this primer from Thoughtworks maps the full landscape of context configuration features available in modern coding agents, using Claude Code as a detailed case study. The author establishes a useful taxonomy: context splits into reusable prompts (instructions that tell the agent what to do, and guidance/rules that set conventions) and context interfaces (tools, MCP servers, and skills that let the agent pull additional context on demand).

A key dimension is who decides to load context: the LLM (non-deterministic, needed for unsupervised operation), the human (controlled but reduces automation), or the agent software itself (deterministic lifecycle triggers like hooks). The article walks through Claude Code's full feature set -- CLAUDE.md, path-scoped rules, slash commands (now deprecated in favor of skills), skills with lazy-loading, subagents with isolated context windows, MCP servers, hooks, and plugins for distribution.

The strongest guidance is on size management: even though context windows are technically large, agent effectiveness degrades with excess context. The recommendation is to build configuration gradually rather than front-loading, and to leverage the agent's built-in compaction. The article closes with an honest warning about the "illusion of control" -- context engineering increases the probability of useful results, but as long as LLMs are involved, outcomes remain probabilistic and human oversight remains essential.

Advanced Context Engineering for Coding Agents

HumanLayer

HumanLayer's deep-dive argues that AI coding tools fail in production codebases not because models are too dumb, but because practitioners feed them poorly structured context. Drawing on two pivotal talks from AI Engineer 2025 -- Sean Grove's "Specs are the new code" and a Stanford study showing AI tools often cause rework in brownfield codebases -- the article introduces Frequent Intentional Compaction (FIC) as a core workflow.

FIC means designing your entire development process around context management, keeping context utilization in the 40-60% range. The workflow splits into three phases: research (understand the codebase and information flow via subagent exploration), plan (outline precise implementation steps with testing criteria), and implement (step through the plan phase by phase, compacting status back into the plan after each verified phase). The article demonstrates this on a 300k LOC Rust codebase (BAML), where an amateur Rust developer produced a merged PR fixing a real bug, and later shipped 35k LOC of new features in 7 hours.
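
The mechanics of the workflow reduce to two small operations -- a utilization check and a status fold-back. A hedged Python sketch (thresholds from the article; the plan format and helper names are invented):

```python
TARGET_HIGH = 0.60  # upper edge of the 40-60% utilization band

def should_compact(used_tokens: int, window_tokens: int) -> bool:
    """Trigger intentional compaction once context use exceeds the band."""
    return used_tokens / window_tokens > TARGET_HIGH

def compact_into_plan(plan: list[str], phase: str, outcome: str) -> list[str]:
    """Fold a verified phase's status back into the plan, so a fresh
    context can resume from the plan document alone."""
    return plan + [f"[done] {phase}: {outcome}"]

plan = ["1. research auth flow", "2. patch token refresh", "3. add regression test"]
plan = compact_into_plan(plan, "research auth flow", "root cause in session handler")
```

The plan file, not the conversation, becomes the durable record of progress -- which is what lets the agent discard most of its history between phases.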

The most important insight is about human leverage: a bad line of research leads to bad plans, which leads to hundreds of bad lines of code. Therefore, human review should focus on the highest-leverage artifacts -- research documents and plans -- rather than code line-by-line. The article also reframes code review as primarily about mental alignment across the team, not just correctness. Specs and plans serve as readable artifacts that keep everyone oriented even when AI writes most of the code.

Context-Efficient Backpressure for Coding Agents

HumanLayer

This focused post tackles a specific but widespread waste pattern: agents burning context on verbose tool output that adds no decision-making value. A passing test suite might dump 200+ lines of output, consuming 2-3% of the context window just to convey "all good" -- information expressible in fewer than 10 tokens. The fix is a deterministic backpressure wrapper that swallows output on success and only surfaces it on failure.

The core pattern is a run_silent shell function: run the command, capture output to a temp file, print a single checkmark on success or dump the full output on failure. This means the agent sees ✓ Auth tests instead of 50 lines of passing assertions, but gets full stack traces when something actually breaks. The article recommends layering additional optimizations: enable --bail/-x/--failfast flags to stop at the first failure (don't make the agent context-switch between five bugs), filter generic stack frames, and strip timing information.
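
The article presents run_silent as a shell function; the same backpressure pattern translates directly to Python (this is an equivalent sketch, not HumanLayer's code):

```python
import subprocess
import sys

def run_silent(label: str, *cmd: str) -> int:
    """Backpressure wrapper: one checkmark on success, full output on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✓ {label}")                  # a few tokens instead of hundreds
    else:
        sys.stdout.write(result.stdout)      # full traces only when they matter
        sys.stderr.write(result.stderr)
    return result.returncode

# A passing command stays silent apart from the checkmark.
status = run_silent("Auth tests", sys.executable, "-c", "print('long test output')")
```

Crucially, the decision of what to show is deterministic -- made by the wrapper, not by the model -- so the agent never has to guess what it is safe to truncate.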

The post also identifies an ironic counter-pattern in current models: RL-trained models have become so context-anxious that they pipe output to /dev/null or use head -n 50 on test suites, which can actually waste more tokens (the truncation scaffolding costs more than the output it replaces) and forces re-runs when truncated output hides the actual failure. The solution is to take deterministic control of output so the model doesn't have to guess what to truncate.

OpenHands Context Condensation for More Efficient AI Agents

OpenHands

OpenHands introduces an intelligent context condenser that maintains bounded conversation memory while preserving the essential information needed to continue work effectively. The problem it solves is familiar: as conversations grow, agents become slower, costlier, and less effective. Starting a new chat sacrifices continuity and forces manual context management.

The condenser works by monitoring conversation size against a threshold. When exceeded, it summarizes older interactions while keeping recent exchanges intact, creating a concise memory of earlier work. The summarization is goal-aware: it encodes the user's objectives, progress made, and remaining work, plus technical details like critical files and failing tests for software engineering tasks. A key design choice is that condensation only triggers at size thresholds rather than every turn, which preserves prompt cache efficiency -- rebuilding costs are amortized across multiple turns.
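
The threshold-triggered design can be sketched as follows (thresholds and message shapes are illustrative; the real OpenHands condenser is goal-aware and calls an LLM to summarize):

```python
MAX_EVENTS = 40    # condensation threshold (illustrative)
KEEP_RECENT = 10   # recent exchanges kept verbatim

def condense(history: list[str], summarize) -> list[str]:
    """Summarize older events only once the threshold is exceeded; the no-op
    on most turns is what preserves the prompt-cache prefix."""
    if len(history) <= MAX_EVENTS:
        return history
    older, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    return [summarize(older)] + recent

# Stand-in summarizer; the real one encodes objectives, progress,
# critical files, and failing tests.
fake_summary = lambda events: f"<summary of {len(events)} earlier events>"
compacted = condense([f"event {i}" for i in range(50)], fake_summary)
```

Note the cache trade-off: condensing every turn would keep context minimal but invalidate the prompt cache constantly; condensing only at the threshold amortizes that rebuild cost across many turns.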

The results on SWE-bench Verified are compelling: context condensation achieves up to 2x per-turn API cost reduction, consistent response times in long sessions, and equivalent or slightly better task completion (54% vs 53% baseline). The baseline agent's costs scale quadratically over time as context grows, while the condensed approach scales linearly. The only trade-off is occasional extra turns for the condensation step itself. This validates the core insight that aggressive context pruning, when done thoughtfully, does not sacrifice performance.

Writing a Good CLAUDE.md

HumanLayer

This practical guide addresses the highest-leverage single file in any coding agent workflow: CLAUDE.md (or its open-source equivalent AGENTS.md). Since this file goes into every single conversation, it functions as the onboarding document that tells the agent what the project is, why it exists, and how to work on it -- stack, structure, build commands, test workflows, and verification steps.

The central finding is that less is more. Research indicates frontier thinking LLMs can reliably follow roughly 150-200 instructions, with performance decaying linearly as count increases (exponentially for smaller models). Since Claude Code's own system prompt already consumes about 50 instructions, that leaves limited budget for user instructions. Crucially, Claude Code injects the CLAUDE.md with a note saying "this context may or may not be relevant" -- so overstuffed files get partially ignored by design.

The recommended approach is progressive disclosure: keep the root file concise (HumanLayer's own is under 60 lines) and store task-specific instructions in separate files (agent_docs/building_the_project.md, agent_docs/running_tests.md, etc.) that the agent can load on demand. The article warns against three anti-patterns: using CLAUDE.md as a linter (use deterministic tools instead), auto-generating it with /init (it's too high-leverage for auto-generation), and including non-universal instructions that dilute the signal. Prefer pointers over copies -- reference file:line locations rather than pasting code snippets that go stale.
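
A progressively disclosed root file might look like the sketch below. Only the agent_docs/ paths come from the article; the project details are invented for illustration:

```markdown
# CLAUDE.md  (root file, kept short)

Example project: payments API (Go) plus a TypeScript dashboard.

## Verify your work
- `make test` must pass before a task is considered done

## Task-specific guides (read on demand)
- Building the project: agent_docs/building_the_project.md
- Running tests: agent_docs/running_tests.md

## Conventions
- Point to code as file:line references; do not paste snippets
```

The root file stays well under the instruction budget, and the heavier guides cost tokens only in the conversations that actually need them.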

Context Engineering Knowledge Base Summary

This page curates 7 key resources on context engineering for AI coding agents.

Key principles: treat context as scarce, prefer deterministic control over model-driven truncation, use the filesystem and subagents for memory isolation, compact frequently and intentionally, and focus human review on the highest-leverage artifacts.