Safety, Guardrails & Safe Autonomy

How to give AI agents power without giving up control. These articles cover sandboxing, tool boundaries, prompt injection defense, quality loops, and the human-agent division of labor.

Beyond Permission Prompts: Making Claude Code More Secure and Autonomous

Anthropic

Anthropic tackles the tension between safety and usability in autonomous coding agents. Permission prompts ("Allow this action?") create friction that defeats the purpose of autonomy. The solution: better sandboxing and policy design that pre-approve safe actions while blocking dangerous ones. Covers container isolation, filesystem restrictions, network policies, and how to design a permission model that reduces prompts by 90% without sacrificing safety.
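The pre-approval idea described above can be sketched as an ordered policy table: deny rules listed first, explicit allows next, and anything unmatched still falls back to a prompt. This is a minimal illustration only; the rule syntax and action naming are invented here and are not Claude Code's actual policy format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    pattern: str   # glob over "tool:target", e.g. "read:src/*"
    decision: str  # "allow", "deny", or "ask"


# Illustrative policy: deny rules sit first so they win by ordering.
POLICY = [
    Rule("write:.env*", "deny"),       # never touch secrets
    Rule("shell:rm *", "deny"),        # block destructive shell commands
    Rule("read:*", "allow"),           # reading is pre-approved
    Rule("write:src/*", "allow"),      # edits inside the project tree
    Rule("shell:pytest*", "allow"),    # running the test suite is safe
]


def decide(tool: str, target: str) -> str:
    """Return the policy decision for a proposed agent action."""
    action = f"{tool}:{target}"
    for rule in POLICY:
        if fnmatchcase(action, rule.pattern):
            return rule.decision
    return "ask"  # unmatched actions still prompt the user
```

Because most routine actions match an allow rule, the user only sees prompts for the unmatched tail, which is how a policy like this cuts prompt volume without widening the blast radius.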

Code Execution with MCP: Building More Efficient Agents

Anthropic

How the Model Context Protocol enables controlled code execution. MCP provides explicit, inspectable tool boundaries -- the agent can only execute code through well-defined interfaces, not arbitrary shell access. Covers tool registration, parameter validation, output capture, and how MCP's design makes it possible to audit every action the agent takes. The key insight: explicit tool boundaries are both safer and more capable than unrestricted access.
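The explicit-boundary idea can be shown with a toy registry: every callable the agent may invoke is registered with a parameter spec, and arguments are validated before the call so misuse fails loudly. This is a sketch of the concept, not the real MCP SDK; the tool name and registry shape are illustrative.

```python
# Toy tool registry: the agent can only reach code through registered,
# schema-checked entry points -- never arbitrary shell access.
TOOLS = {}


def register(name, params):
    """Register a function as an agent-callable tool with typed parameters."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "params": params}
        return fn
    return wrap


@register("run_tests", params={"path": str, "verbose": bool})
def run_tests(path, verbose=False):
    return f"ran tests in {path}"


def invoke(name, **kwargs):
    """Validate and dispatch a tool call; every call is auditable here."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    spec = TOOLS[name]["params"]
    for key, value in kwargs.items():
        if key not in spec:
            raise TypeError(f"unexpected parameter: {key}")
        if not isinstance(value, spec[key]):
            raise TypeError(f"{key} must be {spec[key].__name__}")
    return TOOLS[name]["fn"](**kwargs)
```

A single `invoke` choke point is what makes auditing tractable: one place to log, validate, and refuse.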

Writing Effective Tools for Agents

Anthropic

Practical guidance on designing tool interfaces that agents can use correctly. Covers naming conventions (clear, descriptive), parameter design (constrained types, enums over free text), error handling (actionable error messages), and documentation (inline descriptions that serve as "prompts"). The core principle: tools should be hard to misuse, not just easy to use. A well-designed tool reduces hallucination and incorrect invocations.
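Two of those principles, constrained types over free text and actionable error messages, can be sketched in a few lines. The tool name and fields below are hypothetical, chosen only to illustrate the pattern.

```python
from enum import Enum


class Severity(Enum):
    """Constrained type: the agent picks from an enum, not a free-text field."""
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


def report_issue(file: str, line: int, severity: Severity, message: str) -> dict:
    """Record a lint finding. `severity` must be one of: info, warning, error."""
    if not isinstance(severity, Severity):
        # Actionable error: name the valid options instead of "invalid input",
        # so the agent can self-correct on the next attempt.
        valid = ", ".join(s.value for s in Severity)
        raise ValueError(f"severity must be one of: {valid}; got {severity!r}")
    return {"file": file, "line": line,
            "severity": severity.value, "message": message}
```

The enum closes off the hallucination surface (no `"fatal"` or `"warn"` variants), and the error text doubles as a corrective prompt.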

Mitigating Prompt Injection Attacks in Software Agents

OpenHands

Practical defense strategies against prompt injection in coding agents. Covers confirmation mode (requiring explicit user approval for sensitive operations), content analyzers (detecting injection patterns in file contents and tool outputs), sandboxing (limiting blast radius), and hard policies (absolute rules that can't be overridden by injected content). Particularly relevant for agents that read untrusted code or pull from external repositories.
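A content analyzer from that list can be sketched as a scanner over untrusted tool output. The patterns below are illustrative heuristics only; a production analyzer would combine many more signals (classifiers, provenance tracking) rather than a short regex list.

```python
import re

# Illustrative injection indicators -- not an exhaustive or production list.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),                     # role-hijack attempts
    re.compile(r"curl\s+\S+\s*\|\s*(ba)?sh", re.I),       # pipe-to-shell
]


def scan_untrusted(text: str) -> list:
    """Return the injection indicators matched in untrusted content."""
    return [p.pattern for p in SUSPICIOUS if p.search(text)]


def guard(tool_output: str) -> str:
    """Hard policy: flag suspicious content before it reaches the model."""
    hits = scan_untrusted(tool_output)
    if hits:
        return f"[BLOCKED: possible prompt injection, matched {hits}]"
    return tool_output
```

Placing `guard` between tool output and model input means injected text in a README or issue comment never gets to masquerade as a user instruction.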

Assessing Internal Quality While Coding with an Agent

Thoughtworks

Argues for moving quality checks from post-hoc manual review into the agent's working loop. Uses the CCMenu metaphor -- a continuous integration status indicator that tells you whether the build is healthy. For agents, this means running type checkers, linters, and tests continuously during coding, not just at the end. The agent should see quality signals in real-time and self-correct, rather than accumulating technical debt that a human must later fix.
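The in-loop quality idea can be sketched as a harness the agent consults after every edit. The check commands below are placeholders for whatever the project actually uses; the retry scheme is a simplification of the self-correction loop the article describes.

```python
import subprocess

# Placeholder commands -- substitute the project's own type checker,
# linter, and test runner.
CHECKS = [
    ("types", ["mypy", "src/"]),
    ("lint", ["ruff", "check", "src/"]),
    ("tests", ["pytest", "-q"]),
]


def quality_signal() -> dict:
    """Green/red status per check: the agent's CI-style build light."""
    status = {}
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True)
        status[name] = result.returncode == 0
    return status


def agent_step(apply_edit, max_retries: int = 3) -> bool:
    """Apply an edit, then self-correct until every signal is green."""
    for _ in range(max_retries):
        apply_edit()
        if all(quality_signal().values()):
            return True   # build healthy, keep going
    return False          # still red after retries: escalate to the human
```

The point is the placement: the signals fire inside the working loop, so red goes back to the agent immediately instead of accumulating as debt for a later human review.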

Anchoring AI to a Reference Application

Thoughtworks

Concrete exemplars constrain agent output better than abstract instructions. Instead of describing your desired architecture in words, provide a reference implementation that the agent can pattern-match against. This "anchoring" technique reduces variance in agent output and ensures consistency across a codebase. Works especially well for UI consistency, API design patterns, and architectural conventions.
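Mechanically, anchoring can be as simple as prepending the reference implementation to the task prompt. A minimal sketch, with the function name and prompt wording invented for illustration:

```python
from pathlib import Path


def anchored_prompt(task: str, reference: Path) -> str:
    """Prepend a concrete exemplar so the agent pattern-matches against it
    instead of improvising an architecture from a verbal description."""
    example = reference.read_text()
    return (
        "Follow the conventions in this reference implementation exactly "
        "(structure, naming, error handling):\n\n"
        f"{example}\n\n"
        f"Task: {task}"
    )
```

Pointing every task at the same blessed file is what drives the variance down: the agent copies one concrete style rather than sampling from many plausible ones.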

Humans and Agents in Software Engineering Loops

Thoughtworks

A mental model for where human oversight adds value and where it doesn't. Argues that humans should focus on strengthening the harness (better constraints, better tests, better evaluation criteria) rather than micromanaging individual agent outputs. The agent handles volume; the human handles taste, judgment, and system design. This division of labor scales better than manual code review of every line.

Claude Code: Best Practices for Agentic Coding

Anthropic

Anthropic's practical recommendations for repo structure, checkpoints, validation, and delegation in agentic coding workflows. Covers CLAUDE.md as the primary instruction surface, git worktrees for isolation, sub-agents for parallelism, hooks for automation, and testing strategies. This is the most hands-on resource for anyone using Claude Code as their primary development tool.
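One of the isolation tactics above, git worktrees, is easy to script so each sub-agent gets its own checkout. A minimal sketch (the function name and branch scheme are illustrative, not from the article):

```python
import subprocess
import tempfile


def create_worktree(repo: str, branch: str) -> str:
    """Give an agent its own isolated checkout of `repo` on a fresh branch,
    so parallel work cannot clobber the main working tree."""
    path = tempfile.mkdtemp(prefix=f"agent-{branch}-")
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path],
        check=True, capture_output=True,
    )
    return path
```

Each worktree shares the repository's object store but has its own files and branch, so several agents can build and test in parallel and merge back through normal review.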

Safety, Guardrails & Safe Autonomy Knowledge Base Summary

This page curates 8 key resources on safety, guardrails, and safe autonomy for AI coding agents.

Key principles: sandboxing over permission prompts, explicit tool boundaries, defense in depth against prompt injection, continuous quality feedback loops, reference implementations over descriptions, human oversight at the harness level not the output level.