Harness engineering is the discipline of improving everything around an AI model that turns it into a useful agent: prompts, tools, context policies, hooks, sandboxes, subagents, feedback loops, memory, observability, and recovery paths. Addy Osmani summarizes the core equation as Agent = Model + Harness: a raw model becomes an agent only when the harness gives it state, tool execution, feedback loops, and enforceable constraints (source: addy-osmani-agent-harness-engineering-2026.md).
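To make the equation concrete, here is a minimal sketch of a harness loop that gives a model state, tool execution, a feedback channel, and an enforced step budget. All names are hypothetical stand-ins, not Osmani's code; `call_model` is a scripted placeholder for a real model API.

```python
# Minimal sketch of Agent = Model + Harness. call_model() is a scripted
# stand-in for a real model API; everything else here is the harness.
from typing import Callable

def call_model(messages: list[dict]) -> dict:
    # Stand-in: request a tool once, then finish after seeing its result.
    if any(m["role"] == "tool" for m in messages):
        return {"content": "done: " + messages[-1]["content"]}
    return {"tool": "echo", "args": {"text": "hello from the harness"}}

TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: text,   # tool execution lives in the harness
}

def run_agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]   # state the harness maintains
    for _ in range(max_steps):                       # enforceable step budget
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["content"]                  # final answer
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})  # feedback loop
    raise RuntimeError("step budget exhausted")      # constraint, not a silent retry

print(run_agent("demonstrate the loop"))  # -> done: hello from the harness
```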
The central habit is a ratchet: every observed agent failure should become a durable harness improvement, not just a retry. Examples include adding a convention to AGENTS.md, blocking destructive shell commands with hooks, splitting long tasks into planner/executor roles, or wiring type checks and tests back into the agent loop (source: addy-osmani-agent-harness-engineering-2026.md).
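As a concrete instance of the hooks example, here is a minimal sketch of a pre-execution guard that turns one observed failure ("agent ran a destructive command") into a permanent guardrail. The hook interface and blocked patterns are illustrative assumptions, not Osmani's actual API.

```python
# Pre-execution hook: a durable harness improvement, not a one-off retry.
# The patterns and hook interface are illustrative assumptions.
import re

BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/",            # recursive delete from the filesystem root
    r"\bgit\s+push\s+--force\b",  # history rewrite on a shared branch
    r"\bDROP\s+TABLE\b",          # destructive SQL
]

def pre_shell_hook(command: str) -> None:
    """Raise before the harness executes a command the team has ratcheted out."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            raise PermissionError(f"blocked by harness policy: {pattern!r}")

pre_shell_hook("ls -la")                          # allowed
# pre_shell_hook("rm -rf / --no-preserve-root")   # raises PermissionError
```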
For this wiki and hermes-agent, harness engineering is directly relevant: skills, AGENTS.md, persistent memory, tool discipline, cron jobs, and linting form the harness that lets the agent maintain a self-improving-knowledge-base. The llm-wiki-pattern itself is an example of turning repeated knowledge-management failures into durable scaffolding.
Several sources converge on important design patterns, collected in the paragraphs below.
Mark Erikson gives a concrete software-engineering version of the same thesis: LLM non-determinism becomes useful only when surrounded by deterministic scaffolding such as tests, typechecking, linting, CI, static analysis, explicit plans, prompt/context files, and human review. In this framing, ai-assisted-software-development works best when the harness reduces what the model has to invent and turns repeated knowledge into scripts, tools, and guardrails (source: mark-erikson-ai-thoughts-part-1-2026.md).
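A hedged sketch of that scaffolding, assuming a Python stack: after each agent edit, run the deterministic checks and feed any failures back into the loop rather than accepting the diff. The specific tools (ruff, mypy, pytest) are examples, not prescribed by Erikson; any linter, typechecker, and test runner play the same role.

```python
# Deterministic checks as the feedback half of the agent loop.
import subprocess

CHECKS = [
    ["ruff", "check", "."],  # lint
    ["mypy", "."],           # typecheck
    ["pytest", "-q"],        # tests
]

def run_checks() -> list[str]:
    """Run each check; collect failure output to feed back to the agent."""
    failures = []
    for cmd in CHECKS:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{proc.stdout}{proc.stderr}")
    return failures

# In the agent loop: accept the edit only when the scaffolding is green.
# failures = run_checks()
# if failures:
#     messages.append({"role": "tool", "content": "\n\n".join(failures)})
```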
Peter Yang's personal-agents framing adds a UX constraint: the harness must be powerful enough to do real work yet invisible enough that users never need to understand APIs, MCP servers, CLIs, worktrees, or tool plumbing (source: peter-yang-chat-era-ending-2026.md).
Garry Tan adds a compounding-system version of harness engineering: the harness should stay thin, while skills, code, and data become fat. In his framing, skillification is the ratchet that turns repeated workflows into reusable skills, and the gbrain data layer gives those skills enough personal context to behave like an operating system rather than a chatbot (source: garry-tan-meta-meta-prompting-ai-agents-2026.md).
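A minimal sketch of the thin-harness/fat-skills split, assuming skills live as plain markdown files loaded on demand; the file layout and function names are hypothetical, not Tan's implementation.

```python
# Thin harness: the only fixed machinery is "find the skill, load it".
# Fat skills: workflow knowledge lives in versioned markdown files that
# grow as workflows are skillified. Layout is a hypothetical assumption.
from pathlib import Path

SKILLS_DIR = Path("skills")   # e.g. skills/weekly-report.md, skills/triage.md

def load_skill(name: str) -> str:
    """Inject a skill file into the prompt only when the task needs it."""
    return (SKILLS_DIR / f"{name}.md").read_text()

def skillify(name: str, instructions: str) -> None:
    """The ratchet: a repeated workflow becomes a reusable skill file."""
    SKILLS_DIR.mkdir(exist_ok=True)
    (SKILLS_DIR / f"{name}.md").write_text(instructions)

skillify("weekly-report", "# Weekly report\n1. Pull metrics...\n2. Summarize...")
prompt = load_skill("weekly-report") + "\n\nTask: draft this week's report."
```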
Osmani's cognitive-surrender article reframes verification as a cognitive-safety requirement, not just a QA ritual. Evidence-based exits, anti-rationalization tables, smaller PRs, conceptual inquiry before generation, and deliberate friction all preserve the human engineer's independent model while still using agents for speed (source: addy-osmani-cognitive-surrender-2026.md).
Output format is part of the harness. Thariq argues that html-artifacts can be a stronger coordination surface than markdown for large agent outputs because they support diagrams, layout, interactivity, annotated diffs, and sharing; this can keep humans in the loop during planning, review, design, and verification (source: thariq-unreasonable-effectiveness-html-2026.md).
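As an illustration (not Thariq's actual tooling), here is a short sketch of an agent plan rendered as a self-contained HTML artifact with collapsible sections that a reviewer can open, skim, and share:

```python
# Sketch: render an agent's plan as a standalone HTML artifact. Unlike flat
# markdown, <details> gives reviewers collapsible sections, and the file can
# be shared and opened as-is. Structure is illustrative.
from html import escape

def plan_to_html(title: str, steps: list[tuple[str, str]]) -> str:
    sections = "\n".join(
        f"<details open><summary>{escape(head)}</summary>"
        f"<p>{escape(body)}</p></details>"
        for head, body in steps
    )
    return f"<!doctype html><html><body><h1>{escape(title)}</h1>{sections}</body></html>"

html = plan_to_html("Migration plan", [
    ("1. Inventory call sites", "grep for the old API across services..."),
    ("2. Add shims", "introduce adapters so both APIs coexist..."),
])
open("plan.html", "w").write(html)  # reviewers open this in a browser
```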
shopify-river adds an organizational harness pattern: force agent use into public, searchable channels so conversations become training material, reusable context, and social review. In that design, Slack visibility and channel-specific instructions are not incidental UI choices; they are harness components that make the agent and the organization learn together (source: tobi-lutke-learning-shop-floor-river-2026.md).
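A minimal sketch of channel-specific instructions as a harness component; the channel names and lookup are hypothetical, not River's actual design.

```python
# Sketch: per-channel instructions resolved before every agent call, so the
# public channel shapes agent behavior. Hypothetical, not Shopify River's code.
CHANNEL_INSTRUCTIONS = {
    "#support-triage": "Classify severity, link runbooks, never promise refunds.",
    "#data-questions": "Answer with SQL against the warehouse; cite table names.",
}
DEFAULT_INSTRUCTIONS = "Be concise; link sources; post answers in-channel."

def system_prompt_for(channel: str) -> str:
    # Visibility is the point: the reply lands in the same searchable channel,
    # so the conversation becomes reusable context for everyone.
    return CHANNEL_INSTRUCTIONS.get(channel, DEFAULT_INSTRUCTIONS)

print(system_prompt_for("#support-triage"))
```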
Related pages: addy-osmani, hermes-agent, self-improving-knowledge-base, llm-wiki-pattern, personal-agents, ai-assisted-software-development, cognitive-surrender, html-artifacts, shopify-river, public-agent-collaboration.