Brydon DeWitt 6b07e4ccb2 feat: add shared agent infrastructure (.agents/)

- AGENTS.md: design principles, enforcement hierarchy, deferred loading
- agents/: brainstorm, build, orchestrator, research (auto-discovered by MCP server)
- skills/: research methodology (auto-discovered by MCP server)
- hooks/: pre-tool-use, post-tool-use (BFF block removed), session-start,
  stop, pre-compact, user-prompt-submit
- frameworks/: opencode/plugin.ts (resolves hooks via import.meta.url — works
  as project-local or global plugin), github/hooks.json
- mcp/index.ts: auto-discovers agents/*.md and skills/*.md from frontmatter
  (replaces hand-maintained registry); server renamed all-agents
- docs/: agent-infrastructure.md (generalized), research docs (7 files),
  ai_architectures.md, llama-server-cuda-wsl2.md
- install.sh: idempotent setup — Copilot global hooks, OpenCode global plugin +
  AGENTS.md + MCP entry, VS Code global MCP config

2026-05-22 13:13:43 -04:00


Agent Infrastructure: Design Principles

You are editing agent infrastructure files (hooks, instructions, skills, agents). Before making changes, understand the principles that govern how this system works.

Single Source of Truth

.agents/ is the canonical directory for all agent infrastructure. An MCP server (.agents/mcp/index.ts) exposes agents as prompts and skills as tools to both Copilot and OpenCode — this replaces file-based fan-out to .github/agents/, .opencode/agents/, etc.

MCP server (`all-agents`)

Available once the server is running (configured in .vscode/mcp.json and opencode.json):

Prompts (slash commands): /research, /brainstorm, /build, /orchestrator
Tools (model-controlled): load_research_methodology

Bodies are read from disk at call time — editing .agents/agents/*.md or .agents/skills/research.md takes effect immediately.
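The discovery-from-frontmatter and read-at-call-time behavior can be sketched as follows. This is a minimal illustration, not the contents of .agents/mcp/index.ts: `splitFrontmatter`, `loadPrompt`, and the injected `read` function are hypothetical names, and the parser assumes flat `key: value` frontmatter only.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real server code.
interface ParsedDoc {
  meta: Record<string, string>; // flat frontmatter fields, e.g. description
  body: string;                 // markdown after the closing ---
}

// Split a markdown file into frontmatter fields and body.
// Files without a leading --- block are returned unchanged as body.
function splitFrontmatter(markdown: string): ParsedDoc {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown);
  if (match === null) {
    return { meta: {}, body: markdown };
  }
  const meta: Record<string, string> = {};
  const header = match[1] ?? "";
  for (const line of header.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not key: value
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: markdown.slice(match[0].length) };
}

type ReadText = (path: string) => string;

// No caching: the file is read again on every call, so edits on disk
// show up in the next prompt or tool request.
function loadPrompt(path: string, read: ReadText): ParsedDoc {
  return splitFrontmatter(read(path));
}
```

Passing `read` in as a parameter keeps the sketch free of filesystem imports; a real server would presumably pass a synchronous file reader here.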

Not handled by MCP (stays bespoke):

.agents/hooks/ — MCP has no lifecycle intercept primitive
This file — model needs to read it before tools/list is available

The Enforcement Hierarchy

Not all guidance is equally effective. From most to least reliable:

PreToolUse hard block       ← Structural. Always fires. Agent cannot bypass.
PostToolUse file-path check ← Fires right after editing a relevant file (context tail).
Nested AGENTS.md at path    ← Always-on for that folder scope. Portable across tools.
Stop / SessionStart inject  ← Fires at session boundaries. Good for broad reminders.
Root AGENTS.md sections     ← Context-start only. Subject to "lost in the middle."

Root cause of degradation (Liu et al. 2023, "Lost in the Middle"): LLMs use information at the beginning and end of the context most reliably and information in the middle least reliably. Guidance written into AGENTS.md is injected once at session start and degrades as context grows. Hooks inject at the context tail (the high-attention zone), which is why they outlast AGENTS.md under context pressure.

Decision rule when adding new guidance:

Is the anti-pattern a terminal command? → PreToolUse hard block (Policies 1–6 in pre-tool-use.sh).
Is the anti-pattern editing a specific file type or path? → PreToolUse block on FILE_PATH (Policy 7+).
Should the reminder fire during active work in a domain? → PostToolUse file-path check (see post-tool-use.sh BFF reminder pattern).
Is it guidance scoped to specific files an agent might edit? → nested AGENTS.md at the target path.
Should it fire in response to what the user just wrote? → UserPromptSubmit injection (context tail, prompt text available — e.g. agent nudges).
Is it a broad session reminder with no tight scope? → SessionStart or Stop injection.
Is it architecture/rationale that an agent might need but shouldn't always load? → AGENTS.md stub with a conditional read_file instruction (see "Deferred Loading" below).
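The decision rule above reads as a first-match chain, which can be sketched as a function. The trait names and the `chooseMechanism` helper are hypothetical and exist only to make the ordering explicit; the returned labels paraphrase the list.

```typescript
// Hypothetical sketch of the decision rule as a first-match chain.
type Mechanism =
  | "PreToolUse hard block on COMMAND"
  | "PreToolUse block on FILE_PATH"
  | "PostToolUse file-path check"
  | "Nested AGENTS.md at the target path"
  | "UserPromptSubmit injection"
  | "SessionStart or Stop injection"
  | "AGENTS.md stub with conditional read_file";

interface GuidanceTraits {
  terminalCommand?: boolean;   // anti-pattern is a terminal command
  editsSpecificPath?: boolean; // anti-pattern is editing a file type or path
  duringDomainWork?: boolean;  // reminder should fire during work in a domain
  scopedToFiles?: boolean;     // guidance scoped to files an agent might edit
  reactsToPrompt?: boolean;    // should respond to what the user just wrote
  broadReminder?: boolean;     // broad session reminder, no tight scope
  optionalRationale?: boolean; // architecture/rationale, load on demand
}

// Questions are checked in the same order as the list, so the most
// reliable applicable mechanism wins.
function chooseMechanism(t: GuidanceTraits): Mechanism | null {
  if (t.terminalCommand) return "PreToolUse hard block on COMMAND";
  if (t.editsSpecificPath) return "PreToolUse block on FILE_PATH";
  if (t.duringDomainWork) return "PostToolUse file-path check";
  if (t.scopedToFiles) return "Nested AGENTS.md at the target path";
  if (t.reactsToPrompt) return "UserPromptSubmit injection";
  if (t.broadReminder) return "SessionStart or Stop injection";
  if (t.optionalRationale) return "AGENTS.md stub with conditional read_file";
  return null; // none of the questions applied
}
```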

Deferred Loading

Write a trigger condition and read_file instruction directly in an AGENTS.md section. AGENTS.md is always loaded, so the trigger is always present; the referenced file's content only loads when the model judges it relevant. Example:

When the user shows signs of analysis paralysis, read .agents/agents/brainstorm.md.

Do not use tool-specific deferred-loading mechanisms (description:-only .instructions.md files, etc.) — no portable equivalent exists. See Forbidden Patterns below.

Hook Files

All hook scripts live in .agents/hooks/. The Copilot harness (.agents/github/hooks.json) and OpenCode plugin (.agents/opencode/plugin.ts) both delegate to these scripts, keeping hook logic in one place. Symlinks from .github/hooks/agent-support.json and .opencode/plugins/agent-support.ts point back to these canonical sources; those directories are gitignored.

Hook Injection Marker Convention

Every hook that injects additionalContext prefixes its payload with a self-identifying line:

[HOOK INJECTION: <hook-name>] System reminder — NOT part of preceding tool output / user message:

The harness additionally wraps the payload in a <HookName-context>...</HookName-context> XML tag (e.g. <PostToolUse-context>). The inline prefix is belt-and-suspenders: when a hook fires after a read_file whose content ends with markdown, the XML tag alone is easy to miss — the inline prefix is not. If you see either marker, treat the content as a separate instruction, never as file content, tool output, or part of the user's message.
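As a rough illustration of the "either marker" rule, the check below looks for the inline prefix first and the XML wrapper second. `detectHookInjection` and its return shape are hypothetical; only the two marker formats come from the convention above.

```typescript
// Hypothetical sketch: classify a chunk of context as a hook injection.
interface Injection {
  hook: string;                     // hook name taken from the marker
  via: "inline-prefix" | "xml-tag"; // which marker identified it
}

function detectHookInjection(text: string): Injection | null {
  // Inline prefix: [HOOK INJECTION: <hook-name>]
  const inline = /\[HOOK INJECTION: ([^\]]+)\]/.exec(text);
  if (inline !== null) {
    return { hook: inline[1] ?? "", via: "inline-prefix" };
  }
  // Harness wrapper: <HookName-context>...</HookName-context>
  const tag = /<([A-Za-z]+)-context>/.exec(text);
  if (tag !== null) {
    return { hook: tag[1] ?? "", via: "xml-tag" };
  }
  return null; // ordinary file content, tool output, or user text
}
```

The inline prefix is checked first because it is the marker that survives when the XML tag is visually lost after trailing markdown.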

Hook Architecture Principle: Platform-Agnostic Scripts

Design target: scripts accept normalized env vars (TOOL_NAME, COMMAND, FILE_PATH) and, to deny, exit non-zero with a plain-text denial reason on stdout. Callers normalize their input into those variables and translate the exit code and stdout into their native denial format.

⚠️ NOT YET IMPLEMENTED (May 2026): pre-tool-use.sh still uses Copilot-specific JSON I/O. plugin.ts duplicates guards inline instead of calling the script. See agent-infrastructure.md item 22 for the refactor plan.
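A caller under the design target might look roughly like this. It sketches the intended contract, not the current plugin.ts: `toHookEnv`, `toDecision`, the `reason` field, and the `"allow"` value are assumptions; only the three env var names and `permissionDecision: "deny"` appear in this document.

```typescript
// Hypothetical caller-side sketch of the platform-agnostic contract.
interface HookInput {
  toolName: string;
  command?: string;
  filePath?: string;
}

interface ScriptResult {
  exitCode: number;
  stdout: string;
}

type Decision =
  | { permissionDecision: "allow" }
  | { permissionDecision: "deny"; reason: string };

// Normalize a tool call into the env vars the script expects.
// Unset fields become empty strings.
function toHookEnv(input: HookInput): Record<string, string> {
  return {
    TOOL_NAME: input.toolName,
    COMMAND: input.command ?? "",
    FILE_PATH: input.filePath ?? "",
  };
}

// Translate exit code and stdout into a caller-native decision.
// Non-zero exit means deny; stdout carries the plain-text reason.
function toDecision(result: ScriptResult): Decision {
  if (result.exitCode === 0) {
    return { permissionDecision: "allow" };
  }
  const reason = result.stdout.trim() || "Denied by hook (no reason given)";
  return { permissionDecision: "deny", reason };
}
```

With this split, the script never needs to know which harness called it, and each harness only owns the two small translation functions.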

`user-prompt-submit.sh` — Per-turn tail injection

Fires on every user message. Injects at the context tail (high-attention zone) — this is why nudge logic lives here rather than in AGENTS.md.
Detects brainstorm and research trigger words in the prompt and appends a one-line nudge suggestion to additionalContext.
Writes the raw prompt text to /tmp/.last-user-prompt.txt and injects the task-capture instruction.
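The trigger-word detection can be sketched as a small lookup. The trigger words and nudge wording below are invented for illustration; the real lists live in user-prompt-submit.sh. Only the /brainstorm and /research commands come from this document.

```typescript
// Hypothetical sketch: trigger words and nudge text are assumptions.
interface NudgeRule {
  agent: string;    // slash command to suggest
  triggers: RegExp; // no g flag, so test() stays stateless
}

const NUDGE_RULES: ReadonlyArray<NudgeRule> = [
  { agent: "brainstorm", triggers: /\b(brainstorm|stuck|options)\b/i },
  { agent: "research", triggers: /\b(research|investigate|compare)\b/i },
];

// Return one one-line nudge per matching rule, in rule order.
function nudgesFor(prompt: string): string[] {
  return NUDGE_RULES
    .filter((rule) => rule.triggers.test(prompt))
    .map((rule) => `Consider /${rule.agent} for this request.`);
}
```

Each returned line would be appended to additionalContext, so it lands at the context tail for that turn.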

`pre-tool-use.sh` — Hard stops

Intercepts: run_in_terminal, execution_subagent, send_to_terminal (for $COMMAND) and replace_string_in_file, multi_replace_string_in_file, create_file (for $FILE_PATH).
Outputs permissionDecision: "deny" to block the tool call.
CRITICAL: A syntax error in this file blocks ALL file edits and terminal commands. Always validate after editing: bash -n .agents/hooks/pre-tool-use.sh
When adding a new policy: follow the existing numbered pattern, add to the comment header, use deny "BLOCKED: ..." with a clear fix instruction.
Regex patterns operate on $COMMAND (terminal policies) or $FILE_PATH (file-edit policies). Each variable is an empty string unless a matching tool from the intercept list fired.
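The numbered-policy pattern can be modeled as a table of regexes, each bound to one field. This is a sketch in TypeScript rather than the bash of pre-tool-use.sh; the policy numbers, the force-push policy, and all names are hypothetical. The generated-directory policy mirrors the rule under Common Mistakes that edits to generated files are blocked.

```typescript
// Hypothetical sketch of the numbered-policy pattern.
interface HookEnv {
  COMMAND: string;   // empty unless a terminal tool fired
  FILE_PATH: string; // empty unless a file-edit tool fired
}

interface Policy {
  id: number;
  field: keyof HookEnv;
  pattern: RegExp;
  message: string; // starts with "BLOCKED:" and states the fix
}

const POLICIES: ReadonlyArray<Policy> = [
  {
    id: 1, // invented terminal policy, for illustration only
    field: "COMMAND",
    pattern: /\bgit\s+push\b.*--force\b/,
    message: "BLOCKED: force push. Push without --force.",
  },
  {
    id: 7, // file-edit policy; the numbering is an assumption
    field: "FILE_PATH",
    pattern: /(^|\/)\.(github|opencode)\/(agents|skills)\//,
    message: "BLOCKED: generated file. Edit the source under .agents/ instead.",
  },
];

// Return the first matching denial message, or null to allow.
// Empty fields never match, so terminal policies cannot fire on file edits.
function evaluate(env: HookEnv, policies: ReadonlyArray<Policy> = POLICIES): string | null {
  for (const policy of policies) {
    const value = env[policy.field];
    if (value !== "" && policy.pattern.test(value)) {
      return policy.message;
    }
  }
  return null;
}
```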

`post-tool-use.sh` — Timed reminders

Fires after every tool use, with the tool name and response on stdin.
Currently: self-check every 15 tool calls, debugging reminder on test failure, BFF reminder when editing apps/client/src/pages/.
Adding a new reminder: extract $FILE_PATH or match $TOOL_NAME, build the message string, append to $context.
Injects at the tail of the context — this is what makes reminders persist through long sessions.
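The scoping rule for reminders (check the file path or a counter before appending) can be sketched like this. The event shape and reminder wording are hypothetical; the 15-call interval and the apps/client/src/pages/ path come from the list above.

```typescript
// Hypothetical sketch: event shape and message text are assumptions.
interface ToolEvent {
  filePath: string;    // empty when the tool did not touch a file
  callCount: number;   // tool calls so far in this session
  testFailed: boolean; // whether the tool response shows a test failure
}

function buildContext(event: ToolEvent): string {
  const parts: string[] = [];
  // Periodic self-check: every 15th tool call.
  if (event.callCount > 0 && event.callCount % 15 === 0) {
    parts.push("Self-check: confirm the current step still serves the task.");
  }
  // Debugging reminder only when a test actually failed.
  if (event.testFailed) {
    parts.push("Debugging reminder: read the failure output before editing.");
  }
  // Path-scoped reminder: fires only for edits under the pages directory.
  if (event.filePath.startsWith("apps/client/src/pages/")) {
    parts.push("BFF reminder: check the BFF guidance for page code.");
  }
  // Empty string means nothing is injected for this tool call.
  return parts.join("\n");
}
```

Every branch is guarded by a condition, which is what keeps the hook from adding noise on unrelated tool calls.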

`session-start.sh` — Broad session injection

Fires once per session. Good for: current branch, active investigations, dead ends.
Not good for: precise rule reminders (use PostToolUse or nested AGENTS.md).

`stop.sh` — End-of-session reflection

Fires when the agent stops. Lessons-learned capture + effort reflection.
Not a blocking hook — injects additionalContext only.

`pre-compact.sh` — Pre-summarization state export

Fires before context is summarized. Saves investigation state to .session/pre-compact-state.md.
Note: PostCompact does NOT exist. Only PreCompact.
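A state export of this kind might render a small markdown file. The fields below (branch, active investigations, dead ends) are borrowed from the session-start list and are an assumption about what pre-compact.sh saves; `renderPreCompactState` is a hypothetical helper that only builds the text and leaves the write to .session/pre-compact-state.md to the caller.

```typescript
// Hypothetical sketch: the fields and layout are assumptions.
interface InvestigationState {
  branch: string;
  activeInvestigations: string[];
  deadEnds: string[];
}

function renderPreCompactState(state: InvestigationState): string {
  // Render one titled bullet list; empty lists get an explicit placeholder
  // so the summary shows the section was checked, not skipped.
  const section = (title: string, items: string[]): string => {
    const bullets = items.length > 0 ? items.map((item) => `- ${item}`) : ["- (none)"];
    return [`## ${title}`, ...bullets].join("\n");
  };
  return [
    "# Pre-compact state",
    `Branch: ${state.branch}`,
    section("Active investigations", state.activeInvestigations),
    section("Dead ends", state.deadEnds),
  ].join("\n\n") + "\n";
}
```

Because no PostCompact hook exists, anything not written out at this point has no later opportunity to be saved.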

Forbidden Patterns

These approaches exist in agentic tooling but are banned in this codebase because portable alternatives exist. Document the reason for each ban so future agents understand it rather than re-introducing the pattern.

❌ `applyTo:` frontmatter in `.instructions.md` files

Supported only in VS Code Copilot. Other tools either ignore it or load the file as always-on context. Portable alternative: nested AGENTS.md at the target path. Nested AGENTS.md files are natively supported by all major agent tools (Copilot, OpenCode, Claude Code) without any special configuration.

❌ `description:`-only `.instructions.md` files (new additions)

VS Code Copilot builds a stub <instructions> block for these and tells the model to load content on demand. Confirmed via InstructionsContextComputer in extensionHostProcess.js. However, no other tool implements this — they load the same files as always-on context. Portable alternative: AGENTS.md stub with a read_file instruction (see "Deferred Loading" above).

❌ Any `.github/instructions/*.instructions.md` for new rules

.instructions.md is a VS Code Copilot-specific format. All new rules go into nested AGENTS.md files (path-scoped rules) or directly into root AGENTS.md (broad guidance). Do not add new .instructions.md files.

Skills (`.agents/skills/`)

Skills contain distilled methodologies that any agent can load on demand via read_file. An agent MUST read_file the skill file (for example .agents/skills/research.md) before using the skill.
For methodologies (how to research, brainstorm) — not project rules. Project rules belong in nested AGENTS.md files or hooks.

Agents (`.agents/agents/`)

Agent files define persona, workflow phases, tools, and circuit breakers.
Runtime config (model, mode, permission) lives in opencode.json agent entries. Body .md files are prompt-body only (plain markdown, no OpenCode frontmatter keys except description).
Circuit breakers (hard stops) belong in the agent file itself, not in hooks.

Tool-Specific Entry Points

Some things cannot be unified and live in tool-specific locations:

.agents/opencode/plugin.ts — OpenCode plugin harness (canonical). Bridges hook scripts to OpenCode's plugin API. Symlinked from .opencode/plugins/agent-support.ts.
.agents/github/hooks.json — Copilot harness config (canonical). Points to .agents/hooks/*.sh. Symlinked from .github/hooks/agent-support.json.

Common Mistakes

❌ Writing long explanations in AGENTS.md for rules that could be a PreToolUse block or nested AGENTS.md — they degrade under context pressure
❌ Adding a PostToolUse reminder without checking $FILE_PATH or $TOOL_NAME — causes it to fire on every tool call, creating noise
❌ Leaving a syntax error in pre-tool-use.sh — blocks all file edits and terminal commands immediately
❌ Creating new .instructions.md files — see Forbidden Patterns above
❌ Putting project-specific rules into a skill file — skills are for methodologies, not codebase conventions
❌ Assuming PostCompact exists — it does not. Use PreCompact.
❌ Editing generated files in .github/agents/, .github/skills/, .opencode/agents/, .opencode/skills/ — edit .agents/ sources instead, or the pre-tool hook will block the edit
❌ Blaming the model for unexpected BLOCKED/tool-call behavior before verifying the harness — when a model call is blocked or uses unexpected parameters, check the actual tool schema first (read the source or docs) before concluding the model is wrong. The harness was recently changed; the model may be correct. Applies to: OpenCode tool names (read/edit/task), parameter names (offset/limit not startLine/endLine), and plugin guard logic.
❌ Adding "reflect / double-check / are you sure / take another look" instructions as a mitigation for any failure mode: these feel productive in transcripts, but Huang et al. (arXiv:2310.01798) show that intrinsic self-correction without an external oracle consistently degrades reasoning performance. Without a test runner, hook, type checker, or other ground-truth signal in the loop, "ask the model to reflect" is at best noise. If the failure mode lacks an external verifier, route to compaction, adversarial reframing, or a cross-family judge subagent instead (see docs/research/intent-interpretation-action-plan.md §4.1).
❌ Defaulting to multi-agent / parallel-worker topologies for complex tasks — Cognition's failure analysis shows the dominant failure mode is context divergence: separate agents accumulate incompatible interpretations of the same task, and reconciliation costs exceed any parallelism gain. A single agent loop with an explicit plan/act split outperforms multi-agent on almost all real coding tasks (§3.1, docs/research/ai-coding-best-practices.md). Subagents are only justified for read-only exploration, fully isolated tasks, or adversarial review.
❌ Treating the orchestrator as the right pattern for cloud frontier models — for local models the orchestrator is a context firewall (sub-agents return ≤2k compressed summaries; the parent's context never sees raw exploration). Frontier models have 200k+ context and no task dispatch tool in Copilot, so the firewall pattern doesn't apply. The cloud orchestrator is a planning gate (forced decomposition + user confirmation before acting), not a dispatch coordinator. The  /  blocks in orchestrator.md encode this distinction. See §3.4 of docs/research/ai-coding-best-practices.md.
