# Agent Infrastructure

Shared agent infrastructure for VS Code Copilot and OpenCode — brainstorm agent, research agent, nudge instructions, hooks, skills, and MCP server. Project-specific overlays live in each project's `.agents/` directory.

> **See also:**
> [`docs/research/ai-coding-best-practices.md`](../research/ai-coding-best-practices.md)
> — research synthesis covering the Prompt/Context/Harness taxonomy, failure
> modes, enforcement hierarchy, small-model harness patterns, and all
> primary-source citations that underpin the design decisions here.

## Current State

### Architecture Overview

The infrastructure is **tool-agnostic**: canonical sources live in `.agents/` and a generator (`npm run generate:agents`) distributes them to `.github/agents/`, `.github/skills/`, `.opencode/agents/`, `.opencode/skills/`. Edit the `.agents/` sources; never edit the generated output directories (they are `.gitignore`d and blocked by pre-tool-use policy).

```
.agents/
├── AGENTS.md                  # Root design doc + enforcement hierarchy
├── agents/                    # Agent definitions (canonical)
│   ├── brainstorm.md
│   ├── research.md
│   └── build-local.md         # OmniCoder 9B via Ollama
├── hooks/                     # Shared bash hooks (delegated by all harnesses)
│   ├── pre-tool-use.sh        # Hard blocks (terminal cmds + file-path policies)
│   ├── post-tool-use.sh       # Self-check counter + methodology reminders
│   ├── session-start.sh       # Inject project state at session start
│   ├── user-prompt-submit.sh  # Per-turn nudge detection + task capture
│   ├── pre-compact.sh         # Export state before context summarization
│   └── stop.sh                # Session-end verification
└── skills/
    └── research/SKILL.md      # Research methodology (any agent can load)
```

Generated output (do not edit — regenerated by `npm run generate:agents`):

- `.github/agents/` — VS Code Copilot agent files
- `.github/skills/` — VS Code Copilot skill files
- `.opencode/agents/` — OpenCode agent files
- `.opencode/skills/` — OpenCode skill files

Harness integration:

- **VS Code Copilot**:
  `.github/agent-support.json` — maps 4 hook events to the shared bash scripts in `.agents/hooks/`
- **OpenCode**: `.opencode/plugins/agent-support.ts` — TypeScript plugin that shells out to the same bash scripts

### Brainstorm Agent

- 4-phase workflow: Quick Frame → Diverge → Converge → Capture & Hand Off
- 6 techniques: Rapid Ideation, SCAMPER, Worst Possible Idea, How Might We, Inversion/Pre-mortem, Constraint Flipping
- Counterbalances the overthinking tendency of Opus 4.6
- Phase 2 includes a "push past the obvious" nudge (Zhao et al. 2024: LLMs fall short on originality but excel at elaboration, so first ideas are "average")
- Phase 4 routes to `@research` for investigation and to the default agent for implementation
- Creates exploration files at `docs/explorations/.md` and session memory notes

### Research Agent

- Two orientations that compose recursively:
  - **Understand** (Grounded Theory): open coding → constant comparison → axial coding → memo → saturation check
  - **Diagnose** (Strong Inference + Satisficing): 5-factor triage gates between satisficing (low risk) and full falsification (high risk)
- 5-factor triage: reversibility, blast radius, confidence, novelty, time cost
- Timing awareness: prefix unfamiliar commands with `time`, keep baselines in session/repo memory, and feed timings into triage decisions
- Investigation files at `docs/explorations/.md`
- Techniques reference: Five Whys, Delta Debugging, Rubber Duck
- Delegates evidence-gathering to the Explore subagent and keeps analytical thinking local

### Nudge Instructions

- Brainstorm nudge: triggers on hesitation/overthinking language ('wait', 'actually', 'hmm', 'overcomplicating', etc.)
- Research nudge: triggers on debugging/investigation language ('why is this broken', 'how does this work', 'root cause', etc.)
- Both are non-intrusive single-sentence suggestions and fire only once per topic

### Tool Mapping (Copilot ↔ OpenCode)

| Copilot | OpenCode equivalent |
| --- | --- |
| `AGENTS.md` (root + nested) | `AGENTS.md` (root, native; nested via `instructions` glob in `opencode.json`) |
| `.github/agents/*.agent.md` | `.opencode/agents/*.md` (frontmatter: `description`, `mode`, `model`, `temperature`, `permission`) |
| `.github/skills//SKILL.md` | `.opencode/skills//SKILL.md` — also reads `.agents/skills/` and `.claude/skills/` |
| `.github/instructions/*.instructions.md` (`applyTo`) | No direct equivalent — fold into AGENTS.md stubs or `instructions` glob |
| `.github/hooks/*.sh` (JSON-configured shell) | `.opencode/plugins/*.ts` (TS modules, event-driven) — shells out via Bun's `$` |
| `runSubagent` / `Explore` agent | Built-in `general` and `explore` subagents; `@`-mention syntax |
| `vscode_askQuestions` | No equivalent — OpenCode uses the agent's natural turn-taking |

OpenCode plugin event mapping:

| Copilot hook | OpenCode event |
| --- | --- |
| `SessionStart` | `session.created` |
| `PreToolUse` | `tool.execute.before` |
| `PostToolUse` | `tool.execute.after` |
| `PreCompact` | `experimental.session.compacting` |
| `Stop` | `session.idle` (closest equivalent) |

## Research Foundation

> For full research depth, citations, and failure-mode analysis, see
> [`docs/research/ai-coding-best-practices.md`](../research/ai-coding-best-practices.md).
> The list below records the specific papers and frameworks that shaped the
> design decisions in this project.

Methodologies and papers that informed the design:

- **Grounded Theory** (Glaser & Strauss): build understanding from data, not assumptions. Applied to code-reading in the Understand orientation.
- **Strong Inference** (Platt 1964): multiple competing hypotheses → crucial experiments → eliminate. Applied to the Diagnose orientation.
- **Satisficing** (Simon 1956): accept "good enough" when the cost of optimizing exceeds the benefit. Gates between cheap confirmation and expensive falsification.
- **Dual Process Theory** (Kahneman): System 1 (fast, pattern-matching) vs System 2 (slow, analytical). System 1 is more accurate in familiar domains. Informs the triage decision.
- **Zhao et al. 2024** (arXiv): LLMs fall short on originality but excel at elaboration. First ideas are "average." Informs the brainstorm agent's "push past the obvious" nudge.
- **"Lost in the Middle"** (Liu et al. 2023): LLMs attend best to the beginning and end of their context. Informs hook design: inject at the context tail for high attention.
- **Delta Debugging**: binary search over the change space between passing and failing cases. The logic behind `git bisect`.
- **Five Whys**: iterative causal-chain tracing. A starting point for hypothesis generation, not the sole diagnostic method.
- **Ronacher, "Agent Design Is Still Hard"**: reinforce methodology at the context tail after every tool call. Structural injection outperforms relying on instructions in the system prompt.
- **Think-Anywhere** (Jiang et al. arXiv:2603.29957, Mar 2026, Peking U + Tongyi Lab): LLMs trained to invoke `` blocks at any token position during code generation, not just upfront. SOTA on LeetCode/LiveCodeBench with fewer total tokens. The motivating insight: a model can plan correctly at the start but introduce an off-by-one bug mid-implementation, and only mid-loop reasoning catches it. **Applied here**: the research agent's investigation checklist includes "Re-evaluate hypothesis at every tool-call boundary." For Claude 4 models, interleaved thinking makes this automatic. Complements Plan-and-Solve: upfront decomposition where structure is clear, mid-execution re-evaluation when intermediate results change what to do next.
- **Anthropic interleaved thinking** (Claude 4 + adaptive thinking): Claude Sonnet 4.6+ and Opus 4.6+ automatically insert thinking blocks between tool calls. No separate implementation is needed; agent instruction design drives it. The research agent's "Re-evaluate at every tool-call boundary" instruction explicitly activates this behavior.
- **Prompt/Context/Harness framework** (Alibaba Cloud, Apr 2026): names the three engineering layers. Prompt = task expression (stateless). Context = what the model sees (AGENTS.md, skills, tools; the engineering target is progressive disclosure). Harness = system constraints + verification loops (hooks, permission gates, sub-agent isolation). Diagnostic map: wrong output → Prompt; hallucinated fact → Context; wrong tool selected → Context (fix the tool description); task drift → Harness (sub-agent boundary); destructive action → Harness (permission hook). LangChain raised its Terminal Bench 2.0 score from 52.8% to 66.5% by changing the Harness alone.
- **Context engineering** (Rajasekaran et al., Anthropic, Sep 2025): formally distinguishes context engineering from prompt engineering. Key principles: (a) just-in-time context: agents hold references and load on demand, not upfront; (b) structured note-taking (NOTES.md) as external working memory for long sequential tasks; (c) every new token depletes the attention budget, which validates the <60-line AGENTS.md ceiling; (d) compaction strategy: maximize recall first, then improve precision.

## MCP Server Lifecycle Hooks — Protocol Status (May 2026)

The `.agents/mcp/` server exposes prompts and tools to agents via the MCP protocol. A recurring question: can the MCP server react to session lifecycle events (session start/end, tool-use boundaries)?

### Current protocol state

**No lifecycle hooks exist in the MCP protocol.** The spec defines only three phases: `initialize → operation → shutdown`. There is no `session.created`, `post-tool-call`, or `session.ended` notification.
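
To make the three phases concrete, here is a TypeScript sketch of the client side of the handshake as described in the MCP lifecycle spec. The client sends an `initialize` request, the server replies with its capabilities, and the client confirms with a `notifications/initialized` notification. Shutdown defines no message; the transport is simply closed. The client name and the protocol revision string are illustrative assumptions. The connection itself is the only "session" the protocol knows about.

```typescript
// Minimal JSON-RPC 2.0 shapes, just enough for the MCP handshake.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string; // notifications carry no `id`
  params?: Record<string, unknown>;
}

// Client-to-server messages of the initialize phase. The server's
// InitializeResult reply arrives between these two messages.
function initializeMessages(
  clientName: string,
  clientVersion: string,
): [JsonRpcRequest, JsonRpcNotification] {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-06-18", // assumption: use the revision your SDK targets
      capabilities: {},
      clientInfo: { name: clientName, version: clientVersion },
    },
  };
  const initialized: JsonRpcNotification = {
    jsonrpc: "2.0",
    method: "notifications/initialized",
  };
  return [request, initialized];
}

// Shutdown has no JSON-RPC message: the client closes the transport
// (e.g. stdin of a stdio server), so there is nothing a server could
// treat as a "session ended" event.
const SHUTDOWN_MESSAGES: readonly string[] = [];

// Usage with a hypothetical client name.
const [req, note] = initializeMessages("agent-infra", "0.0.0");
console.log(req.method, note.method);
```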

This gap is why session awareness currently lives in the OpenCode plugin layer (`.opencode/plugins/agent-support.ts`) rather than in the MCP server: OpenCode natively exposes `session.created`, `session.idle`, `session.compacted`, `session.deleted`, and `tool.execute.before`/`after` events to plugins.

### Active work in the MCP spec

**SEP-2624: Interceptors for the Model Context Protocol** ([PR #2624](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2624))

The most organized effort. Supersedes SEP-1763 (closed as completed). Proposes **Interceptors** as a new MCP primitive with two types: **validators** (inspect, return pass/fail) and **mutators** (transform context payloads). Both are discoverable and invocable via the `interceptors/list` and `interceptor/invoke` JSON-RPC methods. They fire at protocol-level operation events: `tools/call`, `prompts/get`, `resources/read`, `sampling/createMessage`, `elicitation/create`. These are not session-start/stop hooks but before/after wrapping for every operation.

There is now a formal **Interceptors Working Group** (Bloomberg + Saxo Bank engineers, biweekly cadence). Reference implementations are in progress for the Go and C# SDKs. Experimental repo: [modelcontextprotocol/experimental-ext-interceptors](https://github.com/modelcontextprotocol/experimental-ext-interceptors). Charter: [modelcontextprotocol.io/community/interceptors/charter](https://modelcontextprotocol.io/community/interceptors/charter).

**SEP-2282: Server-Declared Behavioural Hooks** ([PR #2282](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2282))

A smaller, separate open PR. It proposes that servers declare **context injections** in `ServerCapabilities`: text injected into the agent's context at client-side lifecycle events (session start, post-tool-use, session end). The contract is "here's context the model should have at this moment," not code execution. This is more directly analogous to our OpenCode `session.created` / `session.idle` patterns.
It is currently unsponsored and needs a maintainer to pick it up.

### What to watch

- **Primary**: [PR #2624](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2624) + experimental-ext-interceptors repo
- **Secondary**: [PR #2282](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2282) (closest to session-lifecycle hooks)
- **Label filter**: [`SEP` label](https://github.com/modelcontextprotocol/modelcontextprotocol/issues?q=label%3ASEP) on the modelcontextprotocol repo
- **Milestone**: `2026-06-30-RC` is the next spec revision window

### Implication for this project

Until one of these proposals lands in a shipping spec version and in the TypeScript SDK, the session lifecycle pattern stays at the OpenCode plugin layer. When SEP-2282 or an equivalent lands, the MCP server could self-register context injection hooks during `initialize`, removing the need for tool-specific plugin code.

---

## Model Scale Profiles

Different model sizes require different infrastructure strategies: the failure modes differ, so the mitigations differ.

### Large-scale API models (Claude Sonnet / Opus)

**Primary failure modes**: overthinking, sycophancy, verbosity, and a tendency to add unrequested features or comments.

**Infrastructure strategy**:

- Advisory methodology + structural reinforcement (hooks, circuit breakers)
- PostToolUse self-check nudges every ~15 calls
- PreToolUse hard blocks for high-risk operations
- Subagent delegation for isolated tasks (parent Opus → child Sonnet/Haiku)

### Smaller-scale local models (OmniCoder 9B via Ollama)

**Primary failure modes** (different from "low reasoning" — OmniCoder uses Qwen3 thinking blocks natively):

- Narrower training distribution (Python/JS heavy)
- Quantization degradation: JSON schema compliance drops as context fills
- Tool-call history is the primary context consumer, so responses must be truncated aggressively
- Instruction drift: fewer attention heads (32 vs 64 in 32B) means system prompt recall degrades faster

**Infrastructure strategy**:

- PostToolUse response truncation at ~1500 tokens (plugin layer, not bash hook)
- PreToolUse JSON validation with schema-specific error messages
- Context pressure injection at ≥70% fill (~22K/32K tokens)
- `steps: 20` cap + `ask` permission gates for natural checkpoints
- `explore` subagent delegation to reduce context pressure on the main agent
- `NOTES.md` working memory pattern enforced in the agent body
- No `web` tool, which keeps context lean
- Reasoning guidance: "Hold references; load on demand" stated explicitly in the agent body

---

## OmniCoder 2 Orchestration — Pending Work

> Full historical rationale and audit findings were maintained in
> `docs/projects/local-ai-orchestration.md` (deleted May 2026 after merge). The
> plan used an orchestrator-workers pattern with structural `edit: deny`
> enforcement on the orchestrator. All OpenCode config values verified against
> opencode.ai/docs (May 2026).

### Goals

1. All agents run on `ollama/arch-omni2-9b` — no cloud fallback
2. User can type vague prompts; the system decomposes and delegates automatically
3. Context windows are isolated per subagent (no shared state bleed)
4. Changes scale forward: switching to cloud means changing model strings, not architecture

### Pending Changes

#### Quick wins — under 5 minutes each, no testing required

1. - [x] **[CRITICAL] Fix `` typo in `omnicoder2.modelfile`** — markdown-escape artifact; malformed opening tag paired with correct closing tag. Highest-leverage change; everything below depends on reliable tool-call JSON.
2. - [x] **Mark canonical/deprecated modelfiles** — `# CANONICAL` header on `omnicoder2.modelfile`; `# DEPRECATED` on `omnicoder.modelfile`; `omnicoder-v2.modelfile.template` deleted (was dead code — v2 now served from HuggingFace path).
3. - [x] **Add `compaction.reserved: 3000` to `opencode.json`** — the default of 10,000 fires compaction too early given the ~8–12K baseline context.
4. - [x] **Fix `pre-compact.sh` prettier call** — removes the `npx prettier` call, which violated pre-tool-use Policy 1 (the hook was breaking its own policy).
5. - [x] **MCP server error handling** — wrap `server.connect(transport)` in try/catch, logging to stderr and calling `process.exit(1)`.

#### Short session — 15–30 minutes each, bounded scope

6. - [x] **Fix `stop.sh` JSON escaping** — replace `sed`-based escaping with the `printf '%b' | node JSON.stringify` pattern used in every other hook.
7. - [x] **Per-session PostToolUse counter** — repo-scoped path `/tmp/.opencode-tool-count-` (derived from REPO_ROOT via md5sum); prevents cross-repo contamination; `session-start.sh` resets it at session start.
8. - [x] **Shrink compaction prompt to ~120 words** (in `.opencode/plugins/agent-support.ts`) — shorter instructions free bandwidth for the 9B to actually summarize.
9. - [x] **Update `.agents/agents/build-local.md` for v2** — pagination 100 → 50 lines; rule 4 now says "recipient not dispatcher"; rule 7 scope-check says "tell the user, do not self-decompose".

#### Depends on orchestrator being proven first

10. - [x] **Trim root `AGENTS.md` to ~60 lines** — reduced from 435 lines to 45 lines; all architecture rationale, code examples, quick task table, and project context removed; cross-cutting rules and quality gate preserved (May 2026).

11. - [x] **PostToolUse weighted counter** — reads (`read_file`, `grep`, `list`) +0.25; writes/shell +1; keeps the 15-call SELF-CHECK from firing during a read-heavy investigation sweep. Depends on #7 (per-session counter).

      **Implementation** (`.agents/hooks/post-tool-use.sh`): bash has no floating-point arithmetic, so scale to integers: reads +1, writes/shell +4, threshold 60 (equivalent to 15 effective write-units). Read-class tools: `read_file`, `grep_search`, `list_dir`, `file_search`, `semantic_search`, `explore_subagent`. Write/shell-class: all `*_string_in_file`, `create_file`, `run_in_terminal`. Replace the single `COUNT=$((COUNT + 1))` with a `case "$TOOL_NAME"` block that does `COUNT=$((COUNT + 1))` for reads and `COUNT=$((COUNT + 4))` for writes/shell. Change the self-check condition from `(( COUNT % 15 == 0 ))` to `(( COUNT % 60 == 0 ))`. Caveat: with mixed +1/+4 steps the count can step over a multiple of 60 (for example 58 → 62), so a crossing check (fire when `COUNT / 60` exceeds the pre-increment value divided by 60) is more robust than the modulo test.

12. - [x] **PostToolUse reminder priority filter** — emit at most 2 reminders per tool call; priority: SELF-CHECK > DEBUGGING > path-scoped > tool-specific. Depends on #11.

      **Implementation** (`.agents/hooks/post-tool-use.sh`): replace the current single `context` string accumulator with an indexed array `reminders=()`. Each block appends `reminders+=("$msg")` in priority order (SELF-CHECK first, DEBUGGING second, BFF/QUALITY GATE third, RENAME fourth). At output time, join only the first 2 elements with a `\n\n` separator. Blocks that didn't fire don't append, so the cap is natural.

13. - [x] **Broaden PostToolUse truncation to all `ollama/` agents** (`.opencode/plugins/agent-support.ts`); differentiate the limit: orchestrator 2,500 tokens vs workers 1,500. Minor until the orchestrator exists.
      **Implementation**: rename `BUILD_LOCAL_MAX_RESPONSE_TOKENS` → `LOCAL_WORKER_MAX_TOKENS = 1500`; add `LOCAL_ORCHESTRATOR_MAX_TOKENS = 2500`. In `tool.execute.after`, the existing `isLocalAgent` check covers all `ollama/` agents via `input.model.startsWith('ollama/')`. Add a second check: `input.agent === 'local-orchestrator'` → use the orchestrator limit, else the worker limit. The `agent` field is available in `tool.execute.after` (confirmed working for `build-local`).

14. - [x] **Create `.agents/agents/local-orchestrator.md`** — primary agent with `edit: deny`, `write: deny`, `bash: deny`; whitelist `task` to `build-local`, `research`, `brainstorm` only.

      **Implementation**: new file modeled on `build-local.md`. Role: receive a high-level goal, decompose it into bounded subtasks, show the decomposition to the user before dispatching, and delegate via the `task` subagent. Permission block in `opencode.json` `agent.local-orchestrator`: `{ "edit": "deny", "write": "deny", "bash": "deny" }`. Agent body rules: (1) read the project root `AGENTS.md` first; (2) produce a task list and confirm it with the user before dispatching; (3) make one `task` call per subtask and wait for its result; (4) never attempt to edit files directly (if a subtask requires context the worker needs, inject it via the `task` prompt, not by reading files yourself); (5) after all subtasks, report a summary to the user.

15. - [x] ~~**Set `default_agent: "local-orchestrator"` in `opencode.json`**~~ — Done May 2026. The key is `default_agent` (snake_case, confirmed from the `opencode.ai/config.json` schema). `local-orchestrator` has `mode: all`, so it qualifies as a primary agent.

#### Done

- [x] ~~**Soften `opus-deep.modelfile` directive**~~ — file deleted (May 2026); DeepSeek R1 is available online when needed; OmniCoder 2 is the sole local model.
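
Items 11 and 12 are small enough to model outside bash and check the arithmetic. Below is a TypeScript sketch of that logic; it is not the hook itself. It assumes that every tool missing from item 11's read list counts as write/shell-class. It also fires SELF-CHECK when the scaled count crosses a multiple of 60, not when `COUNT % 60 == 0`, because mixed +1/+4 steps can skip an exact multiple. The helper names (`advance`, `selectReminders`) are hypothetical, and the reminder labels stand in for the real message strings.

```typescript
// Read-class tools from item 11. Assumption: every other tool name is
// treated as write/shell-class (the item lists only some write tools).
const READ_TOOLS: ReadonlySet<string> = new Set([
  "read_file",
  "grep_search",
  "list_dir",
  "file_search",
  "semantic_search",
  "explore_subagent",
]);

const READ_WEIGHT = 1; // 0.25 write-units, scaled by 4
const WRITE_WEIGHT = 4; // 1 write-unit, scaled by 4
const SELF_CHECK_EVERY = 60; // 15 write-units, scaled by 4

function weightOf(toolName: string): number {
  return READ_TOOLS.has(toolName) ? READ_WEIGHT : WRITE_WEIGHT;
}

// Advance the counter and report whether SELF-CHECK should fire. It fires
// when the count crosses a multiple of SELF_CHECK_EVERY, so a +4 step from
// 58 to 62 still triggers it.
function advance(count: number, toolName: string): { count: number; selfCheck: boolean } {
  const next = count + weightOf(toolName);
  const selfCheck =
    Math.floor(next / SELF_CHECK_EVERY) > Math.floor(count / SELF_CHECK_EVERY);
  return { count: next, selfCheck };
}

// Item 12's priority order; only the first `cap` reminders that fired are kept.
const PRIORITY = ["SELF-CHECK", "DEBUGGING", "BFF/QUALITY GATE", "RENAME"] as const;
type Reminder = (typeof PRIORITY)[number];

function selectReminders(fired: ReadonlySet<Reminder>, cap = 2): Reminder[] {
  return PRIORITY.filter((r) => fired.has(r)).slice(0, cap);
}

// Usage: three reminders fired on this call; only the top two are emitted.
const emitted = selectReminders(new Set<Reminder>(["RENAME", "DEBUGGING", "SELF-CHECK"]));
console.log(emitted.join("\n\n")); // SELF-CHECK, then DEBUGGING
```

In bash, the same crossing test means keeping the pre-increment value and comparing integer quotients after the `case` block.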
### Known Tradeoffs

| Tradeoff | Impact | Mitigation |
| --- | --- | --- |
| Instructions glob trimmed to root `AGENTS.md` only | Agents miss project-specific patterns for subdirectories unless they read nested `AGENTS.md` explicitly | Add a reminder in the orchestrator + build-local agent body: "check nested `AGENTS.md` before working in subdirectories" |
| Same model for all roles | Orchestrator, worker, and compaction agent all share the same weights with different prompts | Structural `edit: deny` is the safety net; circuit breakers limit runaway loops |
| No cloud fallback | If a task is too complex for the 9B model, there is no escalation path | Orchestrator includes an "ask the user for direction" rule; the user can switch to Copilot |
| Latency | Sequential dispatch: orchestrator decomposes → build-local runs → returns. ~2× wall time vs. direct build-local | Acceptable for local dev; no VRAM multiplier since Ollama keeps weights hot |
| Reminder-stacking cap | The 2-per-call priority filter (item 12 above) drops lower-priority warnings | Skipped reminders fire on the next call if the condition still holds |

### Cloud Migration Path

When ready to add a cloud model, only `opencode.json` changes:

```json
{
  "model": "ollama/arch-omni2-9b",
  "agent": {
    "local-orchestrator": {
      "model": "anthropic/claude-haiku-4-5"
    }
  }
}
```

Schema verified against opencode.ai/docs/agents/ (May 2026). The `tools` key inside agent configs is deprecated in favour of `permission`; the orchestrator definition uses `permission`, so it is current. The `agent.{name}.model` key is the correct per-agent override mechanism.
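
The override reads as "per-agent `model` first, top-level `model` otherwise". The TypeScript sketch below shows that precedence using only the keys from the JSON above. `resolveModel` is a hypothetical helper for illustration, not OpenCode's resolution code. One case it does not cover: how OpenCode picks a model for a subagent that has no `model` of its own when it is invoked by a primary agent that has an override. Check the OpenCode agents docs for that before relying on this reading.

```typescript
// Minimal slice of an opencode.json shape, limited to the keys used above.
interface AgentOverride {
  model?: string;
}

interface OpencodeConfig {
  model: string;
  agent?: Record<string, AgentOverride>;
}

// Hypothetical helper: a per-agent `model` wins; otherwise the top-level
// default applies. Illustration only, not OpenCode's implementation.
function resolveModel(config: OpencodeConfig, agentName: string): string {
  return config.agent?.[agentName]?.model ?? config.model;
}

const migrated: OpencodeConfig = {
  model: "ollama/arch-omni2-9b",
  agent: {
    "local-orchestrator": { model: "anthropic/claude-haiku-4-5" },
  },
};

console.log(resolveModel(migrated, "local-orchestrator")); // anthropic/claude-haiku-4-5
console.log(resolveModel(migrated, "build-local")); // ollama/arch-omni2-9b
```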

---

## Ecosystem Gap — Contextual AGENTS.md Injection

During local AI work (May 2026) we hit a fundamental limitation: OpenCode's `instructions` glob in `opencode.json` loads **all matched files upfront** into every session. For a 9B local model with a 32K context window, loading all of `apps/*/AGENTS.md` and `packages/*/AGENTS.md` at startup consumes ~30–40% of the context budget before the first message, which triggers early compaction and degrades quality.

The correct behaviour (injecting only the AGENTS.md relevant to the file being edited) does not exist natively in OpenCode or its plugin ecosystem. The closest community plugin (`opencode-skillful`, 295 stars) is archived as of Feb 2026 and still requires the model to explicitly call `skill_find`/`skill_use`; it provides no path-triggered structural injection.

### Open tasks

16. - [ ] **Assess: is filling this ecosystem gap worth the effort?** — Before building a contextual-injection plugin, evaluate: (a) Is OpenCode actively used for serious local AI coding work, or is the community primarily cloud-model users for whom context cost is irrelevant? (b) Are there better local AI coding stacks (e.g. Aider + litellm, Cursor local mode, VS Code Copilot + Ollama) where this problem is already solved? (c) Is the `tool.execute.before` event stable enough to build on? Target: a 30-minute research session ending in a concrete go/no-go recommendation.

17. - [ ] **Review + write up our issues and fixes as an ecosystem contribution** — If the gap is worth filling: document the context-bleed problem, the early-compaction root cause, our hook-based mitigation, and the remaining structural gap. Publish as a GitHub issue on the OpenCode repo and/or an npm plugin (`opencode-contextual-rules`?) implementing `tool.execute.before` path-triggered AGENTS.md injection. Depends on the #16 go/no-go.

18. - [x] ~~**Trim `.agents/AGENTS.md`**~~ — Done May 2026. Condensed from 12,584 → 10,507 bytes (43 lines removed).
      Trimmed: Hook Architecture Principle block (redirected to item 22 in project doc), Deferred Loading example + "why not" paragraph, session-start/stop hook prose, outdated `generate-agents.ts` references in Skills/Agents sections. Agent body files updated to the prompt-body-only convention (see items 26 and 29).

19. - [x] ~~**Block bash bypass of read pagination**~~ — Done May 2026. Added Policy 14 to `pre-tool-use.sh`: blocks `cat`/`head`/`tail`/`jq` reads of `apps/*/package.json` and `packages/*/package.json`. Scope is limited to package.json (the confirmed live bypass vector); general `.ts`/`.md` bash reads are not yet blocked (a lower-urgency gap). Pattern verified with a Node.js unit test: the exact bypass command `cat apps/api/package.json | jq` is caught by P1.

20. - [ ] **Improve explore-first scope detection** — Policy 14 blocks `manage_todo_list` with ≥4 items, but OmniCoder sometimes starts with `Explore`/`find` before planning, bypassing the check. Options: (a) block `explore_subagent` when the query looks like a multi-file discovery sweep (glob patterns for source files across multiple dirs); (b) add a pre-tool-use check on `run_in_terminal` that denies `find` commands spanning the whole repo when the task hasn't been scoped yet; (c) rely on the todo-list check firing when planning eventually happens (current behavior: catches it late, but still before edits start).

21. - [x] ~~**Remove debug logging from plugin after verified cycle**~~ — Done May 2026. Removed the full-input dump block (the `/tmp/plugin-debug.jsonl` appender) from `tool.execute.before` in `plugin.ts`. Guards were verified by inspecting the `opencode export` session transcript, so the dump file is no longer needed. The hook error logger (`/tmp/plugin-hook-errors.log`) is kept because it only fires on failures, not on every call.

22. - [ ] **Refactor hook scripts to be platform-agnostic** — currently `pre-tool-use.sh` parses Copilot-specific JSON and outputs Copilot-specific `permissionDecision` JSON.
      `plugin.ts` implements duplicate guards inline rather than calling the script, so OpenCode and Copilot guards can drift (confirmed May 2026: Policy 14 in `pre-tool-use.sh` had no effect on OpenCode `bash` tool calls).

      **Design target**: scripts accept normalized env vars (`TOOL_NAME`, `COMMAND`, `FILE_PATH`) and exit non-zero with a plain-text denial reason on stdout. Callers normalize input and translate output to their native denial format. Tracked in the `.agents/AGENTS.md` Hook Architecture Principle section. **Audit required first**: review all hook scripts for Copilot-specific assumptions before refactoring.

23. - [ ] **Question-drift marker in `user-prompt-submit.sh`** — when the model has committed to a prior position and follow-up questions are being misread through that lens, add a disambiguation marker at the prompt tail. Detected pattern: the model answers "no" or "not possible" in a prior turn, and subsequent turns are interpreted as a defense of that position. See §2.1 ("Position-anchored priming") in the research doc.

      **Implementation**: in `user-prompt-submit.sh`, read the last N turns of `$TRANSCRIPT_PATH` (injected by OpenCode's native hook env) and look for a committed "no/impossible/can't" response within the last 3 model turns. If one is found, append to `ADDITIONAL_CONTEXT`: `CURRENT QUESTION (answer only this — not the prior exchange): [prompt text]`. The key is repeating the user's exact question at the tail, after the marker, to counteract lost-in-the-middle effects. Fallback trigger: if the user prompt contains "that's not what I asked" / "you're answering the wrong question" / "I said", always inject the marker regardless of the transcript scan.

24. - [x] ~~**Review all custom agent files for local-model-specific framing**~~ — Done May 2026. `build-local.md` reframed: dropped "OmniCoder", "9B", "Ollama", "Qwen3 thinking blocks", "32K tokens total"; replaced them with model-agnostic equivalents.
      `research.md` and `brainstorm.md` verified clean (no model/provider mentions). `local-orchestrator.md` was fixed earlier this session. All four agent body files are now model-agnostic.

25. - [ ] **Failure-mode routing in SELF-CHECK** — when the periodic SELF-CHECK fires in `post-tool-use.sh` and a recent terminal or test failure is also present in the same turn, classify the failure type and inject the matched intervention rather than a generic "step back." Reference: the failure-mode routing table in §3.5 of the research doc.

      **Implementation**: in the SELF-CHECK block, if `context` already contains `DEBUGGING REMINDER` (i.e., a test/terminal failure co-occurred this turn), append a classification hint: `FAILURE TYPE HINT: If this is a test/build failure → Reflexion loop (fix based on test output). If convention violation → grep for the pattern and inject a canonical example. If wrong file/directory → stop and re-read the project structure. Do not default to "try harder."`. Low implementation cost: a pure text append with a conditional on `$context`. Since item 12 replaced `context` with the `reminders=()` array and appends SELF-CHECK before DEBUGGING, the condition likely needs to test the failure signal directly (or run after both blocks) rather than look for an earlier `DEBUGGING REMINDER` entry.

26. - [x] ~~**Audit agent `.md` files for OpenCode-specific frontmatter**~~ — Done May 2026. Audit result: only `local-orchestrator.md` had OpenCode frontmatter keys (`mode`, `model`, `permission`); `brainstorm.md`, `build-local.md`, and `research.md` were already plain markdown. Went with option (b): stripped `mode`/`model`/`permission` from `local-orchestrator.md` and moved `mode: all` into `opencode.json` (model and permission were already there). Kept `description` in frontmatter as it is neutral and self-documenting. Body files are now prompt-body only, valid in both OpenCode and Copilot.

27. - [ ] **`plugin.ts` local-agent detection uses provider prefix, not agent name** — `tool.execute.after` detects local agents via `input.model.startsWith('ollama/')`. This is provider-specific: if the model is served by a different backend (e.g. `llama-server/`, `lmstudio/`), truncation silently stops working.
      Fix: detect by agent name (`input.agent.includes('build-local')`) only, removing the `ollama/` fallback. The `input.agent` field is available in `tool.execute.after` (confirmed May 2026). If the orchestrator should keep its own 2,500-token limit from item 13, the name check must also match `local-orchestrator`.

28. - [ ] **`plugin.ts` context pressure threshold is hardcoded to 32,768 tokens** — `CONTEXT_LIMIT_TOKENS = 32768` assumes OmniCoder 9B's context window. If the local model changes, the threshold silently drifts out of calibration. Options: (a) read it from the `opencode.json` model config if OpenCode exposes that to plugins; (b) make it a top-of-file constant with a comment to update it when changing models; (c) accept the drift as low-severity (the threshold is advisory only; context pressure warnings are informational, not blocking). Option (b) is the minimum; option (a) is ideal if OpenCode exposes model metadata to plugins.

29. - [x] ~~**Move `permission` out of `local-orchestrator.md` frontmatter**~~ — Done May 2026 as part of item 26. `mode: all` added to the `opencode.json` agent entry; `model` and `permission` were already in `opencode.json`. `opencode.json` is now the single source of truth for all runtime config; `.md` files are prompt-body only.

---

## Testing & Regression

**Research summary (May 2026):** No pre-existing tool exactly fits this use case. Existing tools (RagaAI Catalyst, AgentEvalKit, agent-eval-arena, intent-eval-lab, j-rig-skill-binary-eval) focus on LLM output quality, hallucination detection, or cross-runtime behavior scoring, not on config file structure or policy enforcement regression. The closest analogue is `j-rig-skill-binary-eval` (binary pass/fail criteria across 7 layers), which uses the same conceptual approach we'd want here.

Our testing is bespoke by necessity: we're testing configuration files, shell scripts, and specific policy enforcement behaviors, not general LLM response quality.
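
Because these targets are files and regexes rather than model output, a policy check can be a binary pass/fail table with no model in the loop. Below is a minimal TypeScript sketch of that shape. `POLICY_14` is a hypothetical stand-in written from item 19's description of Policy 14; the shipped regex in `pre-tool-use.sh` may differ. `failingCases` is an illustrative helper, not existing code.

```typescript
// Hypothetical stand-in for Policy 14: a cat/head/tail/jq invocation whose
// arguments (up to the next |, ; or &) name apps/*/package.json or
// packages/*/package.json. Not the shipped regex.
const POLICY_14 =
  /\b(?:cat|head|tail|jq)\b[^|;&]*\b(?:apps|packages)\/[^/\s]+\/package\.json\b/;

interface PolicyCase {
  command: string;
  blocked: boolean; // expected verdict
}

const POLICY_14_CASES: PolicyCase[] = [
  // SHOULD match
  { command: "cat apps/api/package.json | jq", blocked: true },
  { command: "head -n 20 packages/ui/package.json", blocked: true },
  { command: "jq .name apps/web/package.json", blocked: true },
  // SHOULD NOT match
  { command: "cat package.json", blocked: false },
  { command: "cat apps/api/src/index.ts", blocked: false },
  { command: "cat README.md && ls apps/api/package.json", blocked: false },
];

// Binary pass/fail: return every case whose actual verdict differs from the
// expected one. An empty array means the pattern behaves as specified.
function failingCases(pattern: RegExp, cases: readonly PolicyCase[]): PolicyCase[] {
  return cases.filter((c) => pattern.test(c.command) !== c.blocked);
}

console.log(failingCases(POLICY_14, POLICY_14_CASES)); // []
```

In a vitest suite, each case would become its own `it` so that a failure names the offending command.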

**Two layers of testing:**

| Layer | What it tests | Cost | When to run |
| --- | --- | --- | --- |
| Config + policy unit tests | Schema validity, hook regex correctness | None (no model) | Always — CI, pre-commit |
| CLI integration smoke tests | Actual enforcement via `opencode run` | Local model only | On demand; local model must be running |

**Cloud agents excluded from integration tests**: `opencode run` with a cloud model (Copilot, Anthropic) incurs API costs and rate limits. Tests must detect the active model and skip if it's not a local provider.

### Open tasks

30. - [ ] **Config + policy unit test suite** — test config file structure and hook regex patterns without invoking any model. Implementation:

      a. **`opencode.json` schema validation**: the file references `"$schema": "https://opencode.ai/config.json"`; validate it with `ajv` (already used in the monorepo) against the live schema or a cached copy. Catches permission typos, unknown agent keys, and unsupported field values.

      b. **Hook JSON structure validation**: write a schema for the hooks JSON format and run ajv on `.agents/frameworks/github/hooks.json`. `.agents/frameworks/opencode/plugin.ts` is TypeScript and already type-checked, so it needs no separate schema.

      c. **Hook policy regex unit tests**: extract every regex used in `pre-tool-use.sh` into a `tests/hooks.test.ts` file and run it with `vitest`. For each policy, define 2–3 input strings that SHOULD match and 2–3 that SHOULD NOT. Policy 14 already has an informal Node.js test from this session; formalize it.

      d. **Agent `.md` frontmatter validator**: check that no agent file under `.agents/agents/` has frontmatter keys other than `description`. Catches regressions when someone adds `model:` or `permission:` back to a body file.

      **Suggested location**: `.agents/tests/` or root `test/agents/`.
**Stack**: vitest (already in monorepo), ajv (already available), Node built-ins. No new dependencies needed. 31. - [ ] **CLI integration smoke tests (local model only)** — use `opencode run` in non-interactive mode to verify enforcement is actually firing via the real runtime. These tests exercise the plugin + hook wiring end-to-end. **Command shape**: ``` opencode run "prompt" --agent build-local \ --model llama-server/arch-omni2-9b-native \ --format json ``` **Assertions via `opencode export`**: after each run, export the session with `opencode export 2>/dev/null` and parse the JSON transcript. Assert on `parts` array: tool calls that SHOULD have been blocked appear with error/denied status; tool calls that SHOULD have passed completed normally. **Test cases to start with** (all verified real enforcement gaps): 1. Attempt to `read` a nested `package.json` (e.g. `apps/api/package.json`) → BLOCKED by plugin package.json guard 2. Attempt to `read` a source file with no `limit` → BLOCKED by pagination guard 3. Attempt to `read` a source file with `limit: 51` → BLOCKED 4. Attempt to `read` a docs file with `limit: 501` → BLOCKED 5. Attempt to `read` a docs file with `limit: 50` → PASSES 6. Bash command `cat apps/api/package.json` → BLOCKED by pre-tool-use Policy 14 (substitute your project's equivalent nested package.json) **Guard rail**: skip all tests if `llama-server` is not reachable at `http://127.0.0.1:8080/v1`. Do not run against cloud models. Add an env var `AGENT_INTEGRATION_TESTS=1` required to enable (off by default, never runs in standard `npm test`). **Suggested location**: `.agents/tests/integration/`. **Stack**: Node.js test runner or vitest, `opencode` CLI in PATH. ### Verified facts (May 2026) - OpenCode's `read` tool input schema is `{ filePath: string, limit?: number, offset?: number }` — NOT `startLine`/`endLine`. Confirmed via plugin debug logging of real tool calls. - `tool.execute.before` input contains only `{ tool, sessionID, callID }`. 
  It does NOT include `agent` or `model`, so plugin-layer gating cannot filter by agent. Confirmed via plugin debug logging.
- **OpenCode has its own native hook system** that calls `pre-tool-use.sh` directly for tools like `run_in_terminal`, `replace_string_in_file`, etc. This is completely separate from the plugin's `runHook` calls. The native hook payload includes `timestamp`, `hook_event_name`, `session_id`, `transcript_path`, `tool_use_id`, and `cwd`, fields the plugin never sends. The plugin `runHook` is a _second_ call, layered on top.
- **Bun shell `$` API does not have a `.stdin()` method.** The correct way to pipe stdin is redirection from a buffer: `` $`cmd < ${Buffer.from(text)}` ``. Calling `.stdin(text)` throws ``TypeError: $`...`.stdin is not a function``, which `runHook`'s `catch` block swallowed, returning `''`. As a result, the plugin's `runHook` silently no-opped for every call with `stdinJson` since the plugin was first written: hook enforcement (all 12 policies) never ran via the plugin path. It only ran via OpenCode's native hook system, for the tools OpenCode natively supports. Confirmed via `/tmp/plugin-hook-errors.log`.
- **The silent `catch` in `runHook` is dangerous.** It masked the Bun `.stdin()` bug entirely. Always log hook failures to a debug file during development; remove the logging only after enforcement is verified working.
- **Plugin-layer enforcement works for `read`** after fixing the Bun stdin API. The `read` tool fires `tool.execute.before` in the plugin, which calls `runHook('pre-tool-use.sh', ...)` via `< ${Buffer.from(...)}`, which applies Policy 13 (50-line limit). Verified: bare `read` (no limit) → BLOCKED; `read` with `limit: 50` → passes. (May 2026)
- **Plugin load failure: unescaped regex slashes caused a silent syntax error.** `plugin-debug.jsonl` was empty even after the Bun stdin fix because the plugin file itself failed to parse.
  Line 84 had `/(^|/)(apps|packages)/[^/]+/...`. The forward slashes inside the regex literal were not escaped, producing a JS syntax error at parse time, and Bun silently drops plugins that fail to import. Fixed to `/(^|\/)(apps|packages)\/[^/]+\/...`. The fix also corrected the pagination guard to use `limit`/`offset` (not `startLine`/`endLine`) and added an unbounded-read block (`limit === undefined`). All three guards verified working in a live session (May 2026).
- **Package.json read guard verified working.** `local-orchestrator` attempting to read `apps/*/package.json` and `packages/*/package.json` → BLOCKED by plugin. Reading the root `package.json` correctly passes. (May 2026)
- **Policy 14 (`manage_todo_list` ≥ 4 items) catches some but not all broad task attempts.** OmniCoder sometimes proceeds directly to `Explore`/`find` without calling `manage_todo_list` first, bypassing the policy. When it does plan with the todo tool before acting, the deny fires correctly.
- **OmniCoder comprehension failure: prompt ambiguity → wrong directory.** Given "refactor the five hook files", OmniCoder ran a glob for `*hook*` files and found the `.husky/` hooks instead of `.agents/hooks/`. The correct files were in the grep output from the Explore subagent but were not selected. Root cause: the model lacks enough context about the repo layout to disambiguate "hook files" without explicit path guidance. Mitigation: be explicit in prompts ("the five `.agents/hooks/*.sh` files").
- **OpenCode agent `permission` config requires a `.opencode/agents/.md` file.** Without a matching markdown file, `opencode.json`'s `agent..permission` config is silently ignored: the agent is unknown to OpenCode and runs as a nameless build-agent alias. The markdown file must exist in `.opencode/agents/` (or `~/.config/opencode/agents/`). Confirmed by a test run in which `@local-orchestrator` edited files despite `permission.edit: "deny"` in the JSON config; fixed by creating a `.opencode/agents/local-orchestrator.md` symlink.
  (May 2026)
- **`"write"` is NOT a valid OpenCode permission key.** Use `"edit"` instead; it covers the `write`, `edit`, and `apply_patch` tools. `"write": "deny"` is silently ignored. Valid top-level permission keys include: `read`, `edit`, `glob`, `grep`, `list`, `bash`, `task`, `skill`, `lsp`, `question`, `webfetch`, `websearch`, `external_directory`, `doom_loop`, `todowrite`. Confirmed from `opencode.ai/docs/permissions` (May 2026).
- **The `default_agent` key is snake_case** in `opencode.json` (not `defaultAgent`). Confirmed from `opencode.ai/docs/config` (May 2026).
- **`tools: false` is deprecated.** The current approach for per-agent tool restriction is `permission: { edit: "deny" }`. The old `tools: false` still works but is documented as legacy. Confirmed from `opencode.ai/docs/agents` (May 2026).
- **Broken symlinks are silent.** OpenCode does not error on a broken `.opencode/agents/` symlink; it just skips the agent. The agent won't appear in `opencode agent list`, and all `opencode.json` permission config for it is ignored. Always verify with `cat .opencode/agents/.md | head -5` (should print content, not a "No such file" error) and `opencode agent list` (the agent should appear with the correct deny rules). The correct symlink depth from `.opencode/agents/` is `../../.agents/agents/.md` (two levels), not three.
- **`opencode agent list` is the authoritative verification command.** Run it after any agent config change to confirm: (a) the agent appears by name, (b) its mode is correct (`all`/`primary`/`subagent`), and (c) `deny` rules appear at the bottom of its permission list. Missing agent = broken symlink or YAML parse error. Present but missing deny rules = frontmatter not parsed correctly or wrong key names.
  (May 2026)
- **`@mention` routing only works at session start.** If you send any message that gets answered by the current primary agent first and then send `@local-orchestrator ...`, the TUI passes the full message text to the current model (Build/OmniCoder), which treats `@local-orchestrator` as freeform text and answers it itself. Always open a **fresh session** and make `@agent-name` the very first message. Alternatively, use `opencode run --agent local-orchestrator "..."` from the CLI for reliable agent-scoped invocation. **Tab-switching to a custom `all`-mode agent in an existing session works correctly.**
- **`edit: deny` on `local-orchestrator` is working correctly.** When given an edit task, the orchestrator avoided `replace_string_in_file` and instead used the `task` tool to delegate to a subagent. This is the expected behaviour. Confirmed May 2026.
- **The `task` tool has a JSON serialization limit.** OmniCoder 9B caused an `Unterminated string` error by embedding the entire contents of multiple `package.json` files as a literal string inside the `task` prompt JSON. The `task` tool prompt is serialized as JSON; very long strings are truncated and produce parse errors. Mitigation: instruct the orchestrator in its system prompt to tell workers _which files to read_ rather than quoting file contents inline. This has been added to `local-orchestrator.md`. (May 2026)
- **`ollama/arch-omni2-9b` is the wrong model identifier for the llama-server instance.** The correct ID is `llama-server/arch-omni2-9b-native` (verify with `opencode models | grep arch`). Using the wrong ID causes an immediate "cannot load model" error when the agent is invoked. Fixed in `opencode.json` and `local-orchestrator.md` frontmatter. (May 2026)

## Open Issues

Known bugs and stale claims identified during code review (see the now-deleted `agent-infrastructure-review.md` and `agent-infrastructure-review-pass2.md` for full context). Items not marked resolved are not yet fixed.
### CRITICAL — `description:` empty in all generated agent/skill files

`scripts/generate-agents.ts` uses a hand-rolled YAML parser that silently drops descriptions when they are written in block-scalar form (value on the next line under the key). Every generated file in `.github/agents/`, `.github/skills/`, `.opencode/agents/`, and `.opencode/skills/` has a blank `description:` field. `description:` is the primary routing signal for Copilot's `SkillsContextComputer` and OpenCode's agent dispatch. Explicitly `@`-mentioning an agent by name still works; description-triggered auto-routing does not.

**Fix**: Inline the description strings in the canonical `.agents/` source files (change block-scalar to `key: 'value'` format). The existing parser handles inline strings correctly. Add a `generate:agents:check` assertion that every generated file has a non-empty `description:`.

### MEDIUM — ~~`printf '%s'` regression in hooks breaks `\n` rendering~~ (resolved)

~~`.agents/hooks/post-tool-use.sh`, `session-start.sh`, and `user-prompt-submit.sh` use `printf '%s' "$context" | node -e '...'` to JSON-escape the context variable. `%s` does not interpret `\n` escape sequences, so multi-line context strings (SELF-CHECK, DEBUGGING REMINDER, BFF REMINDER) arrive at the model as single lines with literal `\n` characters.~~

**Verified fixed** (May 2026): all three hooks already use `printf '%b'`.

### LOW — ~~arXiv citation `2603.29957` unverified~~ (resolved)

~~`arXiv:2603.29957` (Jiang et al. 2026, "Think-Anywhere") appears in `.agents/agents/research.md`, `.agents/agents/brainstorm.md`, and the Research Foundation section above. Verify the ID resolves at `https://arxiv.org/abs/2603.29957` and fix all references if it doesn't.~~

**Verified real** (May 2026): "Think Anywhere in Code Generation" by Xue Jiang, Tianyu Zhang, Ge Li et al., submitted March 31, 2026, revised April 27, 2026 (v3), cs.SE. All existing citations are correct.
### LOW — ~~`.claude/` false claims in `tool-agnostic-agent-infra.md`~~ (resolved)

The file `docs/projects/tool-agnostic-agent-infra.md` no longer exists; it has already been deleted. No action needed.