Agent Infrastructure
Shared agent infrastructure for VS Code Copilot and OpenCode — brainstorm
agent, research agent, nudge instructions, hooks, skills, and MCP server.
Project-specific overlays live in each project's .agents/ directory.
See also: `docs/research/ai-coding-best-practices.md`, a research synthesis covering the Prompt/Context/Harness taxonomy, failure modes, enforcement hierarchy, small-model harness patterns, and all primary-source citations that underpin the design decisions here.
Current State
Architecture Overview
The infrastructure is tool-agnostic: canonical sources live in .agents/
and a generator (npm run generate:agents) distributes them to
.github/agents/, .github/skills/, .opencode/agents/, .opencode/skills/.
Edit the .agents/ sources; never edit the generated output directories (they
are .gitignored and blocked by pre-tool-use policy).
.agents/
├── AGENTS.md                    # Root design doc + enforcement hierarchy
├── agents/                      # Agent definitions (canonical)
│   ├── brainstorm.md
│   ├── research.md
│   └── build-local.md           # OmniCoder 9B via Ollama
├── hooks/                       # Shared bash hooks (delegated by all harnesses)
│   ├── pre-tool-use.sh          # Hard blocks (terminal cmds + file-path policies)
│   ├── post-tool-use.sh         # Self-check counter + methodology reminders
│   ├── session-start.sh         # Inject project state at session start
│   ├── user-prompt-submit.sh    # Per-turn nudge detection + task capture
│   ├── pre-compact.sh           # Export state before context summarization
│   └── stop.sh                  # Session-end verification
└── skills/
    └── research/SKILL.md        # Research methodology (any agent can load)
Generated output (do not edit — regenerated by npm run generate:agents):
- `.github/agents/`: VS Code Copilot agent files
- `.github/skills/`: VS Code Copilot skill files
- `.opencode/agents/`: OpenCode agent files
- `.opencode/skills/`: OpenCode skill files
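The source-to-output relationship can be pictured as a small path mapping. This is a minimal sketch, not the real generator behind `npm run generate:agents`, which this document does not show. `generatedTargets` is a hypothetical helper, and the `.agent.md` suffix for Copilot agents is taken from the Tool Mapping table later in this document.

```typescript
// Hypothetical helper: maps one canonical source under .agents/ to the
// generated files it would produce. Not the actual generator.
function generatedTargets(source: string): string[] {
  const agent = /^\.agents\/agents\/([^/]+)\.md$/.exec(source);
  if (agent) {
    const name = agent[1];
    return [`.github/agents/${name}.agent.md`, `.opencode/agents/${name}.md`];
  }
  const skill = /^\.agents\/skills\/([^/]+)\/SKILL\.md$/.exec(source);
  if (skill) {
    const name = skill[1];
    return [`.github/skills/${name}/SKILL.md`, `.opencode/skills/${name}/SKILL.md`];
  }
  // Assumption: anything else (AGENTS.md, hooks) is not distributed.
  return [];
}
```

Under these assumptions, `generatedTargets(".agents/agents/research.md")` yields one Copilot path and one OpenCode path, which is why edits belong in `.agents/` only.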
Harness integration:
- VS Code Copilot: `.github/agent-support.json` maps 4 hook events to the shared bash scripts in `.agents/hooks/`
- OpenCode: `.opencode/plugins/agent-support.ts` is a TypeScript plugin that shells out to the same bash scripts
Brainstorm Agent
- 4-phase workflow: Quick Frame → Diverge → Converge → Capture & Hand Off
- 6 techniques: Rapid Ideation, SCAMPER, Worst Possible Idea, How Might We, Inversion/Pre-mortem, Constraint Flipping
- Counterbalances Opus 4.6 overthinking tendency
- Phase 2 includes "push past the obvious" nudge (Zhao et al. 2024: LLMs fall short on originality, excel at elaboration — first ideas are "average")
- Phase 4 routes to `@research` for investigation, default agent for implementation
- Creates exploration files at `docs/explorations/<name>.md` and session memory notes
Research Agent
- Two orientations that compose recursively:
  - Understand (Grounded Theory): open coding → constant comparison → axial coding → memo → saturation check
  - Diagnose (Strong Inference + Satisficing): 5-factor triage gates between satisficing (low risk) and full falsification (high risk)
- 5-factor triage: reversibility, blast radius, confidence, novelty, time cost
- Timing awareness: `time` prefix on unknown commands, session/repo memory for baselines, timing feeds into triage decisions
- Investigation files at `docs/explorations/<name>.md`
- Techniques reference: Five Whys, Delta Debugging, Rubber Duck
- Delegates evidence-gathering to Explore subagent, keeps analytical thinking local
Nudge Instructions
- Brainstorm nudge: triggers on hesitation/overthinking language ('wait', 'actually', 'hmm', 'overcomplicating', etc.)
- Research nudge: triggers on debugging/investigation language ('why is this broken', 'how does this work', 'root cause', etc.)
- Both are non-intrusive single-sentence suggestions, only fire once per topic
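The nudge rules above amount to phrase matching plus a once-per-topic guard. A minimal sketch, assuming the trigger phrases quoted in the bullets (the real lists are longer, per the "etc."), a caller-supplied `topic` key, and an in-memory record of what has already fired; all three are illustrative choices rather than the hook's actual implementation.

```typescript
type Nudge = "brainstorm" | "research";

// Trigger phrases quoted in the bullets above; the real lists are longer.
const TRIGGERS: Record<Nudge, RegExp> = {
  brainstorm: /\b(wait|actually|hmm|overcomplicating)\b/i,
  research: /why is this broken|how does this work|root cause/i,
};

// Hypothetical in-memory record of nudges already shown, keyed by kind + topic.
const alreadyNudged: Record<string, boolean> = {};

// Returns the nudge to show for this prompt, or null if none applies
// or the same nudge already fired for this topic.
function detectNudge(prompt: string, topic: string): Nudge | null {
  const kinds: Nudge[] = ["research", "brainstorm"];
  for (const kind of kinds) {
    const key = `${kind}:${topic}`;
    if (TRIGGERS[kind].test(prompt) && !alreadyNudged[key]) {
      alreadyNudged[key] = true;
      return kind;
    }
  }
  return null;
}
```

Checking the research phrases first is a design choice of this sketch: they are more specific than single words such as "actually", so they are less likely to be false positives.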
Tool Mapping (Copilot ↔ OpenCode)
| Copilot | OpenCode equivalent |
|---|---|
| `AGENTS.md` (root + nested) | `AGENTS.md` (root, native; nested via `instructions` glob in `opencode.json`) |
| `.github/agents/*.agent.md` | `.opencode/agents/*.md` (frontmatter: description, mode, model, temperature, permission) |
| `.github/skills/<name>/SKILL.md` | `.opencode/skills/<n>/SKILL.md`; also reads `.agents/skills/` and `.claude/skills/` |
| `.github/instructions/*.instructions.md` (applyTo) | No direct equivalent; fold into AGENTS.md stubs or `instructions` glob |
| `.github/hooks/*.sh` (JSON-configured shell) | `.opencode/plugins/*.ts` (TS modules, event-driven); shells out via Bun's `$` |
| `runSubagent` / Explore agent | Built-in general and explore subagents; @-mention syntax |
| `vscode_askQuestions` | No equivalent; OpenCode uses agent's natural turn-taking |
OpenCode plugin event mapping:
| Copilot hook | OpenCode event |
|---|---|
| `SessionStart` | `session.created` |
| `PreToolUse` | `tool.execute.before` |
| `PostToolUse` | `tool.execute.after` |
| `PreCompact` | `experimental.session.compacting` |
| `Stop` | `session.idle` (closest equivalent) |
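The event mapping above can be expressed as two lookup tables, which is roughly what a plugin needs in order to decide which shared script to run for an incoming OpenCode event. This is a sketch under assumptions: the script paths follow the `.agents/hooks/` listing earlier in this document, and `scriptForEvent` is a hypothetical helper, not the plugin's actual code.

```typescript
type CopilotHook = "SessionStart" | "PreToolUse" | "PostToolUse" | "PreCompact" | "Stop";

// Copilot hook name -> OpenCode plugin event, per the table above.
const OPENCODE_EVENT: Record<CopilotHook, string> = {
  SessionStart: "session.created",
  PreToolUse: "tool.execute.before",
  PostToolUse: "tool.execute.after",
  PreCompact: "experimental.session.compacting",
  Stop: "session.idle", // closest equivalent, not an exact match
};

// Copilot hook name -> shared bash script (paths from the .agents/hooks/ tree).
const HOOK_SCRIPT: Record<CopilotHook, string> = {
  SessionStart: ".agents/hooks/session-start.sh",
  PreToolUse: ".agents/hooks/pre-tool-use.sh",
  PostToolUse: ".agents/hooks/post-tool-use.sh",
  PreCompact: ".agents/hooks/pre-compact.sh",
  Stop: ".agents/hooks/stop.sh",
};

// Hypothetical helper: which shared script handles a given OpenCode event?
function scriptForEvent(event: string): string | undefined {
  const hooks = Object.keys(OPENCODE_EVENT) as CopilotHook[];
  const hook = hooks.find((name) => OPENCODE_EVENT[name] === event);
  return hook ? HOOK_SCRIPT[hook] : undefined;
}
```

Events with no row in the table, such as `session.deleted`, return `undefined` here, so a caller can ignore them explicitly.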
Research Foundation
For full research depth, citations, and failure-mode analysis, see
docs/research/ai-coding-best-practices.md. The list below records the specific papers and frameworks that shaped the design decisions in this project.
Methodologies and papers that informed the design:
- Grounded Theory (Glaser & Strauss): build understanding from data, not assumptions. Applied to code-reading in the Understand orientation.
- Strong Inference (Platt 1964): multiple competing hypotheses → crucial experiments → eliminate. Applied to the Diagnose orientation.
- Satisficing (Simon 1956): accept "good enough" when optimization cost exceeds benefit. Gates between cheap confirmation and expensive falsification.
- Dual Process Theory (Kahneman): System 1 (fast, pattern-matching) vs System 2 (slow, analytical). System 1 more accurate in familiar domains. Informs the triage decision.
- Zhao et al. 2024 (arxiv): LLMs fall short on originality, excel at elaboration. First ideas are "average." Informs brainstorm agent's "push past the obvious" nudge.
- "Lost in the Middle" (Liu et al. 2023): LLMs attend best to beginning/end of context. Informs hook design — inject at context tail for high attention.
- Delta Debugging: binary search the change space between passing/failing cases. Logic behind `git bisect`.
- Five Whys: iterative causal chain tracing. Starting point for hypothesis generation, not sole diagnostic method.
- Ronacher "Agent Design Is Still Hard": reinforce methodology after every tool call at context tail. Structural injection outperforms relying on instructions in the system prompt.
- Think-Anywhere (Jiang et al. arXiv:2603.29957, Mar 2026, Peking U + Tongyi Lab): LLMs trained to invoke `<think>` blocks at any token position during code generation, not just upfront. SOTA on LeetCode/LiveCodeBench with fewer total tokens. The motivating insight: a model can plan correctly at the start but introduce an off-by-one bug mid-implementation, and only mid-loop reasoning catches it. Applied here: the research agent's investigation checklist includes "Re-evaluate hypothesis at every tool-call boundary." For Claude 4 models, interleaved thinking makes this automatic. Complements Plan-and-Solve: upfront decomposition where structure is clear, mid-execution re-evaluation when intermediate results change what to do next.
- Anthropic interleaved thinking (Claude 4 + adaptive thinking): Claude Sonnet 4.6+ and Opus 4.6+ automatically insert thinking blocks between tool calls. No separate implementation needed; agent instruction design drives it. The research agent's "Re-evaluate at every tool-call boundary" instruction explicitly activates this behavior.
- Prompt/Context/Harness framework (Alibaba Cloud, Apr 2026): Names the three engineering layers. Prompt = task expression (stateless). Context = what the model sees (AGENTS.md, skills, tools — engineering target is progressive disclosure). Harness = system constraints + verification loops (hooks, permission gates, sub-agent isolation). Diagnostic map: wrong output → Prompt; hallucinated fact → Context; wrong tool selected → Context (fix description); task drift → Harness (sub-agent boundary); destructive action → Harness (permission hook). LangChain improved Terminal Bench 2.0 from 52.8% → 66.5% by changing Harness alone.
- Context engineering (Rajasekaran et al., Anthropic, Sep 2025): Formally distinguishes context engineering from prompt engineering. Key principles: (a) just-in-time context — agents hold references and load on demand, not upfront; (b) structured note-taking (NOTES.md) as external working memory for long sequential tasks; (c) every new token depletes attention budget — validates the <60-line AGENTS.md ceiling; (d) compaction strategy: maximize recall first, then improve precision.
MCP Server Lifecycle Hooks — Protocol Status (May 2026)
The .agents/mcp/ server exposes prompts and tools to agents via the MCP
protocol. A recurring question: can the MCP server react to session lifecycle
events (session start/end, tool-use boundaries)?
Current protocol state
No lifecycle hooks exist in the MCP protocol. The spec defines three phases
only: initialize → operation → shutdown. There is no session.created,
post-tool-call, or session.ended notification. This gap is why session
awareness currently lives in the OpenCode plugin layer
(.opencode/plugins/agent-support.ts) rather than the MCP server — OpenCode
exposes session.created, session.idle, session.compacted,
session.deleted, and tool.execute.before/after events natively to plugins.
Active work in the MCP spec
SEP-2624: Interceptors for the Model Context Protocol (PR #2624)
The most organized effort. Supersedes SEP-1763 (closed as completed). Proposes
Interceptors as a new MCP primitive — two types: validators (inspect,
return pass/fail) and mutators (transform context payloads) — discoverable
and invocable via interceptors/list and interceptor/invoke JSON-RPC methods.
These fire at protocol-level operation events: tools/call, prompts/get,
resources/read, sampling/createMessage, elicitation/create. Not
session-start/stop hooks, but before/after wrapping for every operation.
There is now a formal Interceptors Working Group (Bloomberg + Saxo Bank engineers, biweekly cadence). Reference implementations in progress for Go and C# SDKs. Experimental repo: modelcontextprotocol/experimental-ext-interceptors. Charter: modelcontextprotocol.io/community/interceptors/charter.
SEP-2282: Server-Declared Behavioural Hooks (PR #2282)
Smaller, separate open PR. Proposes servers declare context injections in
ServerCapabilities — text injected into the agent's context at client-side
lifecycle events (session start, post-tool-use, session end). The contract is
"here's context the model should have at this moment," not code execution. More
directly analogous to our OpenCode session.created / session.idle patterns.
Currently unsponsored — needs a maintainer to pick it up.
What to watch
- Primary: PR #2624 + experimental-ext-interceptors repo
- Secondary: PR #2282 (closest to session-lifecycle hooks)
- Label filter: `SEP` label on the modelcontextprotocol repo
- Milestone: `2026-06-30-RC` is the next spec revision window
Implication for this project
Until interceptors land in a shipping spec version and the TypeScript SDK, the
session lifecycle pattern stays at the OpenCode plugin layer. When SEP-2282 or
an equivalent lands, the MCP server could self-register context injection hooks
during initialize, removing the need for tool-specific plugin code.
Model Scale Profiles
Different model sizes require different infrastructure strategies. The failure modes are different, so the mitigations are different.
Large-scale API models (Claude Sonnet / Opus)
Primary failure modes: overthinking, sycophancy, verbosity, tendency to add unrequested features or comments.
Infrastructure strategy:
- Advisory methodology + structural reinforcement (hooks, circuit breakers)
- PostToolUse self-check nudges every ~15 calls
- PreToolUse hard blocks for high-risk operations
- Subagent delegation for isolated tasks (parent Opus → child Sonnet/Haiku)
Smaller-scale local models (OmniCoder 9B via Ollama)
Primary failure modes (different from "low reasoning" — OmniCoder uses Qwen3 thinking blocks natively):
- Narrower training distribution (Python/JS heavy)
- Quantization degradation: JSON schema compliance drops as context fills
- Tool-call history is the primary context consumer — responses must be truncated aggressively
- Instruction drift: fewer attention heads (32 vs 64 in 32B) means system prompt recall degrades faster
Infrastructure strategy:
- PostToolUse response truncation at ~1500 tokens (plugin layer, not bash hook)
- PreToolUse JSON validation with schema-specific error messages
- Context pressure injection at ≥70% fill (~22K/32K tokens)
- `steps: 20` cap + `ask` permission gates for natural checkpoints
- `explore` subagent delegation to reduce context pressure on the main agent
- `NOTES.md` working memory pattern enforced in agent body
- No `web` tool, which keeps context lean
- Reasoning guidance: "Hold references; load on demand" explicit in agent body
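Two of the local-model guards above are simple arithmetic: truncating tool responses at about 1,500 tokens and flagging context pressure at 70% of a 32K window. A minimal sketch, assuming a rough 4-characters-per-token estimate (a heuristic chosen for illustration, not a real tokenizer) and hypothetical function names; the plugin's actual logic may differ.

```typescript
const CONTEXT_LIMIT_TOKENS = 32768; // 32K window assumed for the local model
const PRESSURE_RATIO = 0.7;         // warn at >= 70% fill
const MAX_RESPONSE_TOKENS = 1500;   // per-response truncation budget
const CHARS_PER_TOKEN = 4;          // assumption: crude estimate, not a tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True once the estimated context fill reaches the pressure threshold.
function underPressure(usedTokens: number): boolean {
  return usedTokens >= CONTEXT_LIMIT_TOKENS * PRESSURE_RATIO;
}

// Cuts an oversized tool response and leaves a marker so the model
// knows content was dropped.
function truncateResponse(text: string, maxTokens: number = MAX_RESPONSE_TOKENS): string {
  const tokens = estimateTokens(text);
  if (tokens <= maxTokens) return text;
  const kept = text.slice(0, maxTokens * CHARS_PER_TOKEN);
  return `${kept}\n[truncated: ~${tokens - maxTokens} tokens omitted]`;
}
```

With these constants the threshold works out to 0.7 × 32,768 ≈ 22,938 tokens, which is the "~22K/32K" figure rounded down.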
OmniCoder 2 Orchestration — Pending Work
Full historical rationale and audit findings were maintained in `docs/projects/local-ai-orchestration.md` (deleted May 2026 after merge). The plan used an orchestrator-workers pattern with structural `edit: deny` enforcement on the orchestrator. All OpenCode config values verified against opencode.ai/docs (May 2026).
Goals
- All agents run on `ollama/arch-omni2-9b`; no cloud fallback
- User can type vague prompts; the system decomposes and delegates automatically
- Context windows are isolated per subagent (no shared state bleed)
- Changes scale forward: switching to cloud means changing model strings, not architecture
Pending Changes
Quick wins — under 5 minutes each, no testing required
- [CRITICAL] Fix `<tool\*call>` typo in `omnicoder2.modelfile`: markdown-escape artifact; malformed opening tag paired with correct closing tag. Highest-leverage change; everything below depends on reliable tool-call JSON.
- Mark canonical/deprecated modelfiles: `# CANONICAL` header on `omnicoder2.modelfile`; `# DEPRECATED` on `omnicoder.modelfile`; `omnicoder-v2.modelfile.template` deleted (was dead code; v2 now served from HuggingFace path).
- Add `compaction.reserved: 3000` to `opencode.json`: default 10,000 fires compaction too early given ~8–12K baseline context.
- Fix `pre-compact.sh` prettier call: removes `npx prettier`, which violates pre-tool-use Policy 1 (self-violating policy).
- MCP server error handling: wrap `server.connect(transport)` in try/catch with stderr + `process.exit(1)`.
Short session — 15–30 minutes each, bounded scope
- Fix `stop.sh` JSON escaping: replace `sed`-based escaping with the `printf '%b' | node JSON.stringify` pattern used in every other hook.
- Per-session PostToolUse counter: repo-scoped path `/tmp/.opencode-tool-count-<repo-hash>` (derived from REPO_ROOT via md5sum); prevents cross-repo contamination; session-start.sh resets it at session begin.
- Shrink compaction prompt to ~120 words (in `.opencode/plugins/agent-support.ts`): shorter instructions free bandwidth for the 9B to actually summarize.
- Update `.agents/agents/build-local.md` for v2: pagination 100 → 50 lines; rule 4 now says "recipient not dispatcher"; rule 7 scope-check says "tell the user, do not self-decompose".
Depends on orchestrator being proven first
- Trim root `AGENTS.md` to ~60 lines: reduced from 435 lines to 45 lines; all architecture rationale, code examples, quick task table, and project context removed; cross-cutting rules and quality gate preserved (May 2026).
- PostToolUse weighted counter: reads (`read_file`, `grep`, `list`) +0.25; writes/shell +1; keeps the 15-call SELF-CHECK from firing mid-investigation sweep. Depends on #7 (per-session counter) first.

  **Implementation** (`.agents/hooks/post-tool-use.sh`): bash has no float arithmetic, so scale to integers: reads +1, writes/shell +4, threshold 60 (equivalent to 15 effective write-units). Read-class tools: `read_file`, `grep_search`, `list_dir`, `file_search`, `semantic_search`, `explore_subagent`. Write/shell-class: all `*_string_in_file`, `create_file`, `run_in_terminal`. Replace the single `COUNT=$((COUNT + 1))` with a `case "$TOOL_NAME"` block that does `COUNT=$((COUNT + 1))` for reads and `COUNT=$((COUNT + 4))` for writes/shell. Change the self-check condition from `(( COUNT % 15 == 0 ))` to `(( COUNT % 60 == 0 ))`. With mixed increments COUNT can step over a multiple of 60 (for example 59 + 4 = 63), so the check is more reliable if it compares `COUNT / 60` before and after the increment instead of testing for an exact multiple.
- PostToolUse reminder priority filter: emit at most 2 reminders per tool call; priority: SELF-CHECK > DEBUGGING > path-scoped > tool-specific. Depends on #11.

  **Implementation** (`.agents/hooks/post-tool-use.sh`): replace the current single `context` string accumulator with an indexed array `reminders=()`. Each block appends `reminders+=("$msg")` in priority order (SELF-CHECK first, DEBUGGING second, BFF/QUALITY GATE third, RENAME fourth). At output time: join only the first 2 elements. Append with `\n\n` separator. Blocks that didn't fire don't append, so the cap is natural.
- Broaden PostToolUse truncation to all `ollama/` agents (`.opencode/plugins/agent-support.ts`); differentiate limit: orchestrator 2,500 tokens vs workers 1,500. Minor until orchestrator exists.

  **Implementation**: rename `BUILD_LOCAL_MAX_RESPONSE_TOKENS` → `LOCAL_WORKER_MAX_TOKENS = 1500`; add `LOCAL_ORCHESTRATOR_MAX_TOKENS = 2500`. In `tool.execute.after`, the existing `isLocalAgent` check covers all `ollama/` agents via `input.model.startsWith('ollama/')`. Add a second check: `input.agent === 'local-orchestrator'` → use orchestrator limit, else worker limit. The `agent` field is available in `tool.execute.after` (confirmed working for `build-local`).
- Create `.agents/agents/local-orchestrator.md`: primary agent with `edit: deny`, `write: deny`, `bash: deny`; whitelist `task` to `build-local`, `research`, `brainstorm` only.

  **Implementation**: new file modeled on `build-local.md`. Role: receive high-level goal, decompose into bounded subtasks, show decomposition to user before dispatching, delegate via `task` subagent. Permission block in `opencode.json` `agent.local-orchestrator`: `{ "edit": "deny", "write": "deny", "bash": "deny" }`. Agent body rules: (1) read project root `AGENTS.md` first; (2) produce a task list and confirm with user before dispatching; (3) one `task` call per subtask, wait for result; (4) never attempt to edit files directly; if a subtask requires context the worker needs, inject it via the `task` prompt, not by reading files yourself; (5) after all subtasks, report summary to user.
- Set `default_agent: "local-orchestrator"` in `opencode.json`. Done May 2026. Key is `default_agent` (snake_case, confirmed from `opencode.ai/config.json` schema). `local-orchestrator` has `mode: all` so it qualifies as a primary agent.
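The weighted counter and the two-reminder cap described in the list above are both small pieces of arithmetic. The real hook is bash; the sketch below restates the logic in TypeScript purely for illustration. It assumes any tool not in the read-class list counts as write/shell, and it detects a threshold crossing rather than an exact multiple of 60, since mixed +1/+4 increments can skip over a multiple. Function names are hypothetical.

```typescript
// Read-class tools named in the weighted-counter item above.
const READ_TOOLS = [
  "read_file", "grep_search", "list_dir",
  "file_search", "semantic_search", "explore_subagent",
];
const READ_WEIGHT = 1;           // 0.25 effective calls, scaled by 4
const WRITE_WEIGHT = 4;          // 1 effective call, scaled by 4
const SELF_CHECK_THRESHOLD = 60; // 15 effective calls, scaled by 4

// Advances the counter and reports whether SELF-CHECK should fire.
// Assumption: tools outside READ_TOOLS are treated as write/shell.
function nextCount(count: number, toolName: string): { count: number; selfCheck: boolean } {
  const weight = READ_TOOLS.indexOf(toolName) >= 0 ? READ_WEIGHT : WRITE_WEIGHT;
  const next = count + weight;
  // Crossing test: fires when the counter passes a multiple of the threshold,
  // even if it never lands on it exactly (59 + 4 = 63).
  const selfCheck =
    Math.floor(next / SELF_CHECK_THRESHOLD) > Math.floor(count / SELF_CHECK_THRESHOLD);
  return { count: next, selfCheck };
}

// Reminders arrive already in priority order; keep at most `max` of them.
function capReminders(reminders: string[], max: number = 2): string {
  return reminders.slice(0, max).join("\n\n");
}
```

For example, `capReminders(["SELF-CHECK", "DEBUGGING", "QUALITY GATE"])` keeps only the first two entries, so a lower-priority reminder waits for a later call where fewer blocks fire.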
Done
- Soften `opus-deep.modelfile` directive: file deleted (May 2026); DeepSeek R1 available online when needed; OmniCoder 2 is the sole local model.
Known Tradeoffs
| Tradeoff | Impact | Mitigation |
|---|---|---|
| Instructions glob trimmed to root AGENTS.md only | Agents miss project-specific patterns for subdirectories unless they read nested AGENTS.md explicitly | Add reminder in orchestrator + build-local agent body: "check nested AGENTS.md before working in subdirectories" |
| Same model for all roles | Orchestrator, worker, compaction agent are all same weights with different prompts | Structural edit: deny is the safety net; circuit breakers limit runaway loops |
| No cloud fallback | If task is too complex for 9B, no escalation path | Orchestrator includes "ask the user for direction" rule; user can switch to Copilot |
| Latency | Sequential dispatch: orchestrator decomposes → build-local runs → returns. ~2× wall time vs. direct build-local | Acceptable for local dev; no VRAM multiplier since Ollama keeps weights hot |
| Reminder-stacking cap | 2-per-call priority filter (pending work above) drops lower-priority warnings | Skipped reminders fire on next call if condition holds |
Cloud Migration Path
When ready to add a cloud model, only opencode.json changes:
{
  "model": "ollama/arch-omni2-9b",
  "agent": {
    "local-orchestrator": {
      "model": "anthropic/claude-haiku-4-5"
    }
  }
}
Schema verified against opencode.ai/docs/agents/ (May 2026). The tools key
inside agent configs is deprecated in favour of permission — the orchestrator
definition uses permission, so it is current. The agent.{name}.model key is
the correct per-agent override mechanism.
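The effect of the `agent.{name}.model` override can be illustrated with a small resolver: an agent uses its own `model` if one is set and otherwise falls back to the top-level `model`. This is a sketch of the lookup rule as described here, not OpenCode's actual code; `resolveModel` and the two interfaces are hypothetical and cover only the keys shown in the example config.

```typescript
// Only the keys used in the example above; the real schema has more.
interface AgentConfig {
  model?: string;
}
interface OpenCodeConfig {
  model: string;
  agent?: Record<string, AgentConfig>;
}

// Per-agent model if present, otherwise the top-level default.
function resolveModel(config: OpenCodeConfig, agentName: string): string {
  return config.agent?.[agentName]?.model ?? config.model;
}

const config: OpenCodeConfig = {
  model: "ollama/arch-omni2-9b",
  agent: {
    "local-orchestrator": { model: "anthropic/claude-haiku-4-5" },
  },
};
```

With this config, `resolveModel(config, "local-orchestrator")` returns the Anthropic model string while `resolveModel(config, "build-local")` still returns the local one, so only the orchestrator moves to the cloud.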
Ecosystem Gap — Contextual AGENTS.md Injection
During local AI work (May 2026) we hit a fundamental limitation: OpenCode's
instructions glob in opencode.json loads all matched files upfront into
every session. For a 9B local model with a 32K context window, loading all of
apps/*/AGENTS.md and packages/*/AGENTS.md at startup consumes ~30–40% of the
context budget before the first message, triggering early compaction and
degrading quality.
The correct behaviour — injecting only the AGENTS.md relevant to the file being
edited — does not exist natively in OpenCode or its plugin ecosystem. The
closest community plugin (opencode-skillful, 295 stars) is archived as of Feb
2026 and still requires the model to explicitly call skill_find/skill_use;
it provides no path-triggered structural injection.
Open tasks
- Assess: is filling this ecosystem gap worth the effort? Before building a contextual-injection plugin, evaluate: (a) Is OpenCode actively used for serious local AI coding work, or is the community primarily cloud-model users for whom context cost is irrelevant? (b) Are there better local AI coding stacks (e.g. Aider + litellm, Cursor local mode, VS Code Copilot + Ollama) where this problem is already solved? (c) Is the `tool.execute.before` event stable enough to build on? Target: 30-minute research session, concrete go/no-go recommendation.
- Review + write up our issues and fixes as an ecosystem contribution. If the gap is worth filling: document the context-bleed problem, the early-compaction root cause, our hook-based mitigation, and the remaining structural gap. Publish as a GitHub issue on the OpenCode repo and/or an npm plugin (`opencode-contextual-rules`?) implementing `tool.execute.before` path-triggered AGENTS.md injection. Depends on #16 go/no-go.
- Trim `.agents/AGENTS.md`. Done May 2026. Condensed from 12,584 → 10,507 bytes (43 lines removed). Trimmed: Hook Architecture Principle block (redirected to item 22 in project doc), Deferred Loading example + "why not" paragraph, session-start/stop hook prose, outdated `generate-agents.ts` references in Skills/Agents sections. Agent body files updated to prompt-body-only convention (see items 25/26).
- Block bash bypass of read pagination. Done May 2026. Added Policy 14 to `pre-tool-use.sh`: blocks `cat`/`head`/`tail`/`jq` reads of `apps/*/package.json` and `packages/*/package.json`. Scope limited to package.json (confirmed live bypass vector); general `.ts`/`.md` bash reads are not yet blocked (lower-urgency gap). Pattern verified with Node.js unit test: exact bypass command `cat apps/api/package.json | jq` is caught by P1.
- Improve explore-first scope detection: Policy 14 blocks `manage_todo_list` with ≥4 items, but OmniCoder sometimes starts with `Explore`/`find` before planning, bypassing the check. Options: (a) block `explore_subagent` when the query looks like a multi-file discovery sweep (glob patterns for source files across multiple dirs); (b) add a pre-tool-use check on `run_in_terminal` that denies `find` commands spanning the whole repo when the task hasn't been scoped yet; (c) rely on the todo-list check firing when planning eventually happens (current behavior; catches it late but still before edits start).
- Remove debug logging from plugin after verified cycle. Done May 2026. Removed the full-input dump block from `tool.execute.before` in `plugin.ts` (`/tmp/plugin-debug.jsonl` appender). Guards verified via `opencode export` session transcript inspection, so the dump file is no longer needed. Hook error logger (`/tmp/plugin-hook-errors.log`) kept as it only fires on failures, not every call.
- Refactor hook scripts to be platform-agnostic: currently `pre-tool-use.sh` parses Copilot-specific JSON and outputs Copilot-specific `permissionDecision` JSON. `plugin.ts` implements duplicate guards inline rather than calling the script. This means OpenCode and Copilot guards can drift (confirmed May 2026: Policy 14 in `pre-tool-use.sh` had no effect on OpenCode `bash` tool calls).

  **Design target**: scripts accept normalized env vars (`TOOL_NAME`, `COMMAND`, `FILE_PATH`), exit non-zero with plain-text denial reason on stdout. Callers normalize input and translate output to their native denial format. Tracked in `.agents/AGENTS.md` Hook Architecture Principle section. **Audit required first**: review all hook scripts for Copilot-specific assumptions before refactoring.

- Question-drift marker in `user-prompt-submit.sh`: when the model has committed to a prior position and follow-up questions are being misread through that lens, prepend a disambiguation marker at the prompt tail. Detected pattern: model answers "no" or "not possible" in a prior turn → subsequent turns interpreted as defense of that position. See §2.1 ("Position-anchored priming") in the research doc.

  **Implementation**: in `user-prompt-submit.sh`, read the last N turns of `$TRANSCRIPT_PATH` (injected by OpenCode's native hook env) and look for a prior committed "no/impossible/can't" response within the last 3 model turns. If detected, append to `ADDITIONAL_CONTEXT`: `CURRENT QUESTION (answer only this, not the prior exchange): [prompt text]`. The key is repeating the user's exact question at the tail, after the marker, to counteract lost-in-the-middle effects. Fallback trigger: user prompt contains "that's not what I asked" / "you're answering the wrong question" / "I said" → always inject marker regardless of transcript scan.
- Review all custom agent files for local-model-specific framing. Done May 2026. `build-local.md` reframed: dropped "OmniCoder", "9B", "Ollama", "Qwen3 thinking blocks", "32K tokens total"; replaced with model-agnostic equivalents. `research.md` and `brainstorm.md` verified clean (no model/provider mentions). `local-orchestrator.md` was fixed earlier this session. All four agent body files are now model-agnostic.
- Failure-mode routing in SELF-CHECK: when the periodic SELF-CHECK fires in `post-tool-use.sh`, if a recent terminal failure or test failure is also present in the same turn, classify the failure type and inject the matched intervention rather than generic "step back." Reference: failure-mode routing table in §3.5 of the research doc.

  **Implementation**: in the SELF-CHECK block, if `context` already contains `DEBUGGING REMINDER` (i.e., test/terminal failure co-occurred this turn), append a classification hint: `FAILURE TYPE HINT: If this is a test/build failure → Reflexion loop (fix based on test output). If convention violation → grep for the pattern and inject a canonical example. If wrong file/directory → stop and re-read the project structure. Do not default to "try harder."`. Low implementation cost: pure text append with a conditional on `$context`.
- Audit agent `.md` files for OpenCode-specific frontmatter. Done May 2026. Audit result: only `local-orchestrator.md` had OpenCode frontmatter keys (`mode`, `model`, `permission`). `brainstorm.md`, `build-local.md`, `research.md` were already plain markdown. Went with option (b): stripped `mode`/`model`/`permission` from `local-orchestrator.md`; moved `mode: all` into `opencode.json` (model + permission were already there). Kept `description` in frontmatter as it is neutral and self-documenting. Body files are now prompt-body only, valid in both OpenCode and Copilot.
- `plugin.ts` local-agent detection uses provider prefix, not agent name: `tool.execute.after` detects local agents via `input.model.startsWith('ollama/')`. This is provider-specific: if the model is served via a different backend (e.g. `llama-server/`, `lmstudio/`), truncation silently stops working. Fix: detect by agent name (`input.agent.includes('build-local')`) only, removing the `ollama/` fallback. The `input.agent` field is available in `tool.execute.after` (confirmed May 2026).
- `plugin.ts` context pressure threshold is hardcoded to 32,768 tokens: `CONTEXT_LIMIT_TOKENS = 32768` assumes OmniCoder 9B's context window. If the local model changes, the threshold silently drifts out of calibration. Options: (a) read from `opencode.json` model config if OpenCode exposes it to plugins; (b) make it a top-of-file constant with a comment to update when changing models; (c) accept the drift as low-severity (threshold is advisory only; context pressure warnings are informational, not blocking). Option (b) is the minimum; option (a) is ideal if OpenCode exposes model metadata to plugins.
- Move `permission` out of `local-orchestrator.md` frontmatter. Done May 2026 as part of item 25. `mode: all` added to `opencode.json` agent entry. `model` and `permission` were already in `opencode.json`. `opencode.json` is now the single source of truth for all runtime config; `.md` files are prompt-body only.
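The question-drift item in the list above describes a two-part trigger: a committed refusal in the last three model turns, or an explicit correction phrase from the user. The hook itself would be bash; the following TypeScript sketch only illustrates the decision logic. The `Turn` shape, the regexes, and `driftMarker` are assumptions made for this example, and the patterns are deliberately rough.

```typescript
// Hypothetical transcript shape; the real transcript format is not shown here.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Rough stand-ins for a committed "no/impossible/can't" and the fallback phrases.
const COMMITTED_NO = /\b(no|not possible|impossible|can't|cannot)\b/i;
const USER_CORRECTION = /that's not what I asked|you're answering the wrong question|I said/i;
const MODEL_TURNS_TO_SCAN = 3;

// Returns the marker text to append as additional context, or null if no drift is suspected.
function driftMarker(transcript: Turn[], prompt: string): string | null {
  const recentModelTurns = transcript
    .filter((turn) => turn.role === "assistant")
    .slice(-MODEL_TURNS_TO_SCAN);
  const committed = recentModelTurns.some((turn) => COMMITTED_NO.test(turn.text));
  if (!committed && !USER_CORRECTION.test(prompt)) return null;
  // Repeat the exact question after the marker so it sits at the context tail.
  return `CURRENT QUESTION (answer only this, not the prior exchange): ${prompt}`;
}
```

A bare `\bno\b` match will also fire on harmless replies such as "no problem", so a real implementation would likely need a tighter pattern or a length check on the model turn.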
Testing & Regression
Research summary (May 2026): No pre-existing tool exactly fits this use
case. Existing tools (RagaAI Catalyst, AgentEvalKit, agent-eval-arena,
intent-eval-lab, j-rig-skill-binary-eval) focus on LLM output quality,
hallucination detection, or cross-runtime behavior scoring — not config file
structure or policy enforcement regression. The closest analogue is
j-rig-skill-binary-eval (binary pass/fail criteria across 7 layers), which
uses the same conceptual approach we'd want here. Our testing is bespoke by
necessity: we're testing configuration files, shell scripts, and specific policy
enforcement behaviors, not general LLM response quality.
Two layers of testing:
| Layer | What it tests | Cost | When to run |
|---|---|---|---|
| Config + policy unit tests | Schema validity, hook regex correctness | None (no model) | Always — CI, pre-commit |
| CLI integration smoke tests | Actual enforcement via `opencode run` | Local model only | On-demand; local model must be running |
Cloud agents excluded from integration tests — opencode run with a cloud
model (Copilot, Anthropic) incurs API costs and rate limits. Tests must detect
the active model and skip if it's not a local provider.
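The skip rule can be sketched as a provider-prefix check combined with an explicit opt-in. The prefix list below is an assumption: `ollama/` and `llama-server/` appear in this document as model prefixes and `lmstudio/` is mentioned as a possible backend. `AGENT_INTEGRATION_TESTS` is the opt-in variable proposed in the open tasks; the function names are hypothetical.

```typescript
// Assumed list of provider prefixes that count as local (no API cost).
const LOCAL_PROVIDERS = ["ollama/", "llama-server/", "lmstudio/"];

function isLocalModel(model: string): boolean {
  return LOCAL_PROVIDERS.some((prefix) => model.startsWith(prefix));
}

// Integration tests run only when explicitly enabled AND the model is local.
function shouldRunIntegration(
  model: string,
  env: Record<string, string | undefined>,
): boolean {
  return env["AGENT_INTEGRATION_TESTS"] === "1" && isLocalModel(model);
}
```

In a test file this would typically gate the whole suite, for example by passing the runtime's environment object and the configured model string and skipping when the result is `false`.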
Open tasks
- Config + policy unit test suite: test config file structure and hook regex patterns without invoking any model. Implementation:

  a. **`opencode.json` schema validation**: the file references `"$schema": "https://opencode.ai/config.json"`; validate it using `ajv` (already used in the monorepo) against the live schema or a cached copy. Catches permission typos, unknown agent keys, unsupported field values.

  b. **Hook JSON structure validation**: validate `.agents/frameworks/github/hooks.json` and `.agents/frameworks/opencode/plugin.ts` (TypeScript, already type-checked). Write a schema for the hooks JSON format and run ajv on it.

  c. **Hook policy regex unit tests**: extract every regex used in `pre-tool-use.sh` into a `tests/hooks.test.ts` file and run it with `vitest`. For each policy, define 2–3 input strings that SHOULD match and 2–3 that SHOULD NOT. Policy 14 already has an informal Node.js test from this session; formalize it.

  d. **Agent `.md` frontmatter validator**: check that no agent file under `.agents/agents/` has frontmatter keys other than `description`. Catches regression when someone adds `model:` or `permission:` back to a body file.

  **Suggested location**: `.agents/tests/` or root `test/agents/`. **Stack**: vitest (already in monorepo), ajv (already available), Node built-ins. No new dependencies needed.
- **CLI integration smoke tests (local model only)**: use `opencode run` in non-interactive mode to verify enforcement is actually firing via the real runtime. These tests exercise the plugin + hook wiring end-to-end.

  **Command shape**:

  ```
  opencode run "prompt" --agent build-local \
    --model llama-server/arch-omni2-9b-native \
    --format json
  ```

  **Assertions via `opencode export`**: after each run, export the session with `opencode export <sessionID> 2>/dev/null` and parse the JSON transcript. Assert on `parts` array: tool calls that SHOULD have been blocked appear with error/denied status; tool calls that SHOULD have passed completed normally.

  **Test cases to start with** (all verified real enforcement gaps):

  1. Attempt to `read` a nested `package.json` (e.g. `apps/api/package.json`) → BLOCKED by plugin package.json guard
  2. Attempt to `read` a source file with no `limit` → BLOCKED by pagination guard
  3. Attempt to `read` a source file with `limit: 51` → BLOCKED
  4. Attempt to `read` a docs file with `limit: 501` → BLOCKED
  5. Attempt to `read` a docs file with `limit: 50` → PASSES
  6. Bash command `cat apps/api/package.json` → BLOCKED by pre-tool-use Policy 14 (substitute your project's equivalent nested package.json)

  **Guard rail**: skip all tests if `llama-server` is not reachable at `http://127.0.0.1:8080/v1`. Do not run against cloud models. Add an env var `AGENT_INTEGRATION_TESTS=1` required to enable (off by default, never runs in standard `npm test`).

  **Suggested location**: `.agents/tests/integration/`.

  **Stack**: Node.js test runner or vitest, `opencode` CLI in PATH.
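The `read` expectations in test cases 2 to 5 reduce to a pure decision function, which makes them checkable without a model as well. The following is a sketch only, not the plugin's actual code: the names `checkRead` and `isDocsFile` are hypothetical, the docs/source classification is an assumption, and the 500-line docs cap is inferred from case 4 rather than stated anywhere. It also assumes unbounded reads are blocked for docs files too.

```typescript
// Sketch of the read guard's decision logic, derived from test cases 2-5.
// Names, the docs classification, and the docs cap are assumptions.
interface ReadInput {
  filePath: string;
  limit?: number;
  offset?: number;
}

const SOURCE_LIMIT = 50; // case 3: limit 51 on a source file is blocked
const DOCS_LIMIT = 500; // inferred from case 4: limit 501 on a docs file is blocked

// Assumption: anything under docs/ or ending in .md counts as documentation.
function isDocsFile(filePath: string): boolean {
  return /(^|\/)docs\//.test(filePath) || filePath.endsWith(".md");
}

function checkRead(input: ReadInput): { allowed: boolean; reason?: string } {
  const max = isDocsFile(input.filePath) ? DOCS_LIMIT : SOURCE_LIMIT;
  if (input.limit === undefined) {
    return { allowed: false, reason: "unbounded read: pass a limit" };
  }
  if (input.limit > max) {
    return { allowed: false, reason: `limit ${input.limit} exceeds ${max}` };
  }
  return { allowed: true };
}

// Table-driven cases mirroring cases 2-5; each pair is [input, expected allowed].
const readCases: Array<[ReadInput, boolean]> = [
  [{ filePath: "apps/api/src/index.ts" }, false],
  [{ filePath: "apps/api/src/index.ts", limit: 51 }, false],
  [{ filePath: "docs/agent-infrastructure.md", limit: 501 }, false],
  [{ filePath: "docs/agent-infrastructure.md", limit: 50 }, true],
];

// Any entry left in this array is a case whose outcome differs from the table.
const readCaseFailures = readCases.filter(
  ([input, expected]) => checkRead(input).allowed !== expected,
);
```

Keeping the decision separate from the hook wiring means the integration tests only have to prove that the runtime calls the guard, while the boundary values themselves are covered by the cheap unit layer.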
Verified facts (May 2026)
- OpenCode's `read` tool input schema is `{ filePath: string, limit?: number, offset?: number }`, NOT `startLine`/`endLine`. Confirmed via plugin debug logging of real tool calls.
- `tool.execute.before` input contains only `{ tool, sessionID, callID }`. It does NOT include `agent` or `model`, so plugin-layer gating cannot filter by agent. Confirmed via plugin debug logging.
- OpenCode has its own native hook system that calls `pre-tool-use.sh` directly for tools like `run_in_terminal`, `replace_string_in_file`, etc. This is completely separate from the plugin's `runHook` calls. The native hook payload includes `timestamp`, `hook_event_name`, `session_id`, `transcript_path`, `tool_use_id`, and `cwd`, fields the plugin never sends. The plugin `runHook` is a second call, layered on top.
- Bun shell `$` API does not have a `.stdin()` method. The correct API for piping stdin is `` $`cmd < ${Buffer.from(text)}` ``. `.stdin(text)` silently throws ``TypeError: $`...`.stdin is not a function``, which was caught by `runHook`'s `catch` block and returned `''`. This caused the plugin's `runHook` to silently no-op for every call with `stdinJson` since the plugin was first written: hook enforcement (all 12 policies) was never running via the plugin path. It only ran via OpenCode's native hook system for the tools OpenCode natively supports. Confirmed via `/tmp/plugin-hook-errors.log`.
- The silent `catch` in `runHook` is dangerous. It masked the Bun `.stdin()` bug entirely. Always log hook failures to a debug file during development; remove only after enforcement is verified working.
- Plugin-layer enforcement works for `read` after fixing the Bun stdin API. The `read` tool fires `tool.execute.before` in the plugin, which calls `runHook('pre-tool-use.sh', ...)` via `< ${Buffer.from(...)}`, which applies Policy 13 (50-line limit). Verified: bare `read` (no limit) → BLOCKED; `read` with `limit: 50` → passes. (May 2026)
- Plugin load failure: unescaped regex slashes caused silent syntax error. `plugin-debug.jsonl` was empty even after the Bun stdin fix because the plugin file itself failed to parse. Line 84 had `/(^|/)(apps|packages)/[^/]+/...`; the forward slashes inside the regex literal were not escaped, producing a JS syntax error at parse time. Bun silently drops plugins that fail to import. Fixed to `/(^|\/)(apps|packages)\/[^/]+\/...`. The fix also corrected the pagination guard to use `limit`/`offset` (not `startLine`/`endLine`) and added an unbounded-read block (`limit === undefined`). All three guards verified working in a live session (May 2026).
- Package.json read guard verified working. `local-orchestrator` attempting to read `apps/*/package.json` and `packages/*/package.json` → BLOCKED by plugin. Root `package.json` read correctly passes. (May 2026)
- Policy 14 (`manage_todo_list` ≥ 4 items) catches some but not all broad task attempts. OmniCoder sometimes proceeds directly to `Explore`/`find` without calling `manage_todo_list` first, bypassing the policy. When it does plan with the todo tool before acting, the deny fires correctly.
- OmniCoder comprehension failure: prompt ambiguity → wrong directory. Given "refactor the five hook files", OmniCoder ran a glob for `*hook*` files and found `.husky/` hooks instead of `.agents/hooks/`. The correct files were in the grep output from the Explore subagent but were not selected. Root cause: the model lacks enough context about the repo layout to disambiguate "hook files" without explicit path guidance. Mitigation: be explicit in prompts ("the five `.agents/hooks/*.sh` files").
- OpenCode agent `permission` config requires a `.opencode/agents/<name>.md` file. Without a matching markdown file, `opencode.json`'s `agent.<name>.permission` config is silently ignored: the agent is unknown to OpenCode and runs as a nameless build-agent alias. The markdown file must exist in `.opencode/agents/` (or `~/.config/opencode/agents/`). Confirmed by test run where `@local-orchestrator` edited files despite `permission.edit: "deny"` in JSON config; fixed by creating `.opencode/agents/local-orchestrator.md` symlink. (May 2026)
- `"write"` is NOT a valid OpenCode permission key. Use `"edit"` instead; it covers `write`, `edit`, and `apply_patch` tools. `"write": "deny"` is silently ignored. Valid top-level permission keys include: `read`, `edit`, `glob`, `grep`, `list`, `bash`, `task`, `skill`, `lsp`, `question`, `webfetch`, `websearch`, `external_directory`, `doom_loop`, `todowrite`. Confirmed from `opencode.ai/docs/permissions` (May 2026).
- `default_agent` key is snake_case in `opencode.json` (not `defaultAgent`). Confirmed from `opencode.ai/docs/config` (May 2026).
- `tools: false` is deprecated. The current approach for per-agent tool restriction is `permission: { edit: "deny" }`. The old `tools: false` still works but is documented as legacy. Confirmed from `opencode.ai/docs/agents` (May 2026).
- Broken symlinks are silent. OpenCode does not error on a broken `.opencode/agents/` symlink; it just skips the agent silently. The agent won't appear in `opencode agent list` and all `opencode.json` permission config for it is ignored. Always verify with `cat .opencode/agents/<name>.md | head -5` (should print content, not a "No such file" error) and `opencode agent list` (agent should appear with correct deny rules). The correct symlink depth from `.opencode/agents/` is `../../.agents/agents/<name>.md` (two levels), not three.
- `opencode agent list` is the authoritative verification command. Run it after any agent config change to confirm: (a) the agent appears by name, (b) its mode is correct (`all`/`primary`/`subagent`), and (c) `deny` rules appear at the bottom of its permission list. Missing agent = broken symlink or YAML parse error. Present but missing deny rules = frontmatter not parsed correctly or wrong key names. (May 2026)
- `@mention` routing only works at session start. If you send any message that gets answered by the current primary agent first, then send `@local-orchestrator ...`, the TUI passes the full message text to the current model (Build/OmniCoder) which treats `@local-orchestrator` as freeform text and answers it itself. Always open a fresh session and make `@agent-name` the very first message. Alternatively, use `opencode run --agent local-orchestrator "..."` from the CLI for reliable agent-scoped invocation. Tab-switching to a custom `all`-mode agent in an existing session works correctly.
- `edit: deny` on `local-orchestrator` is working correctly. When given an edit task, the orchestrator correctly avoided using `replace_string_in_file` and instead used the `task` tool to delegate to a subagent. This is the expected behaviour. Confirmed May 2026.
- `task` tool has a JSON serialization limit. OmniCoder 9B caused an `Unterminated string` error by embedding the entire contents of multiple `package.json` files as a literal string inside the `task` prompt JSON. The `task` tool prompt is serialized as JSON; very long strings truncate and produce parse errors. Mitigation: instruct the orchestrator in its system prompt to tell workers which files to read rather than quoting file contents inline. This has been added to `local-orchestrator.md`. (May 2026)
- `ollama/arch-omni2-9b` is the wrong model identifier for the llama-server instance. The correct ID is `llama-server/arch-omni2-9b-native` (verify with `opencode models | grep arch`). Using the wrong ID causes an immediate "cannot load model" error when the agent is invoked. Fixed in `opencode.json` and `local-orchestrator.md` frontmatter. (May 2026)
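The "always log hook failures" rule from the list above can be sketched as a small wrapper. This is a simplified synchronous illustration, not the plugin's actual `runHook` (which shells out through Bun): `runHookLogged`, the `HookLog` sink type, and the injected `run` callback are hypothetical names, chosen so the example needs no Bun or file-system APIs.

```typescript
// Hypothetical wrapper: keep the "return '' on failure" contract, but never
// swallow the error silently. The log sink is injected so that callers can
// append entries to a debug file during development.
type HookLog = (entry: { hook: string; error: string }) => void;

function runHookLogged(hook: string, run: () => string, log: HookLog): string {
  try {
    return run();
  } catch (err) {
    const message = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
    log({ hook, error: message });
    return "";
  }
}

// Usage: simulate a TypeError like the one described above and capture the entry
// in memory instead of a debug file.
const hookErrors: Array<{ hook: string; error: string }> = [];
const hookOutput = runHookLogged(
  "pre-tool-use.sh",
  () => {
    throw new TypeError(".stdin is not a function");
  },
  (entry) => hookErrors.push(entry),
);
```

With the sink in place, a hook that never runs leaves a visible trail on the first call, rather than looking identical to a hook that ran and allowed the tool.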
Open Issues
Known bugs and stale claims identified during code review (see deleted
`agent-infrastructure-review.md` and `agent-infrastructure-review-pass2.md` for
full context). Not yet fixed.
CRITICAL — description: empty in all generated agent/skill files
`scripts/generate-agents.ts` uses a hand-rolled YAML parser that silently drops
descriptions when they are written in block-scalar form (value on the next line
under the key). Every generated file in `.github/agents/`, `.github/skills/`,
`.opencode/agents/`, `.opencode/skills/` has a blank `description:` field.

`description:` is the primary routing signal for Copilot's
`SkillsContextComputer` and OpenCode's agent dispatch. Explicitly @-mentioning
an agent by name still works; description-triggered auto-routing does not.

Fix: Inline the description strings in the canonical `.agents/` source files
(change block-scalar to `key: 'value'` format). The existing parser handles
inline strings correctly. Add a `generate:agents:check` assertion that every
generated file has a non-empty `description:`.
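The proposed `generate:agents:check` assertion could look something like the sketch below. It assumes generated files start with a `---` delimited frontmatter block and that descriptions are written inline as `description: value`. The name `hasNonEmptyDescription` is hypothetical, and the real script may be structured differently.

```typescript
// Hypothetical check for generate:agents:check: a generated file passes only if
// its frontmatter contains an inline, non-empty description.
function hasNonEmptyDescription(fileText: string): boolean {
  const lines = fileText.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return false; // no frontmatter at all
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // end of frontmatter
    const match = /^description:(.*)$/.exec(line);
    if (!match) continue;
    // Strip matching surrounding quotes from the inline value.
    const value = (match[1] ?? "").trim().replace(/^(['"])(.*)\1$/, "$2").trim();
    // A bare block-scalar marker (| or >) counts as empty here, because that is
    // the form the issue above says gets dropped.
    return value !== "" && !/^[|>][+-]?$/.test(value);
  }
  return false; // no description key found
}
```

Running this over every file in the generated directories and failing on the first `false` would catch the blank-description regression before it reaches routing.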
MEDIUM — printf '%s' regression in hooks breaks \n rendering (resolved)
`.agents/hooks/post-tool-use.sh`, `session-start.sh`, and
`user-prompt-submit.sh` use `printf '%s' "$context" | node -e '...'` to
JSON-escape the context variable. `%s` does not interpret `\n` escape sequences,
so multi-line context strings (SELF-CHECK, DEBUGGING REMINDER, BFF REMINDER)
arrive at the model as single lines with literal `\n` characters.
Verified fixed (May 2026): all three hooks already use printf '%b'.
LOW — arXiv citation 2603.29957 unverified (resolved)
arXiv:2603.29957 (Jiang et al. 2026, "Think-Anywhere") appears in
`.agents/agents/research.md`, `.agents/agents/brainstorm.md`, and the Research
Foundation section above. Verify the ID resolves at
https://arxiv.org/abs/2603.29957 and fix all references if it doesn't.
Verified real (May 2026): "Think Anywhere in Code Generation" by Xue Jiang, Tianyu Zhang, Ge Li et al., submitted March 31, 2026, revised April 27, 2026 (v3), cs.SE. All existing citations are correct.
LOW — .claude/ false claims in tool-agnostic-agent-infra.md (resolved)
The file `docs/projects/tool-agnostic-agent-infra.md` no longer exists (already
deleted). No action needed.