- AGENTS.md: design principles, enforcement hierarchy, deferred loading - agents/: brainstorm, build, orchestrator, research (auto-discovered by MCP server) - skills/: research methodology (auto-discovered by MCP server) - hooks/: pre-tool-use, post-tool-use (BFF block removed), session-start, stop, pre-compact, user-prompt-submit - frameworks/: opencode/plugin.ts (resolves hooks via import.meta.url — works as project-local or global plugin), github/hooks.json - mcp/index.ts: auto-discovers agents/*.md and skills/*.md from frontmatter (replaces hand-maintained registry); server renamed all-agents - docs/: agent-infrastructure.md (generalized), research docs (7 files), ai_architectures.md, llama-server-cuda-wsl2.md - install.sh: idempotent setup — Copilot global hooks, OpenCode global plugin + AGENTS.md + MCP entry, VS Code global MCP config
# Agent Infrastructure

Shared agent infrastructure for VS Code Copilot and OpenCode — brainstorm
agent, research agent, nudge instructions, hooks, skills, and MCP server.
Project-specific overlays live in each project's `.agents/` directory.

> **See also:**
> [`docs/research/ai-coding-best-practices.md`](../research/ai-coding-best-practices.md)
> — research synthesis covering the Prompt/Context/Harness taxonomy, failure
> modes, enforcement hierarchy, small-model harness patterns, and all
> primary-source citations that underpin the design decisions here.

## Current State

### Architecture Overview

The infrastructure is **tool-agnostic**: canonical sources live in `.agents/`
and a generator (`npm run generate:agents`) distributes them to
`.github/agents/`, `.github/skills/`, `.opencode/agents/`, `.opencode/skills/`.
Edit the `.agents/` sources; never edit the generated output directories (they
are `.gitignore`d and blocked by pre-tool-use policy).

```
.agents/
├── AGENTS.md                  # Root design doc + enforcement hierarchy
├── agents/                    # Agent definitions (canonical)
│   ├── brainstorm.md
│   ├── research.md
│   └── build-local.md         # OmniCoder 9B via Ollama
├── hooks/                     # Shared bash hooks (delegated by all harnesses)
│   ├── pre-tool-use.sh        # Hard blocks (terminal cmds + file-path policies)
│   ├── post-tool-use.sh       # Self-check counter + methodology reminders
│   ├── session-start.sh       # Inject project state at session start
│   ├── user-prompt-submit.sh  # Per-turn nudge detection + task capture
│   ├── pre-compact.sh         # Export state before context summarization
│   └── stop.sh                # Session-end verification
└── skills/
    └── research/SKILL.md      # Research methodology (any agent can load)
```

Generated output (do not edit — regenerated by `npm run generate:agents`):

- `.github/agents/` — VS Code Copilot agent files
- `.github/skills/` — VS Code Copilot skill files
- `.opencode/agents/` — OpenCode agent files
- `.opencode/skills/` — OpenCode skill files
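
The fan-out from canonical sources to generated directories can be pictured with a small path-mapping sketch. This is not the project's generator: `outputPathsFor` is a hypothetical name, and the `.agent.md` suffix for Copilot agent files is taken from the tool-mapping table in this document rather than from the generator source.

```typescript
// Hypothetical sketch of the .agents/ -> generated-output fan-out.
// Assumption: only agents/ and skills/ are copied; hooks stay in place.
function outputPathsFor(sourcePath: string): string[] {
  const match = /^\.agents\/(agents|skills)\/(.+)$/.exec(sourcePath);
  if (!match) return []; // AGENTS.md, hooks/, etc. are not distributed here
  const kind = match[1] ?? "";
  const rest = match[2] ?? "";
  if (kind === "agents") {
    // Copilot agent files use *.agent.md; OpenCode uses plain *.md.
    const copilotName = rest.replace(/\.md$/, ".agent.md");
    return [`.github/agents/${copilotName}`, `.opencode/agents/${rest}`];
  }
  return [`.github/skills/${rest}`, `.opencode/skills/${rest}`];
}
```

Under these assumptions, `.agents/agents/research.md` would map to one Copilot file and one OpenCode file, while a hook script maps to nothing because both harnesses call the shared script directly.
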

Harness integration:

- **VS Code Copilot**: `.github/agent-support.json` — maps 4 hook events to the
  shared bash scripts in `.agents/hooks/`
- **OpenCode**: `.opencode/plugins/agent-support.ts` — TypeScript plugin that
  shells out to the same bash scripts

### Brainstorm Agent

- 4-phase workflow: Quick Frame → Diverge → Converge → Capture & Hand Off
- 6 techniques: Rapid Ideation, SCAMPER, Worst Possible Idea, How Might We,
  Inversion/Pre-mortem, Constraint Flipping
- Counterbalances Opus 4.6 overthinking tendency
- Phase 2 includes "push past the obvious" nudge (Zhao et al. 2024: LLMs fall
  short on originality, excel at elaboration — first ideas are "average")
- Phase 4 routes to `@research` for investigation, default agent for
  implementation
- Creates exploration files at `docs/explorations/<name>.md` and session memory
  notes

### Research Agent

- Two orientations that compose recursively:
  - **Understand** (Grounded Theory): open coding → constant comparison → axial
    coding → memo → saturation check
  - **Diagnose** (Strong Inference + Satisficing): 5-factor triage gates between
    satisficing (low risk) and full falsification (high risk)
- 5-factor triage: reversibility, blast radius, confidence, novelty, time cost
- Timing awareness: `time` prefix on unknown commands, session/repo memory for
  baselines, timing feeds into triage decisions
- Investigation files at `docs/explorations/<name>.md`
- Techniques reference: Five Whys, Delta Debugging, Rubber Duck
- Delegates evidence-gathering to Explore subagent, keeps analytical thinking
  local
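
The 5-factor triage gate can be sketched as a small scoring function. The factor names come from the list above; everything else (the 0 to 2 scale, the threshold of 5, and the rule that an irreversible action always forces falsification) is an assumption made for illustration, since the agent itself applies the triage as prose guidance rather than code.

```typescript
// Illustrative only: scale, threshold, and override rule are assumptions.
type Risk = 0 | 1 | 2; // 0 = low risk contribution, 2 = high

interface TriageInput {
  reversibility: Risk; // 2 = hard to undo
  blastRadius: Risk;   // 2 = many files or systems affected
  confidence: Risk;    // 2 = low confidence in the current hypothesis
  novelty: Risk;       // 2 = unfamiliar territory
  timeCost: Risk;      // 2 = a wrong guess would be expensive to recover from
}

type Mode = "satisfice" | "falsify";

const FALSIFY_THRESHOLD = 5; // hypothetical cut-off

function triage(input: TriageInput): Mode {
  const total =
    input.reversibility +
    input.blastRadius +
    input.confidence +
    input.novelty +
    input.timeCost;
  // Assumed override: anything irreversible gets the full treatment.
  if (input.reversibility === 2) return "falsify";
  return total >= FALSIFY_THRESHOLD ? "falsify" : "satisfice";
}
```

A cheap, reversible, familiar change scores low and stays on the satisficing path; the same change against an unfamiliar subsystem with low confidence crosses the threshold and switches to competing hypotheses.
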

### Nudge Instructions

- Brainstorm nudge: triggers on hesitation/overthinking language ('wait',
  'actually', 'hmm', 'overcomplicating', etc.)
- Research nudge: triggers on debugging/investigation language ('why is this
  broken', 'how does this work', 'root cause', etc.)
- Both are non-intrusive single-sentence suggestions, only fire once per topic
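
The trigger-and-suppress behaviour described above can be sketched as follows. The phrase lists are the examples quoted in the bullets; how a "topic" is identified is not specified in this document, so the sketch simply takes a topic key from the caller (an assumption), and `detectNudge` is a hypothetical name.

```typescript
type Nudge = "brainstorm" | "research";

// Trigger phrases copied from the examples above; the real lists are longer.
const TRIGGERS: Record<Nudge, RegExp> = {
  brainstorm: /\b(wait|actually|hmm|overcomplicating)\b/i,
  research: /(why is this broken|how does this work|root cause)/i,
};

// `fired` remembers which nudge has already been shown for which topic.
function detectNudge(prompt: string, topic: string, fired: Set<string>): Nudge | null {
  for (const kind of ["research", "brainstorm"] as const) {
    const key = `${kind}:${topic}`;
    if (TRIGGERS[kind].test(prompt) && !fired.has(key)) {
      fired.add(key); // once per topic: later matches stay silent
      return kind;
    }
  }
  return null;
}
```

Keeping the fired set outside the function makes the once-per-topic rule explicit: the second hesitant message on the same topic returns `null` instead of repeating the suggestion.
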

### Tool Mapping (Copilot ↔ OpenCode)

| Copilot                                              | OpenCode equivalent                                                                                |
| ---------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `AGENTS.md` (root + nested)                          | `AGENTS.md` (root, native; nested via `instructions` glob in `opencode.json`)                      |
| `.github/agents/*.agent.md`                          | `.opencode/agents/*.md` (frontmatter: `description`, `mode`, `model`, `temperature`, `permission`) |
| `.github/skills/<name>/SKILL.md`                     | `.opencode/skills/<n>/SKILL.md` — also reads `.agents/skills/` and `.claude/skills/`               |
| `.github/instructions/*.instructions.md` (`applyTo`) | No direct equivalent — fold into AGENTS.md stubs or `instructions` glob                            |
| `.github/hooks/*.sh` (JSON-configured shell)         | `.opencode/plugins/*.ts` (TS modules, event-driven) — shells out via Bun's `$`                     |
| `runSubagent` / `Explore` agent                      | Built-in `general` and `explore` subagents; `@`-mention syntax                                     |
| `vscode_askQuestions`                                | No equivalent — OpenCode uses agent's natural turn-taking                                          |

OpenCode plugin event mapping:

| Copilot hook   | OpenCode event                      |
| -------------- | ----------------------------------- |
| `SessionStart` | `session.created`                   |
| `PreToolUse`   | `tool.execute.before`               |
| `PostToolUse`  | `tool.execute.after`                |
| `PreCompact`   | `experimental.session.compacting`   |
| `Stop`         | `session.idle` (closest equivalent) |
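
The event table above translates directly into a lookup that a plugin could use to decide which shared script to run. The event names are the ones in the table; the pairing of each hook with a script in `.agents/hooks/` is inferred from the file names in the directory tree, and `scriptForEvent` is a hypothetical helper, not part of `agent-support.ts`.

```typescript
type CopilotHook = "SessionStart" | "PreToolUse" | "PostToolUse" | "PreCompact" | "Stop";

// Copilot hook -> OpenCode event, as listed in the table above.
const OPENCODE_EVENT: Record<CopilotHook, string> = {
  SessionStart: "session.created",
  PreToolUse: "tool.execute.before",
  PostToolUse: "tool.execute.after",
  PreCompact: "experimental.session.compacting",
  Stop: "session.idle", // closest equivalent
};

// Assumed pairing, inferred from the script names in .agents/hooks/.
const HOOK_SCRIPT: Record<CopilotHook, string> = {
  SessionStart: ".agents/hooks/session-start.sh",
  PreToolUse: ".agents/hooks/pre-tool-use.sh",
  PostToolUse: ".agents/hooks/post-tool-use.sh",
  PreCompact: ".agents/hooks/pre-compact.sh",
  Stop: ".agents/hooks/stop.sh",
};

// Resolve an OpenCode event name to the shared script, if one is mapped.
function scriptForEvent(event: string): string | undefined {
  const hooks = Object.keys(OPENCODE_EVENT) as CopilotHook[];
  const hook = hooks.find((h) => OPENCODE_EVENT[h] === event);
  return hook ? HOOK_SCRIPT[hook] : undefined;
}
```

Events without a Copilot counterpart resolve to `undefined`, which keeps the two harnesses on the same set of scripts.
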

## Research Foundation

> For full research depth, citations, and failure-mode analysis, see
> [`docs/research/ai-coding-best-practices.md`](../research/ai-coding-best-practices.md).
> The list below records the specific papers and frameworks that shaped the
> design decisions in this project.

Methodologies and papers that informed the design:

- **Grounded Theory** (Glaser & Strauss): build understanding from data, not
  assumptions. Applied to code-reading in the Understand orientation.
- **Strong Inference** (Platt 1964): multiple competing hypotheses → crucial
  experiments → eliminate. Applied to the Diagnose orientation.
- **Satisficing** (Simon 1956): accept "good enough" when optimization cost
  exceeds benefit. Gates between cheap confirmation and expensive falsification.
- **Dual Process Theory** (Kahneman): System 1 (fast, pattern-matching) vs
  System 2 (slow, analytical). System 1 more accurate in familiar domains.
  Informs the triage decision.
- **Zhao et al. 2024** (arXiv): LLMs fall short on originality, excel at
  elaboration. First ideas are "average." Informs brainstorm agent's "push past
  the obvious" nudge.
- **"Lost in the Middle"** (Liu et al. 2023): LLMs attend best to beginning/end
  of context. Informs hook design — inject at context tail for high attention.
- **Delta Debugging**: binary search the change space between passing/failing
  cases. Logic behind `git bisect`.
- **Five Whys**: iterative causal chain tracing. Starting point for hypothesis
  generation, not sole diagnostic method.
- **Ronacher "Agent Design Is Still Hard"**: reinforce methodology after every
  tool call at context tail. Structural injection outperforms relying on
  instructions in the system prompt.
- **Think-Anywhere** (Jiang et al. arXiv:2603.29957, Mar 2026, Peking U + Tongyi
  Lab): LLMs trained to invoke `<think>` blocks at any token position during
  code generation, not just upfront. SOTA on LeetCode/LiveCodeBench with fewer
  total tokens. The motivating insight: a model can plan correctly at the start
  but introduce an off-by-one bug mid-implementation — only mid-loop reasoning
  catches it. **Applied here**: the research agent's investigation checklist
  includes "Re-evaluate hypothesis at every tool-call boundary." For Claude 4
  models, interleaved thinking makes this automatic. Complements Plan-and-Solve:
  upfront decomposition where structure is clear, mid-execution re-evaluation
  when intermediate results change what to do next.
- **Anthropic interleaved thinking** (Claude 4 + adaptive thinking): Claude
  Sonnet 4.6+ and Opus 4.6+ automatically insert thinking blocks between tool
  calls. No separate implementation needed — agent instruction design drives it.
  The research agent's "Re-evaluate at every tool-call boundary" instruction
  explicitly activates this behavior.
- **Prompt/Context/Harness framework** (Alibaba Cloud, Apr 2026): Names the
  three engineering layers. Prompt = task expression (stateless). Context = what
  the model sees (AGENTS.md, skills, tools — engineering target is progressive
  disclosure). Harness = system constraints + verification loops (hooks,
  permission gates, sub-agent isolation). Diagnostic map: wrong output → Prompt;
  hallucinated fact → Context; wrong tool selected → Context (fix description);
  task drift → Harness (sub-agent boundary); destructive action → Harness
  (permission hook). LangChain improved Terminal Bench 2.0 from 52.8% → 66.5% by
  changing Harness alone.
- **Context engineering** (Rajasekaran et al., Anthropic, Sep 2025): Formally
  distinguishes context engineering from prompt engineering. Key principles: (a)
  just-in-time context — agents hold references and load on demand, not upfront;
  (b) structured note-taking (NOTES.md) as external working memory for long
  sequential tasks; (c) every new token depletes attention budget — validates
  the <60-line AGENTS.md ceiling; (d) compaction strategy: maximize recall
  first, then improve precision.

## MCP Server Lifecycle Hooks — Protocol Status (May 2026)

The `.agents/mcp/` server exposes prompts and tools to agents via the MCP
protocol. A recurring question: can the MCP server react to session lifecycle
events (session start/end, tool-use boundaries)?

### Current protocol state

**No lifecycle hooks exist in the MCP protocol.** The spec defines three phases
only: `initialize → operation → shutdown`. There is no `session.created`,
`post-tool-call`, or `session.ended` notification. This gap is why session
awareness currently lives in the OpenCode plugin layer
(`.opencode/plugins/agent-support.ts`) rather than the MCP server — OpenCode
exposes `session.created`, `session.idle`, `session.compacted`,
`session.deleted`, and `tool.execute.before/after` events natively to plugins.

### Active work in the MCP spec

**SEP-2624: Interceptors for the Model Context Protocol**
([PR #2624](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2624))

The most organized effort. Supersedes SEP-1763 (closed as completed). Proposes
**Interceptors** as a new MCP primitive — two types: **validators** (inspect,
return pass/fail) and **mutators** (transform context payloads) — discoverable
and invocable via `interceptors/list` and `interceptor/invoke` JSON-RPC methods.
These fire at protocol-level operation events: `tools/call`, `prompts/get`,
`resources/read`, `sampling/createMessage`, `elicitation/create`. Not
session-start/stop hooks, but before/after wrapping for every operation.

There is now a formal **Interceptors Working Group** (Bloomberg + Saxo Bank
engineers, biweekly cadence). Reference implementations in progress for Go and
C# SDKs. Experimental repo:
[modelcontextprotocol/experimental-ext-interceptors](https://github.com/modelcontextprotocol/experimental-ext-interceptors).
Charter:
[modelcontextprotocol.io/community/interceptors/charter](https://modelcontextprotocol.io/community/interceptors/charter).

**SEP-2282: Server-Declared Behavioural Hooks**
([PR #2282](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2282))

Smaller, separate open PR. Proposes servers declare **context injections** in
`ServerCapabilities` — text injected into the agent's context at client-side
lifecycle events (session start, post-tool-use, session end). The contract is
"here's context the model should have at this moment," not code execution. More
directly analogous to our OpenCode `session.created` / `session.idle` patterns.
Currently unsponsored — needs a maintainer to pick it up.

### What to watch

- **Primary**:
  [PR #2624](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2624) +
  experimental-ext-interceptors repo
- **Secondary**:
  [PR #2282](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2282)
  (closest to session-lifecycle hooks)
- **Label filter**:
  [`SEP` label](https://github.com/modelcontextprotocol/modelcontextprotocol/issues?q=label%3ASEP)
  on the modelcontextprotocol repo
- **Milestone**: `2026-06-30-RC` is the next spec revision window

### Implication for this project

Until interceptors land in a shipping spec version and the TypeScript SDK, the
session lifecycle pattern stays at the OpenCode plugin layer. When SEP-2282 or
an equivalent lands, the MCP server could self-register context injection hooks
during `initialize`, removing the need for tool-specific plugin code.

---

## Model Scale Profiles

Different model sizes require different infrastructure strategies. The failure
modes are different, so the mitigations are different.

### Large-scale API models (Claude Sonnet / Opus)

**Primary failure modes**: overthinking, sycophancy, verbosity, tendency to add
unrequested features or comments.

**Infrastructure strategy**:

- Advisory methodology + structural reinforcement (hooks, circuit breakers)
- PostToolUse self-check nudges every ~15 calls
- PreToolUse hard blocks for high-risk operations
- Subagent delegation for isolated tasks (parent Opus → child Sonnet/Haiku)

### Smaller-scale local models (OmniCoder 9B via Ollama)

**Primary failure modes** (different from "low reasoning" — OmniCoder uses Qwen3
thinking blocks natively):

- Narrower training distribution (Python/JS heavy)
- Quantization degradation: JSON schema compliance drops as context fills
- Tool-call history is the primary context consumer — responses must be
  truncated aggressively
- Instruction drift: fewer attention heads (32 vs 64 in 32B) means system prompt
  recall degrades faster

**Infrastructure strategy**:

- PostToolUse response truncation at ~1500 tokens (plugin layer, not bash hook)
- PreToolUse JSON validation with schema-specific error messages
- Context pressure injection at ≥70% fill (~22K/32K tokens)
- `steps: 20` cap + `ask` permission gates for natural checkpoints
- `explore` subagent delegation to reduce context pressure on the main agent
- `NOTES.md` working memory pattern enforced in agent body
- No `web` tool — keeps context lean
- Reasoning guidance: "Hold references; load on demand" explicit in agent body
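
Two of the local-model mitigations above, response truncation and the context-pressure check, reduce to simple arithmetic. The sketch below assumes a 32,768-token window and a rough four-characters-per-token estimate; the plugin's actual token accounting is not shown in this document, and the function names are hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 32768; // assumed window for the 9B model
const PRESSURE_RATIO = 0.7;          // ">= 70% fill" from the list above
const MAX_RESPONSE_TOKENS = 1500;    // "~1500 tokens" from the list above
const CHARS_PER_TOKEN = 4;           // crude heuristic, an assumption

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep the head of an oversized tool response and say how much was dropped.
function truncateResponse(text: string, maxTokens: number = MAX_RESPONSE_TOKENS): string {
  const total = estimateTokens(text);
  if (total <= maxTokens) return text;
  const kept = text.slice(0, maxTokens * CHARS_PER_TOKEN);
  return `${kept}\n[truncated: ~${total - maxTokens} tokens omitted]`;
}

// 0.7 * 32768 = 22937.6, which is the "~22K/32K" figure quoted above.
function underContextPressure(usedTokens: number): boolean {
  return usedTokens >= CONTEXT_WINDOW_TOKENS * PRESSURE_RATIO;
}
```

Keeping the window size in a single constant also makes it easy to swap when the model or its configured context length changes.
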

---

## OmniCoder 2 Orchestration — Pending Work

> Full historical rationale and audit findings were maintained in
> `docs/projects/local-ai-orchestration.md` (deleted May 2026 after merge). The
> plan used an orchestrator-workers pattern with structural `edit: deny`
> enforcement on the orchestrator. All OpenCode config values verified against
> opencode.ai/docs (May 2026).

### Goals

1. All agents run on `ollama/arch-omni2-9b` — no cloud fallback
2. User can type vague prompts; the system decomposes and delegates
   automatically
3. Context windows are isolated per subagent (no shared state bleed)
4. Changes scale forward: switching to cloud means changing model strings, not
   architecture

### Pending Changes

#### Quick wins — under 5 minutes each, no testing required

1. - [x] **[CRITICAL] Fix `<tool\*call>` typo in `omnicoder2.modelfile`** —
         markdown-escape artifact; malformed opening tag paired with correct
         closing tag. Highest-leverage change; everything below depends on
         reliable tool-call JSON.
2. - [x] **Mark canonical/deprecated modelfiles** — `# CANONICAL` header on
         `omnicoder2.modelfile`; `# DEPRECATED` on `omnicoder.modelfile`;
         `omnicoder-v2.modelfile.template` deleted (was dead code — v2 now
         served from HuggingFace path).
3. - [x] **Add `compaction.reserved: 3000` to `opencode.json`** — default 10,000
         fires compaction too early given ~8–12K baseline context.
4. - [x] **Fix `pre-compact.sh` prettier call** — removes `npx prettier` which
         violates pre-tool-use Policy 1 (self-violating policy).
5. - [x] **MCP server error handling** — wrap `server.connect(transport)` in
         try/catch with stderr + `process.exit(1)`.

#### Short session — 15–30 minutes each, bounded scope

6. - [x] **Fix `stop.sh` JSON escaping** — replace `sed`-based escaping with
         `printf '%b' | node JSON.stringify` pattern used in every other hook.
7. - [x] **Per-session PostToolUse counter** — repo-scoped path
         `/tmp/.opencode-tool-count-<repo-hash>` (derived from REPO_ROOT via
         md5sum); prevents cross-repo contamination; session-start.sh resets it
         at session begin.
8. - [x] **Shrink compaction prompt to ~120 words** (in
         `.opencode/plugins/agent-support.ts`) — shorter instructions free
         bandwidth for the 9B to actually summarize.
9. - [x] **Update `.agents/agents/build-local.md` for v2** — pagination 100 → 50
         lines; rule 4 now says "recipient not dispatcher"; rule 7 scope-check
         says "tell the user, do not self-decompose".

#### Depends on orchestrator being proven first

10. - [x] **Trim root `AGENTS.md` to ~60 lines** — reduced from 435 lines to 45
          lines; all architecture rationale, code examples, quick task table,
          and project context removed; cross-cutting rules and quality gate
          preserved (May 2026).
11. - [x] **PostToolUse weighted counter** — reads (`read_file`, `grep`, `list`)
          +0.25; writes/shell +1; keeps 15-call SELF-CHECK from firing
          mid-investigation sweep. Depends on #7 (per-session counter) first.

          **Implementation** (`.agents/hooks/post-tool-use.sh`): bash has no
          float arithmetic — scale to integers: reads +1, writes/shell +4,
          threshold 60 (equivalent to 15 effective write-units). Read-class
          tools: `read_file`, `grep_search`, `list_dir`, `file_search`,
          `semantic_search`, `explore_subagent`. Write/shell-class: all
          `*_string_in_file`, `create_file`, `run_in_terminal`. Replace the
          single `COUNT=$((COUNT + 1))` with a `case "$TOOL_NAME"` block that
          does `COUNT=$((COUNT + 1))` for reads and `COUNT=$((COUNT + 4))` for
          writes/shell. Change the self-check condition from
          `(( COUNT % 15 == 0 ))` to `(( COUNT % 60 == 0 ))`.

12. - [x] **PostToolUse reminder priority filter** — emit at most 2 reminders
          per tool call; priority: SELF-CHECK > DEBUGGING > path-scoped >
          tool-specific. Depends on #11.

          **Implementation** (`.agents/hooks/post-tool-use.sh`): replace the
          current single `context` string accumulator with an indexed array
          `reminders=()`. Each block appends `reminders+=("$msg")` in priority
          order (SELF-CHECK first, DEBUGGING second, BFF/QUALITY GATE third,
          RENAME fourth). At output time: join only the first 2 elements.
          Append with `\n\n` separator. Blocks that didn't fire don't append,
          so the cap is natural.

13. - [x] **Broaden PostToolUse truncation to all `ollama/` agents**
          (`.opencode/plugins/agent-support.ts`); differentiate limit:
          orchestrator 2,500 tokens vs workers 1,500. Minor until orchestrator
          exists.

          **Implementation**: rename `BUILD_LOCAL_MAX_RESPONSE_TOKENS` →
          `LOCAL_WORKER_MAX_TOKENS = 1500`; add
          `LOCAL_ORCHESTRATOR_MAX_TOKENS = 2500`. In `tool.execute.after`, the
          existing `isLocalAgent` check covers all `ollama/` agents via
          `input.model.startsWith('ollama/')`. Add a second check:
          `input.agent === 'local-orchestrator'` → use orchestrator limit, else
          worker limit. The `agent` field is available in `tool.execute.after`
          (confirmed working for `build-local`).

14. - [x] **Create `.agents/agents/local-orchestrator.md`** — primary agent with
          `edit: deny`, `write: deny`, `bash: deny`; whitelist `task` to
          `build-local`, `research`, `brainstorm` only.

          **Implementation**: new file modeled on `build-local.md`. Role: receive
          high-level goal, decompose into bounded subtasks, show decomposition to
          user before dispatching, delegate via `task` subagent. Permission
          block in `opencode.json` `agent.local-orchestrator`:
          `{ "edit": "deny", "write": "deny", "bash": "deny" }`. Agent body
          rules: (1) read project root `AGENTS.md` first; (2) produce a task
          list and confirm with user before dispatching; (3) one `task` call per
          subtask, wait for result; (4) never attempt to edit files directly —
          if a subtask requires context the worker needs, inject it via the
          `task` prompt, not by reading files yourself; (5) after all subtasks,
          report summary to user.

15. - [x] ~~**Set `default_agent: "local-orchestrator"` in `opencode.json`**~~ —
          Done May 2026. Key is `default_agent` (snake_case, confirmed from
          `opencode.ai/config.json` schema). `local-orchestrator` has
          `mode: all` so it qualifies as a primary agent.
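
The integer-scaled counter from item 11 can be sketched as below. The hook itself is bash; this is a TypeScript rendering of the same arithmetic, with hypothetical function names, and it treats every tool not in the read-class list as write/shell-class (an assumption). One detail worth checking in the real hook: with mixed increments of 1 and 4 the counter can step over a multiple of 60 (for example 59 + 4 = 63), so a bare `COUNT % 60 == 0` test would not fire on that cycle. The sketch detects the threshold crossing instead.

```typescript
// Read-class tools from item 11; everything else is treated as write/shell.
const READ_TOOLS = new Set<string>([
  "read_file",
  "grep_search",
  "list_dir",
  "file_search",
  "semantic_search",
  "explore_subagent",
]);

const READ_WEIGHT = 1;       // 0.25 effective calls, scaled by 4
const WRITE_WEIGHT = 4;      // 1 effective call, scaled by 4
const SELF_CHECK_EVERY = 60; // 15 effective write-units, scaled by 4

function weightFor(toolName: string): number {
  return READ_TOOLS.has(toolName) ? READ_WEIGHT : WRITE_WEIGHT;
}

// Advance the counter and report whether a multiple of 60 was reached or passed.
function advance(count: number, toolName: string): { count: number; selfCheck: boolean } {
  const next = count + weightFor(toolName);
  const selfCheck =
    Math.floor(next / SELF_CHECK_EVERY) > Math.floor(count / SELF_CHECK_EVERY);
  return { count: next, selfCheck };
}
```

With this rule, fifteen consecutive writes trigger the check on the fifteenth call, sixty consecutive reads trigger it on the sixtieth, and any mixture triggers it on the call that carries the total past the boundary.
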

#### Done

- [x] ~~**Soften `opus-deep.modelfile` directive**~~ — file deleted (May 2026);
      DeepSeek R1 available online when needed; OmniCoder 2 is the sole local
      model.

### Known Tradeoffs

| Tradeoff                                           | Impact                                                                                                          | Mitigation                                                                                                         |
| -------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| Instructions glob trimmed to root `AGENTS.md` only | Agents miss project-specific patterns for subdirectories unless they read nested `AGENTS.md` explicitly         | Add reminder in orchestrator + build-local agent body: "check nested `AGENTS.md` before working in subdirectories" |
| Same model for all roles                           | Orchestrator, worker, compaction agent are all same weights with different prompts                              | Structural `edit: deny` is the safety net; circuit breakers limit runaway loops                                    |
| No cloud fallback                                  | If task is too complex for 9B, no escalation path                                                               | Orchestrator includes "ask the user for direction" rule; user can switch to Copilot                                |
| Latency                                            | Sequential dispatch: orchestrator decomposes → build-local runs → returns. ~2× wall time vs. direct build-local | Acceptable for local dev; no VRAM multiplier since Ollama keeps weights hot                                        |
| Reminder-stacking cap                              | 2-per-call priority filter (pending work above) drops lower-priority warnings                                   | Skipped reminders fire on next call if condition holds                                                             |
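
The reminder-stacking cap in the last row can be sketched as a sort-and-slice over whatever reminders fired on a given call. The priority order is the one stated in item 12; the `Reminder` shape and `selectReminders` name are hypothetical, and the real hook builds the list in bash rather than TypeScript.

```typescript
// Priority order from item 12: earlier entries win when more than two fire.
const PRIORITY = ["SELF-CHECK", "DEBUGGING", "path-scoped", "tool-specific"] as const;
type ReminderKind = (typeof PRIORITY)[number];

interface Reminder {
  kind: ReminderKind;
  text: string;
}

const MAX_REMINDERS = 2;

// Keep the two highest-priority reminders and join them with a blank line.
function selectReminders(fired: Reminder[]): string {
  return [...fired]
    .sort((a, b) => PRIORITY.indexOf(a.kind) - PRIORITY.indexOf(b.kind))
    .slice(0, MAX_REMINDERS)
    .map((r) => r.text)
    .join("\n\n");
}
```

A dropped reminder is not lost: as the mitigation column notes, it fires again on the next call if its condition still holds.
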

### Cloud Migration Path

When ready to add a cloud model, only `opencode.json` changes:

```json
{
  "model": "ollama/arch-omni2-9b",
  "agent": {
    "local-orchestrator": {
      "model": "anthropic/claude-haiku-4-5"
    }
  }
}
```

Schema verified against opencode.ai/docs/agents/ (May 2026). The `tools` key
inside agent configs is deprecated in favour of `permission` — the orchestrator
definition uses `permission`, so it is current. The `agent.{name}.model` key is
the correct per-agent override mechanism.
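
The effect of the per-agent override can be modelled with a short lookup. This is a reading of the config above, not OpenCode's resolver: the `OpenCodeConfig` type covers only the two keys shown, and the fallback to the top-level `model` for agents without an override is an assumption.

```typescript
// Minimal model of the two keys used in the snippet above.
interface OpenCodeConfig {
  model: string;
  agent?: Record<string, { model?: string }>;
}

// Assumed semantics: per-agent model wins, otherwise the top-level default.
function resolveModel(config: OpenCodeConfig, agentName: string): string {
  return config.agent?.[agentName]?.model ?? config.model;
}

const migrated: OpenCodeConfig = {
  model: "ollama/arch-omni2-9b",
  agent: {
    "local-orchestrator": { model: "anthropic/claude-haiku-4-5" },
  },
};
```

Under that reading, `resolveModel(migrated, "local-orchestrator")` yields the cloud model while workers such as `build-local` stay on the local one, which is the "change model strings, not architecture" goal stated earlier.
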

---

## Ecosystem Gap — Contextual AGENTS.md Injection

During local AI work (May 2026) we hit a fundamental limitation: OpenCode's
`instructions` glob in `opencode.json` loads **all matched files upfront** into
every session. For a 9B local model with a 32K context window, loading all of
`apps/*/AGENTS.md` and `packages/*/AGENTS.md` at startup consumes ~30–40% of the
context budget before the first message, triggering early compaction and
degrading quality.

The correct behaviour — injecting only the AGENTS.md relevant to the file being
edited — does not exist natively in OpenCode or its plugin ecosystem. The
closest community plugin (`opencode-skillful`, 295 stars) is archived as of Feb
2026 and still requires the model to explicitly call `skill_find`/`skill_use`;
it provides no path-triggered structural injection.
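
The missing behaviour, choosing only the `AGENTS.md` that governs the file being edited, comes down to a nearest-ancestor lookup. The sketch below is a hypothetical helper, not an existing OpenCode or plugin API: it assumes repo-relative POSIX paths and a pre-built set of known `AGENTS.md` locations, and it leaves out how a plugin would inject the result into context.

```typescript
// Walk from the edited file's directory up to the repo root and return the
// closest AGENTS.md, or undefined if none is known.
function nearestAgentsFile(
  filePath: string,
  agentsFiles: ReadonlySet<string>,
): string | undefined {
  const parts = filePath.split("/").slice(0, -1); // directory segments only
  while (parts.length > 0) {
    const candidate = `${parts.join("/")}/AGENTS.md`;
    if (agentsFiles.has(candidate)) return candidate;
    parts.pop(); // move one directory up
  }
  return agentsFiles.has("AGENTS.md") ? "AGENTS.md" : undefined;
}
```

Loading one file per edit instead of every `apps/*/AGENTS.md` and `packages/*/AGENTS.md` at startup is what would keep the 32K window free for the task itself.
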

### Open tasks

16. - [ ] **Assess: is filling this ecosystem gap worth the effort?** — Before
          building a contextual-injection plugin, evaluate: (a) Is OpenCode
          actively used for serious local AI coding work, or is the community
          primarily cloud-model users for whom context cost is irrelevant? (b)
          Are there better local AI coding stacks (e.g. Aider + litellm, Cursor
          local mode, VS Code Copilot + Ollama) where this problem is already
          solved? (c) Is the `tool.execute.before` event stable enough to build
          on? Target: 30-minute research session, concrete go/no-go
          recommendation.

17. - [ ] **Review + write up our issues and fixes as an ecosystem
          contribution** — If the gap is worth filling: document the
          context-bleed problem, the early-compaction root cause, our hook-based
          mitigation, and the remaining structural gap. Publish as a GitHub
          issue on the OpenCode repo and/or an npm plugin
          (`opencode-contextual-rules`?) implementing `tool.execute.before`
          path-triggered AGENTS.md injection. Depends on #16 go/no-go.

18. - [x] ~~**Trim `.agents/AGENTS.md`**~~ — Done May 2026. Condensed from
          12,584 → 10,507 bytes (43 lines removed). Trimmed: Hook Architecture
          Principle block (redirected to item 22 in project doc), Deferred
          Loading example + "why not" paragraph, session-start/stop hook prose,
          outdated `generate-agents.ts` references in Skills/Agents sections.
          Agent body files updated to prompt-body-only convention (see items
          25/26).

19. - [x] ~~**Block bash bypass of read pagination**~~ — Done May 2026. Added
          Policy 14 to `pre-tool-use.sh`: blocks `cat`/`head`/`tail`/`jq` reads
          of `apps/*/package.json` and `packages/*/package.json`. Scope limited
          to package.json (confirmed live bypass vector); general `.ts`/`.md`
          bash reads are not yet blocked (lower-urgency gap). Pattern verified
          with Node.js unit test — exact bypass command
          `cat apps/api/package.json | jq` is caught by P1.

20. - [ ] **Improve explore-first scope detection** — Policy 14 blocks
          `manage_todo_list` with ≥4 items, but OmniCoder sometimes starts with
          `Explore`/`find` before planning, bypassing the check. Options: (a)
          block `explore_subagent` when the query looks like a multi-file
          discovery sweep (glob patterns for source files across multiple dirs);
          (b) add a pre-tool-use check on `run_in_terminal` that denies `find`
          commands spanning the whole repo when the task hasn't been scoped yet;
          (c) rely on the todo-list check firing when planning eventually
          happens (current behavior — catches it late but still before edits
          start).

21. - [x] ~~**Remove debug logging from plugin after verified cycle**~~ — Done
          May 2026. Removed the full-input dump block from `tool.execute.before`
          in `plugin.ts` (`/tmp/plugin-debug.jsonl` appender). Guards verified
          via `opencode export` session transcript inspection — no longer need
          the dump file. Hook error logger (`/tmp/plugin-hook-errors.log`) kept
          as it only fires on failures, not every call.

22. - [ ] **Refactor hook scripts to be platform-agnostic** — currently
          `pre-tool-use.sh` parses Copilot-specific JSON and outputs
          Copilot-specific `permissionDecision` JSON. `plugin.ts` implements
          duplicate guards inline rather than calling the script. This means
          OpenCode and Copilot guards can drift (confirmed May 2026: Policy 14
          in `pre-tool-use.sh` had no effect on OpenCode `bash` tool calls).

          **Design target**: scripts accept normalized env vars (`TOOL_NAME`,
          `COMMAND`, `FILE_PATH`), exit non-zero with plain-text denial reason
          on stdout. Callers normalize input and translate output to their
          native denial format. Tracked in `.agents/AGENTS.md` Hook Architecture
          Principle section.

          **Audit required first**: review all hook scripts for Copilot-specific
          assumptions before refactoring.

23. - [ ] **Question-drift marker in `user-prompt-submit.sh`** — when the model
          has committed to a prior position and follow-up questions are being
          misread through that lens, prepend a disambiguation marker at the
          prompt tail. Detected pattern: model answers "no" or "not possible" in
          a prior turn → subsequent turns interpreted as defense of that
          position. See §2.1 ("Position-anchored priming") in the research doc.

          **Implementation**: in `user-prompt-submit.sh`, read the last N turns
          of `$TRANSCRIPT_PATH` (injected by OpenCode's native hook env) and
          look for a prior committed "no/impossible/can't" response within the
          last 3 model turns. If detected, append to `ADDITIONAL_CONTEXT`:
          `CURRENT QUESTION (answer only this — not the prior exchange): [prompt
          text]`. The key is repeating the user's exact question at the tail,
          after the marker, to counteract lost-in-the-middle effects. Fallback
          trigger: user prompt contains "that's not what I asked" / "you're
          answering the wrong question" / "I said" → always inject marker
          regardless of transcript scan.

24. - [x] ~~**Review all custom agent files for local-model-specific framing**~~
|
||
— Done May 2026. `build-local.md` reframed: dropped "OmniCoder", "9B",
|
||
"Ollama", "Qwen3 thinking blocks", "32K tokens total"; replaced with
|
||
model-agnostic equivalents. `research.md` and `brainstorm.md` verified
|
||
clean — no model/provider mentions. `local-orchestrator.md` was fixed
|
||
earlier this session. All four agent body files are now
|
||
model-agnostic.
|
||
|
||
25. - [ ] **Failure-mode routing in SELF-CHECK** — when the periodic SELF-CHECK
      fires in `post-tool-use.sh`, if a recent terminal failure or test
      failure is also present in the same turn, classify the failure type
      and inject the matched intervention rather than generic "step back."
      Reference: failure-mode routing table in §3.5 of the research doc.

      **Implementation**: in the SELF-CHECK block, if `context` already
      contains `DEBUGGING REMINDER` (i.e., test/terminal failure co-occurred
      this turn), append a classification hint:
      `FAILURE TYPE HINT: If this is a test/build failure → Reflexion loop
      (fix based on test output). If convention violation → grep for the
      pattern and inject a canonical example. If wrong file/directory → stop
      and re-read the project structure. Do not default to "try harder."`.
      Low implementation cost — pure text append with a conditional on
      `$context`.

26. - [x] ~~**Audit agent `.md` files for OpenCode-specific frontmatter**~~ —
      Done May 2026. Audit result: only `local-orchestrator.md` had OpenCode
      frontmatter keys (`mode`, `model`, `permission`). `brainstorm.md`,
      `build-local.md`, `research.md` were already plain markdown. Went with
      option (b): stripped `mode`/`model`/`permission` from
      `local-orchestrator.md`; moved `mode: all` into `opencode.json`
      (model + permission were already there). Kept `description` in
      frontmatter as it is neutral and self-documenting. Body files are now
      prompt-body only — valid in both OpenCode and Copilot.

27. - [ ] **`plugin.ts` local-agent detection uses provider prefix, not agent
      name** — `tool.execute.after` detects local agents via
      `input.model.startsWith('ollama/')`. This is provider-specific: if the
      model is served via a different backend (e.g. `llama-server/`,
      `lmstudio/`), truncation silently stops working. Fix: detect by agent
      name (`input.agent.includes('build-local')`) only, removing the
      `ollama/` fallback. The `input.agent` field is available in
      `tool.execute.after` (confirmed May 2026).

28. - [ ] **`plugin.ts` context pressure threshold is hardcoded to 32,768
      tokens** — `CONTEXT_LIMIT_TOKENS = 32768` assumes OmniCoder 9B's
      context window. If the local model changes, the threshold silently
      drifts out of calibration. Options: (a) read from `opencode.json`
      model config if OpenCode exposes it to plugins; (b) make it a
      top-of-file constant with a comment to update when changing models;
      (c) accept the drift as low-severity (threshold is advisory only —
      context pressure warnings are informational, not blocking). Option (b)
      is the minimum; option (a) is ideal if OpenCode exposes model metadata
      to plugins.

29. - [x] ~~**Move `permission` out of `local-orchestrator.md` frontmatter**~~ —
      Done May 2026 as part of item 26. `mode: all` added to `opencode.json`
      agent entry. `model` and `permission` were already in `opencode.json`.
      `opencode.json` is now the single source of truth for all runtime
      config; `.md` files are prompt-body only.

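The decision logic for the question-drift marker in item 23 is planned for `user-prompt-submit.sh`, but it can be sketched in TypeScript to make the two triggers explicit. This is a minimal sketch under stated assumptions: the function name `buildDriftMarker`, the refusal regex, and the transcript shape (an array of prior model turns, oldest first) are hypothetical, and the real hook would read its turns from `$TRANSCRIPT_PATH`. The refusal pattern is deliberately crude and would likely need tuning against real transcripts.

```typescript
// Hypothetical sketch of the item 23 decision logic; names and regexes are assumptions.

// Fallback triggers: inject the marker whenever the user says one of these.
const FALLBACK_TRIGGERS: RegExp[] = [
  /that's not what i asked/i,
  /you're answering the wrong question/i,
  /\bi said\b/i,
];

// Assumed pattern for a committed "no/impossible/can't" answer in a model turn.
const REFUSAL_PATTERN = /\b(no|not possible|impossible|can't|cannot)\b/i;

// Returns the text to append to ADDITIONAL_CONTEXT, or null when no marker is needed.
function buildDriftMarker(prompt: string, modelTurns: string[]): string | null {
  const userComplained = FALLBACK_TRIGGERS.some((re) => re.test(prompt));
  // Only the last 3 model turns are scanned for a prior committed refusal.
  const recentRefusal = modelTurns
    .slice(-3)
    .some((turn) => REFUSAL_PATTERN.test(turn));
  if (!userComplained && !recentRefusal) return null;
  // Repeat the user's exact question after the marker, at the tail.
  return `CURRENT QUESTION (answer only this — not the prior exchange): ${prompt}`;
}
```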
---

## Testing & Regression

**Research summary (May 2026):** No pre-existing tool exactly fits this use
case. Existing tools (RagaAI Catalyst, AgentEvalKit, agent-eval-arena,
intent-eval-lab, j-rig-skill-binary-eval) focus on LLM output quality,
hallucination detection, or cross-runtime behavior scoring — not config file
structure or policy enforcement regression. The closest analogue is
`j-rig-skill-binary-eval` (binary pass/fail criteria across 7 layers), which
uses the same conceptual approach we'd want here. Our testing is bespoke by
necessity: we're testing configuration files, shell scripts, and specific policy
enforcement behaviors, not general LLM response quality.

**Two layers of testing:**

| Layer                       | What it tests                           | Cost             | When to run                            |
| --------------------------- | --------------------------------------- | ---------------- | -------------------------------------- |
| Config + policy unit tests  | Schema validity, hook regex correctness | None (no model)  | Always — CI, pre-commit                |
| CLI integration smoke tests | Actual enforcement via `opencode run`   | Local model only | On-demand; local model must be running |

**Cloud agents excluded from integration tests** — `opencode run` with a cloud
model (Copilot, Anthropic) incurs API costs and rate limits. Tests must detect
the active model and skip if it's not a local provider.

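One way to implement the skip is a small predicate over the model identifier. The following is a minimal sketch, assuming model IDs take the `provider/model` form used elsewhere in this document; the provider list and both function names are hypothetical and would need to match the providers actually configured in `opencode.json`.

```typescript
// Hypothetical helper for integration tests; the provider list is an assumption.
const LOCAL_PROVIDERS: ReadonlySet<string> = new Set([
  "llama-server",
  "ollama",
  "lmstudio",
]);

// Returns true only when the model ID carries a provider prefix known to be local.
function isLocalModel(modelId: string): boolean {
  const slash = modelId.indexOf("/");
  if (slash <= 0) return false; // no provider prefix: treat as non-local so tests skip
  return LOCAL_PROVIDERS.has(modelId.slice(0, slash));
}

// Decides whether an integration test should run, with a reason for the skip log.
function integrationGate(modelId: string): { run: boolean; reason: string } {
  return isLocalModel(modelId)
    ? { run: true, reason: `local provider for ${modelId}` }
    : { run: false, reason: `skipping: ${modelId} is not a local provider` };
}
```

Defaulting to "skip" for unrecognized or prefix-less IDs keeps the failure mode cheap: a misconfigured test run is skipped rather than billed.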
### Open tasks

30. - [ ] **Config + policy unit test suite** — test config file structure and
      hook regex patterns without invoking any model. Implementation:

      a. **`opencode.json` schema validation**: the file references
         `"$schema": "https://opencode.ai/config.json"` — validate it using
         `ajv` (already used in the monorepo) against the live schema or a
         cached copy. Catches permission typos, unknown agent keys,
         unsupported field values.

      b. **Hook JSON structure validation**: validate
         `.agents/frameworks/github/hooks.json` and
         `.agents/frameworks/opencode/plugin.ts` (TypeScript, already
         type-checked). Write a schema for the hooks JSON format and run ajv
         on it.

      c. **Hook policy regex unit tests**: extract every regex used in
         `pre-tool-use.sh` into a `tests/hooks.test.ts` file and run it
         with `vitest`. For each policy, define 2–3 input strings that
         SHOULD match and 2–3 that SHOULD NOT. Policy 14 already has an
         informal Node.js test from this session — formalize it.

      d. **Agent `.md` frontmatter validator**: check that no agent file
         under `.agents/agents/` has frontmatter keys other than
         `description`. Catches regression when someone adds `model:` or
         `permission:` back to a body file.

      **Suggested location**: `.agents/tests/` or root `test/agents/`.
      **Stack**: vitest (already in monorepo), ajv (already available), Node
      built-ins. No new dependencies needed.

31. - [ ] **CLI integration smoke tests (local model only)** — use
      `opencode run` in non-interactive mode to verify enforcement is
      actually firing via the real runtime. These tests exercise the
      plugin + hook wiring end-to-end.

      **Command shape**:
      ```
      opencode run "prompt" --agent build-local \
        --model llama-server/arch-omni2-9b-native \
        --format json
      ```

      **Assertions via `opencode export`**: after each run, export the
      session with `opencode export <sessionID> 2>/dev/null` and parse the
      JSON transcript. Assert on `parts` array: tool calls that SHOULD have
      been blocked appear with error/denied status; tool calls that SHOULD
      have passed completed normally.

      **Test cases to start with** (all verified real enforcement gaps):
      1. Attempt to `read` a nested `package.json` (e.g.
         `apps/api/package.json`) → BLOCKED by plugin package.json guard
      2. Attempt to `read` a source file with no `limit` → BLOCKED by
         pagination guard
      3. Attempt to `read` a source file with `limit: 51` → BLOCKED
      4. Attempt to `read` a docs file with `limit: 501` → BLOCKED
      5. Attempt to `read` a docs file with `limit: 50` → PASSES
      6. Bash command `cat apps/api/package.json` → BLOCKED by pre-tool-use
         Policy 14 (substitute your project's equivalent nested package.json)

      **Guard rail**: skip all tests if `llama-server` is not reachable at
      `http://127.0.0.1:8080/v1`. Do not run against cloud models. Add
      an env var `AGENT_INTEGRATION_TESTS=1` required to enable (off by
      default, never runs in standard `npm test`).

      **Suggested location**: `.agents/tests/integration/`.
      **Stack**: Node.js test runner or vitest, `opencode` CLI in PATH.

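The frontmatter validator from item 30(d) needs no YAML library if it only inspects top-level keys. Below is a minimal sketch of that check; the helper names `frontmatterKeys` and `disallowedKeys` are hypothetical, and the line-based parser is an assumption that only holds for simple frontmatter (top-level keys at column 0, nested values indented).

```typescript
// Hypothetical sketch for check (d); helper names and the simple parser are assumptions.

// Extracts top-level keys from a leading `---` ... `---` frontmatter block.
function frontmatterKeys(markdown: string): string[] {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return []; // no frontmatter at all
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return []; // unterminated block: treated here as no frontmatter
  const keys: string[] = [];
  for (const line of lines.slice(1, end)) {
    // A top-level key starts at column 0; indented lines are nested values.
    const key = /^([A-Za-z_][\w-]*)\s*:/.exec(line)?.[1];
    if (key !== undefined) keys.push(key);
  }
  return keys;
}

// Keys allowed in agent body files, per the rule in item 30(d).
const ALLOWED_KEYS: ReadonlySet<string> = new Set(["description"]);

// Returns every top-level frontmatter key that should not be in a body file.
function disallowedKeys(markdown: string): string[] {
  return frontmatterKeys(markdown).filter((key) => !ALLOWED_KEYS.has(key));
}
```

In a vitest suite, each file under `.agents/agents/` would be read and asserted to produce an empty `disallowedKeys` result.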
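Test cases 1 to 5 in item 31 describe three read guards: the nested `package.json` guard, the unbounded-read block, and the pagination limit. The sketch below shows one way those rules could fit together so that expected outcomes can be derived before running the CLI. It is not the plugin's actual code: the regex, the definition of a "docs file", and the name `checkRead` are all assumptions; only the `{ filePath, limit, offset }` input shape and the 50/500 limits come from this document.

```typescript
// Hypothetical sketch of the read guards; regexes and names are assumptions.

interface ReadInput {
  filePath: string;
  limit?: number;
  offset?: number;
}

// Assumed pattern for nested package.json files under apps/ or packages/.
const NESTED_PACKAGE_JSON = /(^|\/)(apps|packages)\/[^/]+\/package\.json$/;
// Assumed definition of a "docs file": anything under docs/, or any Markdown file.
const DOCS_FILE = /(^|\/)docs\/|\.md$/;

const SOURCE_LIMIT = 50; // test case 3: limit 51 on a source file is blocked
const DOCS_LIMIT = 500; // test case 4: limit 501 on a docs file is blocked

// Returns a denial reason, or null when the read is allowed.
function checkRead(input: ReadInput): string | null {
  if (NESTED_PACKAGE_JSON.test(input.filePath)) {
    return "nested package.json reads are blocked";
  }
  if (input.limit === undefined) {
    return "unbounded read: pass an explicit limit";
  }
  const max = DOCS_FILE.test(input.filePath) ? DOCS_LIMIT : SOURCE_LIMIT;
  return input.limit > max ? `limit ${input.limit} exceeds ${max}` : null;
}
```

Keeping the expected outcomes in a pure function like this lets the integration tests compare the exported session transcript against a single source of truth instead of hand-written expectations per case.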
### Verified facts (May 2026)

- OpenCode's `read` tool input schema is
  `{ filePath: string, limit?: number, offset?: number }` — NOT
  `startLine`/`endLine`. Confirmed via plugin debug logging of real tool calls.
- `tool.execute.before` input contains only `{ tool, sessionID, callID }`. It
  does NOT include `agent` or `model`, so plugin-layer gating cannot filter by
  agent. Confirmed via plugin debug logging.
- **OpenCode has its own native hook system** that calls `pre-tool-use.sh`
  directly for tools like `run_in_terminal`, `replace_string_in_file`, etc. This
  is completely separate from the plugin's `runHook` calls. The native hook
  payload includes `timestamp`, `hook_event_name`, `session_id`,
  `transcript_path`, `tool_use_id`, and `cwd` — fields the plugin never sends.
  The plugin `runHook` is a _second_ call, layered on top.
- **Bun shell `$` API does not have a `.stdin()` method.** The correct API for
  piping stdin is `` $`cmd < ${Buffer.from(text)}` ``. Calling `.stdin(text)`
  throws ``TypeError: $`...`.stdin is not a function``, which was caught by
  `runHook`'s `catch` block and returned `''`. This caused the plugin's
  `runHook` to silently no-op for every call with `stdinJson` since the plugin
  was first written — hook enforcement (all 12 policies) was never running via
  the plugin path. It only ran via OpenCode's native hook system for the tools
  OpenCode natively supports. Confirmed via `/tmp/plugin-hook-errors.log`.
- **The silent `catch` in `runHook` is dangerous.** It masked the Bun `.stdin()`
  bug entirely. Always log hook failures to a debug file during development;
  remove only after enforcement is verified working.
- **Plugin-layer enforcement works for `read`** after fixing the Bun stdin API.
  The `read` tool fires `tool.execute.before` in the plugin, which calls
  `runHook('pre-tool-use.sh', ...)` via `< ${Buffer.from(...)}`, which applies
  Policy 13 (50-line limit). Verified: bare `read` (no limit) → BLOCKED; `read`
  with `limit: 50` → passes. (May 2026)
- **Plugin load failure: unescaped regex slashes caused silent syntax error.**
  `plugin-debug.jsonl` was empty even after the Bun stdin fix because the plugin
  file itself failed to parse. Line 84 had `/(^|/)(apps|packages)/[^/]+/...` —
  forward slashes inside the regex literal were not escaped, producing a JS
  syntax error at parse time. Bun silently drops plugins that fail to import.
  Fixed to `/(^|\/)(apps|packages)\/[^/]+\/...`. The fix also corrected the
  pagination guard to use `limit`/`offset` (not `startLine`/`endLine`) and added
  an unbounded-read block (`limit === undefined`). All three guards verified
  working in a live session (May 2026).
- **Package.json read guard verified working.** `local-orchestrator` attempting
  to read `apps/*/package.json` and `packages/*/package.json` → BLOCKED by
  plugin. Root `package.json` read correctly passes. (May 2026)
- **Policy 14 (`manage_todo_list` ≥ 4 items) catches some but not all broad task
  attempts.** OmniCoder sometimes proceeds directly to `Explore`/`find` without
  calling `manage_todo_list` first, bypassing the policy. When it does plan with
  the todo tool before acting, the deny fires correctly.
- **OmniCoder comprehension failure: prompt ambiguity → wrong directory.** Given
  "refactor the five hook files", OmniCoder ran a glob for `*hook*` files and
  found `.husky/` hooks instead of `.agents/hooks/`. The correct files were in
  the grep output from the Explore subagent but were not selected. Root cause:
  the model lacks enough context about the repo layout to disambiguate "hook
  files" without explicit path guidance. Mitigation: be explicit in prompts
  ("the five `.agents/hooks/*.sh` files").
- **OpenCode agent `permission` config requires a `.opencode/agents/<name>.md`
  file.** Without a matching markdown file, `opencode.json`'s
  `agent.<name>.permission` config is silently ignored — the agent is unknown to
  OpenCode and runs as a nameless build-agent alias. The markdown file must
  exist in `.opencode/agents/` (or `~/.config/opencode/agents/`). Confirmed by
  test run where `@local-orchestrator` edited files despite
  `permission.edit: "deny"` in JSON config; fixed by creating
  `.opencode/agents/local-orchestrator.md` symlink. (May 2026)
- **`"write"` is NOT a valid OpenCode permission key.** Use `"edit"` instead —
  it covers `write`, `edit`, and `apply_patch` tools. `"write": "deny"` is
  silently ignored. Valid top-level permission keys include: `read`, `edit`,
  `glob`, `grep`, `list`, `bash`, `task`, `skill`, `lsp`, `question`,
  `webfetch`, `websearch`, `external_directory`, `doom_loop`, `todowrite`.
  Confirmed from `opencode.ai/docs/permissions` (May 2026).
- **`default_agent` key is snake_case** in `opencode.json` (not `defaultAgent`).
  Confirmed from `opencode.ai/docs/config` (May 2026).
- **`tools: false` is deprecated.** The current approach for per-agent tool
  restriction is `permission: { edit: "deny" }`. The old `tools: false` still
  works but is documented as legacy. Confirmed from `opencode.ai/docs/agents`
  (May 2026).
- **Broken symlinks are silent.** OpenCode does not error on a broken
  `.opencode/agents/` symlink — it just skips the agent silently. The agent
  won't appear in `opencode agent list` and all `opencode.json` permission
  config for it is ignored. Always verify with
  `cat .opencode/agents/<name>.md | head -5` (should print content, not a "No
  such file" error) and `opencode agent list` (agent should appear with correct
  deny rules). The correct symlink depth from `.opencode/agents/` is
  `../../.agents/agents/<name>.md` (two levels), not three.
- **`opencode agent list` is the authoritative verification command.** Run it
  after any agent config change to confirm: (a) the agent appears by name, (b)
  its mode is correct (`all`/`primary`/`subagent`), and (c) `deny` rules appear
  at the bottom of its permission list. Missing agent = broken symlink or YAML
  parse error. Present but missing deny rules = frontmatter not parsed correctly
  or wrong key names. (May 2026)
- **`@mention` routing only works at session start.** If you send any message
  that gets answered by the current primary agent first, then send
  `@local-orchestrator ...`, the TUI passes the full message text to the current
  model (Build/OmniCoder) which treats `@local-orchestrator` as freeform text
  and answers it itself. Always open a **fresh session** and make `@agent-name`
  the very first message. Alternatively, use
  `opencode run --agent local-orchestrator "..."` from the CLI for reliable
  agent-scoped invocation. **Tab-switching to a custom `all`-mode agent in an
  existing session works correctly.**
- **`edit: deny` on `local-orchestrator` is working correctly.** When given an
  edit task, the orchestrator correctly avoided using `replace_string_in_file`
  and instead used the `task` tool to delegate to a subagent. This is the
  expected behaviour. Confirmed May 2026.
- **`task` tool has a JSON serialization limit.** OmniCoder 9B caused an
  `Unterminated string` error by embedding the entire contents of multiple
  `package.json` files as a literal string inside the `task` prompt JSON. The
  `task` tool prompt is serialized as JSON; very long strings truncate and
  produce parse errors. Mitigation: instruct the orchestrator in its system
  prompt to tell workers _which files to read_ rather than quoting file contents
  inline. This has been added to `local-orchestrator.md`. (May 2026)
- **`ollama/arch-omni2-9b` is the wrong model identifier for the llama-server
  instance.** The correct ID is `llama-server/arch-omni2-9b-native` (verify with
  `opencode models | grep arch`). Using the wrong ID causes an immediate "cannot
  load model" error when the agent is invoked. Fixed in `opencode.json` and
  `local-orchestrator.md` frontmatter. (May 2026)

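Because an invalid permission key such as `"write"` is silently ignored, a unit test that flags unknown keys would catch this class of typo before it reaches the runtime. The sketch below copies its key list from the verified fact above, which says the valid keys "include" these names, so the list may not be exhaustive. The function name and the `agent name -> permission object` input shape are hypothetical.

```typescript
// Key list copied from the verified fact above; it may not be exhaustive.
const KNOWN_PERMISSION_KEYS: ReadonlySet<string> = new Set([
  "read", "edit", "glob", "grep", "list", "bash", "task", "skill", "lsp",
  "question", "webfetch", "websearch", "external_directory", "doom_loop",
  "todowrite",
]);

// Hypothetical input shape: agent name -> that agent's permission object.
type AgentPermissions = Record<string, Record<string, unknown>>;

// Returns "agent.key" for every permission key that is not in the known list.
function unknownPermissionKeys(agents: AgentPermissions): string[] {
  const problems: string[] = [];
  for (const [agent, permission] of Object.entries(agents)) {
    for (const key of Object.keys(permission)) {
      if (!KNOWN_PERMISSION_KEYS.has(key)) problems.push(`${agent}.${key}`);
    }
  }
  return problems;
}
```

A test would build the input from the `agent.<name>.permission` entries in `opencode.json` and assert that the returned list is empty.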
## Open Issues

Known bugs and stale claims identified during code review (see deleted
`agent-infrastructure-review.md` and `agent-infrastructure-review-pass2.md` for
full context). Not yet fixed unless marked resolved.

### CRITICAL — `description:` empty in all generated agent/skill files

`scripts/generate-agents.ts` uses a hand-rolled YAML parser that silently drops
descriptions when they are written in block-scalar form (value on the next line
under the key). Every generated file in `.github/agents/`, `.github/skills/`,
`.opencode/agents/`, `.opencode/skills/` has a blank `description:` field.

`description:` is the primary routing signal for Copilot's
`SkillsContextComputer` and OpenCode's agent dispatch. Explicitly `@`-mentioning
an agent by name still works; description-triggered auto-routing does not.

**Fix**: Inline the description strings in the canonical `.agents/` source files
(change block-scalar to `key: 'value'` format). The existing parser handles
inline strings correctly. Add a `generate:agents:check` assertion that every
generated file has a non-empty `description:`.

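The proposed `generate:agents:check` assertion can be a single predicate applied to each generated file. This is a minimal sketch, not the existing script: the function name `hasNonEmptyDescription` is hypothetical, and treating a bare block-scalar indicator (`|` or `>`) as empty is an added assumption, on the reasoning that such a value carries no inline text.

```typescript
// Hypothetical check for `generate:agents:check`; the function name is an assumption.

// Returns true when the frontmatter has an inline, non-empty `description:` value.
function hasNonEmptyDescription(markdown: string): boolean {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return false; // no frontmatter block
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break; // end of frontmatter
    const match = /^description:\s*(.*)$/.exec(line);
    if (match) {
      // Strip surrounding quotes so `description: ''` counts as empty.
      const value = (match[1] ?? "").trim().replace(/^(['"])(.*)\1$/, "$2");
      // Assumption: a bare block-scalar indicator means there is no inline value.
      return value !== "" && !/^[|>][+-]?$/.test(value);
    }
  }
  return false; // key not found inside the frontmatter
}
```

The check script would read every file in the four generated directories and fail with the offending path on the first `false` result.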
### MEDIUM — ~~`printf '%s'` regression in hooks breaks `\n` rendering~~ (resolved)

~~`.agents/hooks/post-tool-use.sh`, `session-start.sh`, and
`user-prompt-submit.sh` use `printf '%s' "$context" | node -e '...'` to
JSON-escape the context variable. `%s` does not interpret `\n` escape sequences,
so multi-line context strings (SELF-CHECK, DEBUGGING REMINDER, BFF REMINDER)
arrive at the model as single lines with literal `\n` characters.~~

**Verified fixed** (May 2026): all three hooks already use `printf '%b'`.

### LOW — ~~arXiv citation `2603.29957` unverified~~ (resolved)

~~`arXiv:2603.29957` (Jiang et al. 2026, "Think-Anywhere") appears in
`.agents/agents/research.md`, `.agents/agents/brainstorm.md`, and the Research
Foundation section above. Verify the ID resolves at
`https://arxiv.org/abs/2603.29957` and fix all references if it doesn't.~~

**Verified real** (May 2026): "Think Anywhere in Code Generation" by Xue Jiang,
Tianyu Zhang, Ge Li et al., submitted March 31, 2026, revised April 27, 2026
(v3), cs.SE. All existing citations are correct.

### LOW — ~~`.claude/` false claims in `tool-agnostic-agent-infra.md`~~ (resolved)

The file `docs/projects/tool-agnostic-agent-infra.md` no longer exists — already
deleted. No action needed.