dotfiles/.agents/docs/failure-modes.md

# Failure Modes — Qwen3.6 & OpenCode

Compiled 2026-05-27. Sources linked inline.

---

## Qwen3.6 Model-Specific Quant & Routing Issues

### IQ3 Quant — Tool Call JSON Failure

| | |
|---|---|
| **Name** | IQ3 quant tool-call JSON breakage |
| **Description** | Qwen3.6 35B-A3B at IQ3_XXS quant fails function-call JSON generation entirely. BatiAI's Ollama benchmark shows ❌ for IQ3, ✅ for IQ4 and Q6. IQ3 is memory-bandwidth bound (~45.9 t/s on M4 Max) and loses the precision needed for structured JSON output in tool calls. |
| **Mitigation** | Use IQ4_XS or Q6_K for any workload with tool calling. IQ3 is acceptable only for text-only chat. IQ4 and Q6 show equivalent throughput. |
| **Sources** | [batiai/qwen3.6-35b:iq3 (Ollama)](https://ollama.com/batiai/qwen3.6-35b:iq3) |

### MoE Expert Loop — Q4_K_M & Below Routing Lock

| | |
|---|---|
| **Name** | Q4_K_M MoE expert routing collapse |
| **Description** | Qwen3.6's MoE architecture (256 routed experts, top-8 selection) degrades at Q4_K_M and below: the router locks into a subset of specialists (e.g., code-completion specialist for math queries, math specialist for syntax tasks). Expert activation entropy collapses. This is a structural MoE failure — dense Qwen2.5-72B does not exhibit this. Perplexity delta of +0.34 at Q4_K_M looks acceptable on paper but produces hallucinated method names, wrong parameter counts, and broken imports. |
| **Mitigation** | Default to Q6_K (1.6-point SWE-bench loss vs Q8_0, saves 2.1 GB VRAM). For 24 GB cards, Q4_K_M is acceptable only for RAG ingestion or documentation chat — not active code generation or function calling. Q8_0 wins SWE-bench Lite at 28.7%. BFCL v2 function-calling accuracy: 94.2% (Q8_0) → 89.7% (Q4_K_M). |
| **Sources** | [Qwen3.6 quant benchmarks: Q4 vs Q8 for MoE (CraftRigs)](https://craftrigs.com/comparisons/qwen3-6-quantization-benchmarks-q4-vs-q8/); [Qwen3.6-27B Setup Guide: 24GB GPU (CraftRigs)](https://craftrigs.com/guides/qwen3-6-27b-setup-guide-24gb-gpu/) |

### Official Chat Template — Non-Standard XML Parameter Format

| | |
|---|---|
| **Name** | Qwen3.6 official `chat_template.jinja` XML vs JSON incompatibility |
| **Description** | Qwen3.6's shipped `chat_template.jinja` instructs the model to generate function calls using a proprietary XML-like syntax (`<function=...><parameter=...>`) instead of OpenAI-compatible JSON. Missing closing tags cause parsing failures in standard inference frameworks (vLLM, HuggingFace transformers, llama-cpp-python, OpenAI-compatible API layers). Error: `Failed to parse input at pos XXXX: <function=read> <parameter=filePath> ...`. |
| **Mitigation** | Patch `chat_template.jinja` to use OpenAI-compatible JSON schema (`{"name": "function_name", "arguments": "{\"param1\": \"value1\"}"}`). |
| **Sources** | [abysslover/qwen36_tool_calling_failure (GitHub)](https://github.com/abysslover/qwen36_tool_calling_failure) |

### Long-Text Stability — Context Accumulation Amplifies Routing Drift

| | |
|---|---|
| **Name** | Q4_K_M multi-turn routing drift |
| **Description** | General chat tolerates +0.50 perplexity delta before quality drop is noticed. Multi-turn technical discussion (>3 turns with context accumulation), chain-of-thought reasoning, and structured output cross the threshold where expert loop errors become detectable within the first 10 responses. Context accumulation amplifies routing drift. |
| **Mitigation** | Q4_K_M acceptable for single-turn or short-context use. For long contexts or multi-turn structured output, use Q6_K or Q8_0. |
| **Sources** | [Qwen3.6 quant benchmarks: Q4 vs Q8 for MoE (CraftRigs)](https://craftrigs.com/comparisons/qwen3-6-quantization-benchmarks-q4-vs-q8/) |

---

## OpenCode Plugin / Hook-Specific Failures

### session.start — Resume / --continue Does Not Fire Plugin Context

| | |
|---|---|
| **Name** | session.start hook failure on resume |
| **Description** | `session.start` hook fires reliably for new sessions (`startup` trigger) but fails on resume (`--continue`/`--session`) with "No context found for instance" error. `Plugin.triggerSessionStart` is called during route navigation before the plugin context is fully initialized. Pending hook context is consumed lazily on the next model turn, so resume-triggered context can become stale if a session is resumed but not prompted soon after. |
| **Mitigation** | Be aware that `session.start` with `resume` trigger has a bootstrap timing edge case. Pending context becomes stale if the resumed session sits idle. PR #15224 documents the issue and a partial fix. |
| **Sources** | [OpenCode PR #15224 — feat(plugin): add session.start hook](https://github.com/anomalyco/opencode/pull/15224); [OpenCode Issue #5409 — SessionStart hook for session lifecycle events](https://github.com/sst/opencode/issues/5409) |

### PreToolUse — Ask Response Permanently Disables Bypass Permission

| | |
|---|---|
| **Name** | PreToolUse permission bypass lock |
| **Description** | When `PreToolUse` returns `permissionDecision: "ask"`, it permanently disables bypass permission mode until session restart. This is a state machine vulnerability — the permission bypass mode cannot recover from an `ask` response without a full session reset. |
| **Mitigation** | If using permission bypass mode, avoid `PreToolUse` hooks that return `ask`. Verify hook behavior after any policy change. |
| **Sources** | Claude Code #37420 (referenced in AGENTS.md) |

### session.created — Event Fails Reliably for Plugins

| | |
|---|---|
| **Name** | session.created event reliability for plugins |
| **Description** | `session.created` event fails to fire reliably for plugins due to MCP compatibility errors. This affects plugins that depend on session lifecycle events for initialization. |
| **Mitigation** | Use `session.start` hook as the primary initialization mechanism instead of relying on `session.created` events. |
| **Sources** | OpenCode #14808 (referenced in AGENTS.md, `~/.config/opencode/plugins/engram.ts`) |

### chat.message — Synthetic Text Injection Required for System Message Position

| | |
|---|---|
| **Name** | Jinja system message position enforcement |
| **Description** | vLLM propagates Qwen's strict Jinja template requiring `role=system` at index 0. Auxiliary context injection (e.g., from session-start hooks) breaks this if it places context after the system message. Fix: inject session-start as a synthetic `text` part via `output.parts.unshift()` on the first `chat.message` turn, not via `experimental.chat.system.transform`. Text parts have no position constraint. |
| **Mitigation** | Do not use `experimental.chat.system.transform` for session-start hooks with Qwen-family models. Use synthetic `text` parts via `output.parts.unshift()` on the first `chat.message` turn. |
| **Sources** | vLLM #41114; AGENTS.md (system reminder pattern) |

---

*Generated 2026-05-27 from web search findings.*