dotfiles/.agents/skills/research.md
Brydon DeWitt 6b07e4ccb2 feat: add shared agent infrastructure (.agents/)
2026-05-22 13:13:43 -04:00

description: Load the structured research methodology. Call this when starting any investigation, debugging session, root cause analysis, or systematic exploration of unfamiliar code. Returns a checklist with two orientations (Understand + Diagnose), risk-based triage, circuit breakers, and context management guidance.
toolName: load_research_methodology

Research Methodology Skill

This skill provides a structured, evidence-based investigation methodology. It prevents common AI agent failure modes: pattern-matching without evidence, confirmation bias, fixing symptoms instead of causes, and methodology drift during long sessions.

Quick Reference: The Investigation Checklist

Before every hypothesis cycle:

  • Hypothesis written (one sentence: "I believe X because Y")
  • Falsification criterion written ("if wrong, I'd expect to see ___")
  • Falsification test run BEFORE confirmation test
  • Result recorded (ELIMINATED with reason, or CONFIRMED with evidence)
  • Hypothesis re-evaluated at this tool-call boundary — new evidence changes what to check next. Interleaved thinking makes this automatic for Claude 4; consciously invoke it for other models.
  • All traces/instrumentation removed before next hypothesis
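
The checklist above can be sketched as a small record type that refuses a result until the falsification test has run. This is only an illustration of the ordering rule, not part of the skill itself; `HypothesisCycle`, `run_falsification`, and `record` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisCycle:
    """One pass through the checklist (hypothetical structure, for illustration)."""
    hypothesis: str                 # "I believe X because Y"
    falsification: str              # "if wrong, I'd expect to see ___"
    falsification_run: bool = False
    result: Optional[str] = None    # "ELIMINATED" or "CONFIRMED"
    reason: Optional[str] = None

    def run_falsification(self) -> None:
        # Mark that the falsification test ran before any confirmation test.
        self.falsification_run = True

    def record(self, result: str, reason: str) -> None:
        if result not in ("ELIMINATED", "CONFIRMED"):
            raise ValueError("result must be ELIMINATED or CONFIRMED")
        if not self.falsification_run:
            raise RuntimeError("run the falsification test before recording a result")
        if not reason:
            raise ValueError("a result needs a reason or evidence")
        self.result, self.reason = result, reason


cycle = HypothesisCycle(
    hypothesis="I believe the cache is stale because the TTL is never reset",
    falsification="if wrong, I'd expect fresh timestamps on every read",
)
cycle.run_falsification()
cycle.record("ELIMINATED", "timestamps were fresh on every read")
print(cycle.result, "-", cycle.reason)
```

Making `record` raise when the falsification step was skipped turns the ordering rule into something that fails loudly instead of drifting.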

Two Orientations

Understand (Grounded Theory)

Goal: Build a mental model from the code itself, not assumptions.

  1. Open coding — Read code, name what you see (functions, patterns, flows)
  2. Constant comparison — Compare new observations against earlier ones
  3. Axial coding — Connect the categories (what calls what, data flows)
  4. Memo — Write findings to session memory as you go
  5. Saturation check — Stop when new files confirm what you already know
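
One way to make the saturation check in step 5 concrete is to count how many files in a row added no new categories. The sketch below assumes each file read has been reduced to a set of category names; the function name `reached_saturation` and the window of three files are assumptions, not values from this skill.

```python
from typing import List, Set


def reached_saturation(categories_per_file: List[Set[str]], window: int = 3) -> bool:
    """Return True when the last `window` files added no new categories.

    `categories_per_file` lists, in reading order, the categories named
    while open-coding each file. The window size is an assumed threshold.
    """
    seen: Set[str] = set()
    streak = 0  # consecutive files that added nothing new
    for categories in categories_per_file:
        new = categories - seen
        seen |= categories
        streak = 0 if new else streak + 1
    return streak >= window


files_read = [
    {"router", "handler"},
    {"handler", "cache"},
    {"cache"},
    {"router"},
    {"handler", "cache"},
]
print(reached_saturation(files_read))  # True: the last three files added nothing new
```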

Use for: "How does X work?", "What's the architecture?", "I need to understand this before changing it."

Diagnose (Strong Inference + Satisficing)

Goal: Determine why something isn't working.

Simple check first: Can you answer this with a single log/print? If the question is "what value does X have here?" — just log and look.

Triage (if the simple check didn't resolve it):

| Factor | Low Risk | High Risk |
|---|---|---|
| Reversibility | Easy to undo | Hard to reverse (data, deploy) |
| Blast radius | One file/function | Many systems, shared state |
| Confidence | Familiar, clear evidence | Novel, ambiguous symptoms |
| Novelty | Seen this before | Never encountered |
| Time cost | Known fast (<5s) | Unknown = measure first |

Low risk → Satisfice: Test the single most likely hypothesis. Done if confirmed.

Any high risk → Strong Inference: Generate 2-3 competing hypotheses, design a discriminating test, eliminate based on evidence.
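
The triage rule reduces to "any high-risk factor selects Strong Inference". A minimal sketch of that decision follows, assuming each of the five factors has already been rated by hand; the names `Risk`, `FACTORS`, and `choose_strategy` are hypothetical.

```python
from enum import Enum
from typing import Dict


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# The five triage factors, as assumed identifier names.
FACTORS = ("reversibility", "blast_radius", "confidence", "novelty", "time_cost")


def choose_strategy(ratings: Dict[str, Risk]) -> str:
    """Return 'strong_inference' if any factor is high risk, else 'satisfice'."""
    missing = [factor for factor in FACTORS if factor not in ratings]
    if missing:
        # An unrated factor is treated as an error rather than silently low.
        raise ValueError(f"unrated factors: {missing}")
    if any(ratings[factor] is Risk.HIGH for factor in FACTORS):
        return "strong_inference"
    return "satisfice"


ratings = {factor: Risk.LOW for factor in FACTORS}
print(choose_strategy(ratings))   # satisfice

ratings["blast_radius"] = Risk.HIGH
print(choose_strategy(ratings))   # strong_inference
```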

Mode Switching

These compose recursively: Understand → anomaly → Diagnose → need context → Understand → ...

Circuit Breakers

  1. 5+ attempts without falsifying a hypothesis = STOP and report
  2. 3+ edits to same file without passing test = STOP and rethink
  3. Urge to "just try something" = STOP and write hypothesis first
  4. Two failures at same abstraction level = go UP one level
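
Three of the four breakers are plain counters and can be sketched in code; the third (the urge to "just try something") is a judgment call and is left out. The class and method names below are hypothetical, and the sketch assumes that an attempt counts as "falsifying" when it eliminates a hypothesis.

```python
from collections import Counter
from typing import Optional


class CircuitBreakers:
    """Counters for breakers 1, 2, and 4 (illustrative sketch only)."""

    def __init__(self) -> None:
        self.attempts_without_falsifying = 0
        self.edits_without_pass: Counter = Counter()
        self.failures_at_level: Counter = Counter()

    def record_attempt(self, falsified: bool) -> Optional[str]:
        # Breaker 1: five or more attempts in a row that falsified nothing.
        if falsified:
            self.attempts_without_falsifying = 0
            return None
        self.attempts_without_falsifying += 1
        if self.attempts_without_falsifying >= 5:
            return "STOP and report"
        return None

    def record_edit(self, path: str, test_passed: bool) -> Optional[str]:
        # Breaker 2: three or more edits to one file without a passing test.
        if test_passed:
            self.edits_without_pass[path] = 0
            return None
        self.edits_without_pass[path] += 1
        if self.edits_without_pass[path] >= 3:
            return "STOP and rethink"
        return None

    def record_failure(self, level: str) -> Optional[str]:
        # Breaker 4: two failures at the same abstraction level.
        self.failures_at_level[level] += 1
        if self.failures_at_level[level] >= 2:
            return "go UP one level"
        return None


breakers = CircuitBreakers()
for _ in range(3):
    signal = breakers.record_edit("parser.py", test_passed=False)
print(signal)  # STOP and rethink
```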

Context Management

Adherence to the methodology degrades after roughly 15 tool calls, as accumulated context competes for attention. To counteract:

  • Re-read investigation file and dead-ends every ~10 tool calls
  • If drifting toward guess-and-check, pause and re-read notes
  • For long sessions, create an investigation file so fresh context can continue
  • Hold references; load on demand. Do not read files you don't need yet.
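
The re-read cadence can be tracked with a simple counter. This sketch assumes something calls `tick()` once per tool call; `ToolCallCounter` and `REREAD_INTERVAL` are hypothetical names, and the interval of 10 comes from the first bullet above.

```python
REREAD_INTERVAL = 10  # "every ~10 tool calls"


class ToolCallCounter:
    """Signals when it is time to re-read the investigation file and dead-ends."""

    def __init__(self, interval: int = REREAD_INTERVAL) -> None:
        self.interval = interval
        self.calls = 0

    def tick(self) -> bool:
        """Count one tool call; return True when a re-read is due."""
        self.calls += 1
        return self.calls % self.interval == 0


counter = ToolCallCounter()
due_at = [call for call in range(1, 26) if counter.tick()]
print(due_at)  # [10, 20]
```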

Dead-Ends Format

Record eliminated hypotheses so you (or the next session) don't re-test them:

- **[timestamp] Hypothesis:** [one sentence]
  **Falsification:** [what you'd expect if wrong]
  **Result:** [ELIMINATED/CONFIRMED] — [why, in one sentence]

Write to .session/dead-ends.md or the investigation file's Hypotheses section.
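
A minimal sketch of writing entries in this format, assuming a UTC timestamp of the form `YYYY-MM-DD HH:MM` (the skill does not specify one). The helper names `format_dead_end` and `append_dead_end` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

DASH = "\u2014"  # the dash that separates result and reason in the format above


def format_dead_end(hypothesis: str, falsification: str, result: str,
                    why: str, timestamp: Optional[str] = None) -> str:
    """Render one entry in the dead-ends format."""
    if result not in ("ELIMINATED", "CONFIRMED"):
        raise ValueError("result must be ELIMINATED or CONFIRMED")
    # Timestamp layout is an assumption; any unambiguous format would do.
    ts = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    return (
        f"- **[{ts}] Hypothesis:** {hypothesis}\n"
        f"  **Falsification:** {falsification}\n"
        f"  **Result:** {result} {DASH} {why}\n"
    )


def append_dead_end(path: Path, entry: str) -> None:
    """Append an entry, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


entry = format_dead_end(
    hypothesis="The retry loop swallows the timeout error",
    falsification="The error would appear in the log with retries disabled",
    result="ELIMINATED",
    why="the log was still empty with retries disabled",
)
print(entry, end="")
# To persist it: append_dead_end(Path(".session/dead-ends.md"), entry)
```

Appending rather than rewriting keeps earlier entries intact, so a fresh session can read the full history.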

Timing Awareness

  • Prefix unknown commands with time to learn baselines
  • Capture output: time npm test 2>&1 | tee /tmp/test_output.txt
  • Fast (<5s): low barrier to run. Slow (>30s): reason first. Unknown: measure.
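
The thresholds above can be applied programmatically. The sketch below times a command, captures its combined output, and classifies the duration. `classify_duration` and `timed_run` are hypothetical names, and the "medium" label for the 5s to 30s band is an assumption, since the guidance does not name it.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def classify_duration(seconds: float) -> str:
    """Map a measured duration onto the bands from the guidance above."""
    if seconds < 5:
        return "fast"    # low barrier to run
    if seconds > 30:
        return "slow"    # reason before re-running
    return "medium"      # band not named in the guidance; label is an assumption


def timed_run(cmd: List[str]) -> Tuple[float, str]:
    """Run a command, returning (elapsed seconds, combined stdout and stderr)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, proc.stdout


# A trivial command stands in for a real test suite here.
elapsed, output = timed_run([sys.executable, "-c", "print('ok')"])
print(f"{elapsed:.2f}s ({classify_duration(elapsed)})")
```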

Techniques

  • Five Whys: Trace causal chains. Starting point, not sole method.
  • Delta Debugging: Binary search between passing/failing cases (git bisect logic).
  • Rubber Duck: Explain the system step by step in writing to expose gaps.
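
The bisect-style search mentioned under Delta Debugging can be sketched as follows. This is a plain binary search, not a full delta-debugging minimizer, and it assumes an ordered sequence where the first item passes, the last fails, and once a failure appears every later item fails too. The name `first_failing` is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_failing(items: Sequence[T], fails: Callable[[T], bool]) -> T:
    """Return the first failing item in an ordered pass-then-fail sequence."""
    if len(items) < 2:
        raise ValueError("need at least one passing and one failing item")
    lo, hi = 0, len(items) - 1  # invariant: items[lo] passes, items[hi] fails
    if fails(items[lo]) or not fails(items[hi]):
        raise ValueError("first item must pass and last item must fail")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[mid]):
            hi = mid   # the first failure is at mid or earlier
        else:
            lo = mid   # the first failure is after mid
    return items[hi]


commits = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
broken_since = commits.index("c6")
culprit = first_failing(commits, lambda c: commits.index(c) >= broken_since)
print(culprit)  # c6
```

Each probe halves the remaining range, so eight candidates need at most three probes after the two endpoint checks.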

Full Agent

For comprehensive investigation support with delegation, exploration files, and session memory management, use @research.