- AGENTS.md: design principles, enforcement hierarchy, deferred loading
- agents/: brainstorm, build, orchestrator, research (auto-discovered by MCP server)
- skills/: research methodology (auto-discovered by MCP server)
- hooks/: pre-tool-use, post-tool-use (BFF block removed), session-start, stop, pre-compact, user-prompt-submit
- frameworks/: opencode/plugin.ts (resolves hooks via import.meta.url, so it works as a project-local or global plugin), github/hooks.json
- mcp/index.ts: auto-discovers `agents/*.md` and `skills/*.md` from frontmatter (replaces hand-maintained registry); server renamed all-agents
- docs/: agent-infrastructure.md (generalized), research docs (7 files), ai_architectures.md, llama-server-cuda-wsl2.md
- install.sh: idempotent setup (Copilot global hooks, OpenCode global plugin + AGENTS.md + MCP entry, VS Code global MCP config)

114 lines
4.7 KiB
Markdown
---
description: 'Load the structured research methodology: call this when starting any investigation, debugging session, root cause analysis, or systematic exploration of unfamiliar code. Returns a checklist with two orientations (Understand + Diagnose), risk-based triage, circuit breakers, and context management guidance.'
toolName: 'load_research_methodology'
---

# Research Methodology Skill

This skill provides a structured, evidence-based investigation methodology. It
prevents common AI agent failure modes: pattern-matching without evidence,
confirmation bias, fixing symptoms instead of causes, and methodology drift
during long sessions.

## Quick Reference: The Investigation Checklist

For every hypothesis cycle:

- [ ] **Hypothesis written** (one sentence: "I believe X because Y")
- [ ] **Falsification criterion written** ("if wrong, I'd expect to see \_\_\_")
- [ ] **Falsification test run BEFORE confirmation test**
- [ ] **Result recorded** (ELIMINATED with reason, or CONFIRMED with evidence)
- [ ] **Hypothesis re-evaluated at this tool-call boundary**: new evidence
      changes what to check next. Interleaved thinking makes this automatic for
      Claude 4; consciously invoke it for other models.
- [ ] **All traces/instrumentation removed** before next hypothesis

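
The ordering constraints in the checklist can be made mechanical. The sketch below is one hypothetical way to do that in Python. `HypothesisCycle` and its fields are illustrative names, not part of this skill or the MCP server. It enforces only two things: the falsification test runs before the confirmation test, and a cycle counts as complete once a result with a reason is recorded and traces are removed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisCycle:
    """One pass through the checklist. All names here are illustrative."""

    hypothesis: str                # "I believe X because Y"
    falsification_criterion: str   # "if wrong, I'd expect to see ..."
    falsification_run: bool = False
    result: Optional[str] = None   # "ELIMINATED" or "CONFIRMED"
    reason: str = ""
    traces_removed: bool = False

    def run_falsification(self, observed_contradiction: bool, reason: str = "") -> None:
        """Record the falsification test; a contradiction eliminates the hypothesis."""
        self.falsification_run = True
        if observed_contradiction:
            self.result, self.reason = "ELIMINATED", reason

    def confirm(self, evidence: str) -> None:
        """Confirmation is only allowed after the falsification test has run."""
        if not self.falsification_run:
            raise RuntimeError("run the falsification test before the confirmation test")
        if self.result == "ELIMINATED":
            raise RuntimeError("hypothesis was already eliminated")
        self.result, self.reason = "CONFIRMED", evidence

    def is_complete(self) -> bool:
        """True once a result with a reason is recorded and traces are removed."""
        return self.result is not None and bool(self.reason) and self.traces_removed
```

Calling `confirm()` on a fresh cycle raises, which is the point: the structure makes it awkward to skip straight to the confirming test.
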
## Two Orientations

### Understand (Grounded Theory)

**Goal**: Build a mental model from the code itself, not from assumptions.

1. **Open coding**: Read code, name what you see (functions, patterns, flows)
2. **Constant comparison**: Compare new observations against earlier ones
3. **Axial coding**: Connect the categories (what calls what, data flows)
4. **Memo**: Write findings to session memory as you go
5. **Saturation check**: Stop when new files confirm what you already know

**Use for**: "How does X work?", "What's the architecture?", "I need to
understand this before changing it."

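
The saturation check in step 5 can be approximated with a simple counter. This is a minimal sketch, assuming you record the set of category names observed in each file as you read it. The function name and the window of three files are assumptions chosen for illustration, not values from this methodology.

```python
from typing import Iterable, Set


def is_saturated(observations_per_file: Iterable[Set[str]], window: int = 3) -> bool:
    """Return True when the last `window` files added no new categories.

    `observations_per_file` holds one set of category names per file,
    in the order the files were read.
    """
    seen: Set[str] = set()
    files_without_news = 0
    for categories in observations_per_file:
        new_categories = categories - seen
        seen |= categories
        # Reset the streak whenever a file teaches us something new.
        files_without_news = 0 if new_categories else files_without_news + 1
    return files_without_news >= window
```

A larger window is more conservative: it takes more consecutive "nothing new" files before you stop reading.
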
### Diagnose (Strong Inference + Satisficing)

**Goal**: Determine why something isn't working.

**Simple check first**: Can you answer this with a single log/print? If the
question is "what value does X have here?", just log and look.

**Triage** (if the simple check didn't resolve it):

| Factor            | Low Risk                 | High Risk                      |
| ----------------- | ------------------------ | ------------------------------ |
| **Reversibility** | Easy to undo             | Hard to reverse (data, deploy) |
| **Blast radius**  | One file/function        | Many systems, shared state     |
| **Confidence**    | Familiar, clear evidence | Novel, ambiguous symptoms      |
| **Novelty**       | Seen this before         | Never encountered              |
| **Time cost**     | Known fast (<5s)         | Unknown = measure first        |

**Low risk → Satisfice**: Test the single most likely hypothesis. Done if
confirmed.

**Any high risk → Strong Inference**: Generate 2-3 competing hypotheses, design
a discriminating test, eliminate based on evidence.

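
The triage rule reduces to "any high-risk factor selects Strong Inference". A minimal sketch of that decision follows. The `Risk` enum, the factor keys, and the returned strategy strings are hypothetical names. Treating an unrated factor as high risk is an assumption made here for safety, not something the table states.

```python
from enum import Enum
from typing import Dict


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# One key per row of the triage table (key names are illustrative).
FACTORS = ("reversibility", "blast_radius", "confidence", "novelty", "time_cost")


def choose_strategy(assessment: Dict[str, Risk]) -> str:
    """Map a per-factor risk assessment to a strategy name.

    Any factor rated HIGH selects Strong Inference. Unrated factors are
    treated as HIGH (an assumption of this sketch).
    """
    ratings = [assessment.get(factor, Risk.HIGH) for factor in FACTORS]
    if any(rating is Risk.HIGH for rating in ratings):
        return "strong_inference"  # 2-3 competing hypotheses, discriminating test
    return "satisfice"             # test the single most likely hypothesis
```
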
### Mode Switching

The two orientations compose recursively:

`Understand → anomaly → Diagnose → need context → Understand → ...`

## Circuit Breakers

1. **5+ attempts without falsifying a hypothesis = STOP and report**
2. **3+ edits to same file without passing test = STOP and rethink**
3. **Urge to "just try something" = STOP and write hypothesis first**
4. **Two failures at same abstraction level = go UP one level**

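
Three of the four breakers are countable, so they can be tracked explicitly rather than from memory. The sketch below is a hypothetical tracker, and the class and method names are assumptions. Breaker 3 (the urge to "just try something") is a judgment call and is deliberately left out.

```python
from collections import Counter
from typing import List


class CircuitBreakers:
    """Counts events and reports which breakers have tripped."""

    def __init__(self) -> None:
        self.attempts_without_falsifying = 0
        self.edits_without_pass: Counter = Counter()  # path -> edits since last pass
        self.failures_at_level: Counter = Counter()   # abstraction level -> failures

    def record_attempt(self, falsified: bool) -> None:
        """A successful falsification resets the streak."""
        self.attempts_without_falsifying = (
            0 if falsified else self.attempts_without_falsifying + 1
        )

    def record_edit(self, path: str) -> None:
        self.edits_without_pass[path] += 1

    def record_test_pass(self, path: str) -> None:
        self.edits_without_pass[path] = 0

    def record_failure(self, level: str) -> None:
        self.failures_at_level[level] += 1

    def tripped(self) -> List[str]:
        """Return one message per breaker that is currently tripped."""
        messages = []
        if self.attempts_without_falsifying >= 5:
            messages.append("STOP and report: 5+ attempts without falsifying")
        for path, edits in self.edits_without_pass.items():
            if edits >= 3:
                messages.append(f"STOP and rethink: 3+ edits to {path} without a passing test")
        for level, failures in self.failures_at_level.items():
            if failures >= 2:
                messages.append(f"Go UP one level: two failures at the {level} level")
        return messages
```

Checking `tripped()` at each tool-call boundary keeps the thresholds from being quietly exceeded.
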
## Context Management

Adherence to the methodology tends to degrade after ~15 tool calls, as
accumulated context competes for attention. To counteract this:

- Re-read the investigation file and dead-ends every ~10 tool calls
- If drifting toward guess-and-check, pause and re-read notes
- For long sessions, create an investigation file so a fresh context can continue
- Hold references; load on demand. Do not read files you don't need yet.

## Dead-Ends Format

Record eliminated hypotheses so you (or the next session) don't re-test them:

```
- **[timestamp] Hypothesis:** [one sentence]
  **Falsification:** [what you'd expect if wrong]
  **Result:** [ELIMINATED/CONFIRMED] — [why, in one sentence]
```

Write to `.session/dead-ends.md` or the investigation file's Hypotheses section.

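
If entries are written by a script rather than by hand, a small helper keeps the format consistent. This is a sketch, not tooling that ships with the skill. The function names and the UTC timestamp format are assumptions; the template only says `[timestamp]`. The default path is the `.session/dead-ends.md` location named above.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_dead_end(hypothesis, falsification, result, why, timestamp=None):
    """Render one entry in the dead-ends format."""
    if result not in ("ELIMINATED", "CONFIRMED"):
        raise ValueError("result must be ELIMINATED or CONFIRMED")
    # Timestamp format is an assumption of this sketch.
    stamp = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"- **[{stamp}] Hypothesis:** {hypothesis}\n"
        f"  **Falsification:** {falsification}\n"
        f"  **Result:** {result} \u2014 {why}\n"  # \u2014 is the dash used in the template
    )


def append_dead_end(entry, path=Path(".session/dead-ends.md")):
    """Append an entry, creating the directory and file if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Appending rather than rewriting means a later session sees every earlier dead end in order.
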
## Timing Awareness

- Prefix unknown commands with `time` to learn baselines
- Capture output: `time npm test 2>&1 | tee /tmp/test_output.txt`
- Fast (<5s): low barrier to run. Slow (>30s): reason first. Unknown: measure.

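
The same measure-then-classify habit can be scripted. The sketch below uses the 5 s and 30 s thresholds from the list above. The function names are hypothetical, and the "moderate" label for the 5 to 30 s range is an assumption, since the list does not name that bucket.

```python
import subprocess
import time


def classify_duration(seconds: float) -> str:
    """Bucket a measured runtime using the thresholds listed above."""
    if seconds < 5:
        return "fast"      # low barrier to run again
    if seconds > 30:
        return "slow"      # reason first before re-running
    return "moderate"      # label assumed; the source names only fast and slow


def time_command(argv):
    """Run a command, capture stdout and stderr together, return (seconds, output)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return time.perf_counter() - start, completed.stdout
```

For example, `time_command(["npm", "test"])` returns the elapsed seconds and the combined output, and `classify_duration` turns the first value into a baseline you can record.
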
## Techniques

- **Five Whys**: Trace causal chains. Starting point, not sole method.
- **Delta Debugging**: Binary search between passing/failing cases (`git bisect`
  logic).
- **Rubber Duck**: Explain the system step by step in writing to expose gaps.

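
The binary search behind the Delta Debugging bullet can be sketched in a few lines. This is plain bisection over an ordered list of candidates (commits, config versions, input sizes), as the bullet describes, not the full ddmin algorithm. `first_failing` and `is_failing` are hypothetical names. It assumes the first candidate passes, the last fails, and that everything after the first failure also fails.

```python
def first_failing(candidates, is_failing):
    """Binary search for the first failing item in an ordered sequence.

    Assumes candidates[0] passes, candidates[-1] fails, and that once an
    item fails, every later item fails too.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one passing and one failing candidate")
    good, bad = 0, len(candidates) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_failing(candidates[middle]):
            bad = middle    # the first failure is at or before middle
        else:
            good = middle   # the first failure is after middle
    return candidates[bad]
```

Each probe halves the remaining range, so the number of test runs grows with the logarithm of the candidate count. This matters most when a single run is slow.
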
## Full Agent

For comprehensive investigation support with delegation, exploration files, and
session memory management, use `@research`.