--- description: 'Load the structured research methodology — call this when starting any investigation, debugging session, root cause analysis, or systematic exploration of unfamiliar code. Returns a checklist with two orientations (Understand + Diagnose), risk-based triage, circuit breakers, and context management guidance.' toolName: 'load_research_methodology' --- # Research Methodology Skill This skill provides a structured, evidence-based investigation methodology. It prevents common AI agent failure modes: pattern-matching without evidence, confirmation bias, fixing symptoms instead of causes, and methodology drift during long sessions. ## Quick Reference: The Investigation Checklist Before every hypothesis cycle: - [ ] **Hypothesis written** (one sentence: "I believe X because Y") - [ ] **Falsification criterion written** ("if wrong, I'd expect to see \_\_\_") - [ ] **Falsification test run BEFORE confirmation test** - [ ] **Result recorded** (ELIMINATED with reason, or CONFIRMED with evidence) - [ ] **Hypothesis re-evaluated at this tool-call boundary** — new evidence changes what to check next. Interleaved thinking makes this automatic for Claude 4; consciously invoke it for other models. - [ ] **All traces/instrumentation removed** before next hypothesis ## Two Orientations ### Understand (Grounded Theory) **Goal**: Build a mental model from the code itself, not assumptions. 1. **Open coding** — Read code, name what you see (functions, patterns, flows) 2. **Constant comparison** — Compare new observations against earlier ones 3. **Axial coding** — Connect the categories (what calls what, data flows) 4. **Memo** — Write findings to session memory as you go 5. **Saturation check** — Stop when new files confirm what you already know **Use for**: "How does X work?", "What's the architecture?", "I need to understand this before changing it." ### Diagnose (Strong Inference + Satisficing) **Goal**: Determine why something isn't working. **Simple check first**: Can you answer this with a single log/print? If the question is "what value does X have here?" — just log and look. **Triage** (if the simple check didn't resolve it): | Factor | Low Risk | High Risk | | ----------------- | ------------------------ | ------------------------------ | | **Reversibility** | Easy to undo | Hard to reverse (data, deploy) | | **Blast radius** | One file/function | Many systems, shared state | | **Confidence** | Familiar, clear evidence | Novel, ambiguous symptoms | | **Novelty** | Seen this before | Never encountered | | **Time cost** | Known fast (<5s) | Unknown = measure first | **Low risk → Satisfice**: Test the single most likely hypothesis. Done if confirmed. **Any high risk → Strong Inference**: Generate 2-3 competing hypotheses, design a discriminating test, eliminate based on evidence. ### Mode Switching These compose recursively: `Understand → anomaly → Diagnose → need context → Understand → ...` ## Circuit Breakers 1. **5+ attempts without falsifying = STOP and report** 2. **3+ edits to same file without passing test = STOP and rethink** 3. **Urge to "just try something" = STOP and write hypothesis first** 4. **Two failures at same abstraction level = go UP one level** ## Context Management Methodology degrades after ~15 tool calls (context competition). Counteract: - Re-read investigation file and dead-ends every ~10 tool calls - If drifting toward guess-and-check, pause and re-read notes - For long sessions, create an investigation file so fresh context can continue - Hold references; load on demand. Do not read files you don't need yet. ## Dead-Ends Format Record eliminated hypotheses so you (or the next session) don't re-test them: ``` - **[timestamp] Hypothesis:** [one sentence] **Falsification:** [what you'd expect if wrong] **Result:** [ELIMINATED/CONFIRMED] — [why, in one sentence] ``` Write to `.session/dead-ends.md` or the investigation file's Hypotheses section. ## Timing Awareness - Prefix unknown commands with `time` to learn baselines - Capture output: `time npm test 2>&1 | tee /tmp/test_output.txt` - Fast (<5s): low barrier to run. Slow (>30s): reason first. Unknown: measure. ## Techniques - **Five Whys**: Trace causal chains. Starting point, not sole method. - **Delta Debugging**: Binary search between passing/failing cases (`git bisect` logic). - **Rubber Duck**: Explain the system step by step in writing to expose gaps. ## Full Agent For comprehensive investigation support with delegation, exploration files, and session memory management, use `@research`.