dotfiles/.agents/skills/research.md

---
description: 'Load the structured research methodology — call this when starting any investigation, debugging session, root cause analysis, or systematic exploration of unfamiliar code. Returns a checklist with two orientations (Understand + Diagnose), risk-based triage, circuit breakers, and context management guidance.'
toolName: 'load_research_methodology'
---

# Research Methodology Skill

This skill provides a structured, evidence-based investigation methodology. It
prevents common AI agent failure modes: pattern-matching without evidence,
confirmation bias, fixing symptoms instead of causes, and methodology drift
during long sessions.

## Quick Reference: The Investigation Checklist

Before every hypothesis cycle:

- [ ] **Hypothesis written** (one sentence: "I believe X because Y")
- [ ] **Falsification criterion written** ("if wrong, I'd expect to see \_\_\_")
- [ ] **Falsification test run BEFORE confirmation test**
- [ ] **Result recorded** (ELIMINATED with reason, or CONFIRMED with evidence)
- [ ] **Hypothesis re-evaluated at this tool-call boundary** — new evidence
      changes what to check next. Interleaved thinking makes this automatic for
      Claude 4; consciously invoke it for other models.
- [ ] **All traces/instrumentation removed** before next hypothesis

## Two Orientations

### Understand (Grounded Theory)

**Goal**: Build a mental model from the code itself, not assumptions.

1. **Open coding** — Read code, name what you see (functions, patterns, flows)
2. **Constant comparison** — Compare new observations against earlier ones
3. **Axial coding** — Connect the categories (what calls what, data flows)
4. **Memo** — Write findings to session memory as you go
5. **Saturation check** — Stop when new files confirm what you already know

**Use for**: "How does X work?", "What's the architecture?", "I need to
understand this before changing it."

### Diagnose (Strong Inference + Satisficing)

**Goal**: Determine why something isn't working.

**Simple check first**: Can you answer this with a single log/print? If the
question is "what value does X have here?" — just log and look.

**Triage** (if the simple check didn't resolve it):

| Factor            | Low Risk                 | High Risk                      |
| ----------------- | ------------------------ | ------------------------------ |
| **Reversibility** | Easy to undo             | Hard to reverse (data, deploy) |
| **Blast radius**  | One file/function        | Many systems, shared state     |
| **Confidence**    | Familiar, clear evidence | Novel, ambiguous symptoms      |
| **Novelty**       | Seen this before         | Never encountered              |
| **Time cost**     | Known fast (<5s)         | Unknown = measure first        |

**Low risk → Satisfice**: Test the single most likely hypothesis. Done if
confirmed.

**Any high risk → Strong Inference**: Generate 2-3 competing hypotheses, design
a discriminating test, eliminate based on evidence.

### Mode Switching

These compose recursively:
`Understand → anomaly → Diagnose → need context → Understand → ...`

## Circuit Breakers

1. **5+ attempts without falsifying = STOP and report**
2. **3+ edits to same file without passing test = STOP and rethink**
3. **Urge to "just try something" = STOP and write hypothesis first**
4. **Two failures at same abstraction level = go UP one level**

## Context Management

Methodology degrades after ~15 tool calls (context competition). Counteract:

- Re-read investigation file and dead-ends every ~10 tool calls
- If drifting toward guess-and-check, pause and re-read notes
- For long sessions, create an investigation file so fresh context can continue
- Hold references; load on demand. Do not read files you don't need yet.

## Dead-Ends Format

Record eliminated hypotheses so you (or the next session) don't re-test them:

```
- **[timestamp] Hypothesis:** [one sentence]
  **Falsification:** [what you'd expect if wrong]
  **Result:** [ELIMINATED/CONFIRMED] — [why, in one sentence]
```

Write to `.session/dead-ends.md` or the investigation file's Hypotheses section.

## Timing Awareness

- Prefix unknown commands with `time` to learn baselines
- Capture output: `time npm test 2>&1 | tee /tmp/test_output.txt`
- Fast (<5s): low barrier to run. Slow (>30s): reason first. Unknown: measure.

## Techniques

- **Five Whys**: Trace causal chains. Starting point, not sole method.
- **Delta Debugging**: Binary search between passing/failing cases (`git bisect`
  logic).
- **Rubber Duck**: Explain the system step by step in writing to expose gaps.

## Full Agent

For comprehensive investigation support with delegation, exploration files, and
session memory management, use `@research`.