Repo Doctor — the AI prompt that health-checks your whole codebase and tells you what to fix first

A structured prompt that turns any coding agent into a repository auditor — severity-tagged findings, category scores, and a prioritized remediation plan instead of a vague "looks good."

Ask an AI agent "review my codebase" and you get a polite tour: a few compliments, a couple of nitpicks, no priorities. The problem isn't the model — it's that "review" is an unbounded task with no output contract. Give the same model a structure — categories, severity levels, scores, and a required remediation order — and it becomes a genuinely useful auditor.

This is the Repo Doctor prompt. Run it before handing a repo to someone else, after a long stretch of vibe coding, or whenever you suspect the codebase has drifted from "scrappy" to "scary."

The prompt

Act as a repository health auditor. Examine this codebase and produce a
structured health report. Do not summarize what the project does — audit it.

Evaluate these 8 categories. For each: a score out of 10, and specific
findings with file paths.

1. SECURITY — secrets in code/config/history, unvalidated input reaching
   SQL/shell/templates, missing auth on endpoints, public pages leaking
   unpublished data, dependency risks.
2. CORRECTNESS — swallowed errors, race conditions, unhandled promise
   rejections, dead code paths, TODO-stubs shipping as features.
3. TESTS — do they exist, do they test anything real, what critical paths
   have zero coverage.
4. BUILD & DEPLOY HYGIENE — does the build pass, type errors tolerated in
   dev that will fail CI, config that diverges between local and prod.
5. DEPENDENCIES — unused, duplicated, unpinned, abandoned, or shadowing
   built-in functionality.
6. CODE STRUCTURE — premature abstraction, copy-paste families,
   god-files, circular imports.
7. DOCS & ONBOARDING — can a stranger (or an agent) run this repo from
   the README alone?
8. CONSISTENCY — competing patterns for the same job (two ways to fetch
   data, three ways to handle errors).

Severity-tag every finding: [CRITICAL] exploitable or data-losing,
[HIGH] will bite within weeks, [MEDIUM] friction, [LOW] cosmetic.

Then produce:
- Overall health score /10 with one-line justification
- THE PLAN: findings as a checklist, ordered by severity (highest first),
  then by effort (lowest first), so quick critical fixes come first.
  For each: what, where, why, rough effort.
- Three things the repo does WELL (so they don't get refactored away).

Rules: every finding needs a file path or it doesn't count. No generic
advice ("add more tests") without naming what specifically to test.
Read files before judging them.

Why the structure works

Severity tags force triage. Without them, a hardcoded API key and an inconsistent import style get the same bullet-point weight. With them, the model has to commit — and you can ignore everything below HIGH on a busy week.
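
If you ask the agent for findings in a structured form, the triage can be done mechanically. The sketch below is one way to do it, not part of the prompt: the `Finding` fields, the `effort_hours` estimate, and every sample path except `src/app/api/submit/route.ts` are hypothetical.

```python
from dataclasses import dataclass

# Lower rank sorts first. The four tags come from the prompt.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass
class Finding:
    severity: str        # one of the keys in SEVERITY_RANK
    path: str            # file the finding points at
    summary: str
    effort_hours: float  # hypothetical field: rough effort estimate


def plan(findings, floor="LOW"):
    """Order by severity, then effort, dropping anything below `floor`."""
    cutoff = SEVERITY_RANK[floor]
    kept = [f for f in findings if SEVERITY_RANK[f.severity] <= cutoff]
    return sorted(kept, key=lambda f: (SEVERITY_RANK[f.severity], f.effort_hours))


# Made-up sample findings for illustration.
findings = [
    Finding("LOW", "src/utils/format.ts", "inconsistent import style", 0.5),
    Finding("CRITICAL", "src/config.ts", "hardcoded API key", 0.25),
    Finding("HIGH", "src/app/api/submit/route.ts", "database error discarded", 1.0),
    Finding("CRITICAL", "src/app/api/admin/route.ts", "endpoint has no auth check", 3.0),
]

for f in plan(findings, floor="HIGH"):
    print(f"[{f.severity}] {f.path}: {f.summary} (~{f.effort_hours}h)")
```

With `floor="HIGH"`, the LOW finding drops out and the quick CRITICAL fix lands at the top of the list.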

"Every finding needs a file path" is the anti-hallucination clause. It converts vibes ("error handling could be improved") into checkable claims ("src/app/api/submit/route.ts catches and discards the Supabase error"). Findings without paths are the model guessing; the rule makes guessing visible.

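The file-path rule is also easy to enforce after the fact. A minimal sketch, assuming findings appear one per line starting with a severity tag and that a path is any token containing a slash and a file extension (so a root-level `README.md` would be missed); the function name and both regexes are illustrative, not something the prompt defines:

```python
import re
from pathlib import Path

# Assumption: a finding line starts with a severity tag, e.g. "[HIGH] ...".
TAG = re.compile(r"^\s*(?:[-*]\s*)?\[(CRITICAL|HIGH|MEDIUM|LOW)\]")
# Assumption: a path has at least one "/" and ends in a file extension.
PATH = re.compile(r"[\w.\-]+(?:/[\w.\-\[\]()]+)+\.\w+")


def check_findings(report: str, repo_root: Path):
    """Return (line, problem) pairs for findings with a missing or bad path."""
    problems = []
    for line in report.splitlines():
        if not TAG.match(line):
            continue  # not a finding line
        paths = PATH.findall(line)
        if not paths:
            problems.append((line.strip(), "no file path"))
            continue
        for p in paths:
            if not (repo_root / p).is_file():
                problems.append((line.strip(), f"path not found: {p}"))
    return problems


report = """\
[HIGH] src/app/api/submit/route.ts catches and discards the Supabase error
[MEDIUM] error handling could be improved
"""

# Run from the repository root; the output depends on what is on disk.
for line, problem in check_findings(report, Path(".")):
    print(f"{problem}: {line}")
```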
The "three things done well" section isn't politeness. It's protection. Agents (and humans) doing cleanup passes love refactoring the weird-looking thing that was actually load-bearing. Naming what works builds a do-not-touch list.

Category scores make the audit comparable. Run Repo Doctor monthly and the scores become a trend line. A security score that drops from 8 to 6 after a feature sprint is a conversation worth having with yourself.
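
Spotting those drops doesn't need anything fancy. Here is a minimal sketch, assuming you save each run's category scores as a simple mapping; the scores, the `regressions` helper, and the two-point threshold are all made up for illustration:

```python
# Category names come from the prompt; the scores are made-up sample data.
previous = {"SECURITY": 8, "CORRECTNESS": 7, "TESTS": 5, "CONSISTENCY": 6}
current = {"SECURITY": 6, "CORRECTNESS": 7, "TESTS": 6, "CONSISTENCY": 5}


def regressions(previous, current, threshold=2):
    """Return {category: (old, new)} for scores that fell by >= threshold."""
    return {
        cat: (previous[cat], current[cat])
        for cat in previous
        if cat in current and previous[cat] - current[cat] >= threshold
    }


for cat, (old, new) in regressions(previous, current).items():
    print(f"{cat}: {old} -> {new}")
# prints "SECURITY: 8 -> 6"
```

A threshold of 2 is a guess at what separates real drift from run-to-run noise in model scoring; tune it once you have a few months of history.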

Reading the results

A few patterns from running this across many repos:

- The first CRITICAL is usually real. Models are reliable at spotting secrets in code, missing auth checks, and queries without published-status filters. Verify it, then fix it the same day.
- Test findings need the most skepticism. "No tests for X" is easy for a model to claim and sometimes wrong, so check before acting.
- The consistency category is the sleeper. Two competing data-fetching patterns won't crash anything, but they double the cost of every future change, and they confuse the next AI agent you point at the repo: it'll pick one pattern at random and entrench the split.

When to run it

Before any handoff (human or agent), after any multi-week feature push, and as a recurring scheduled task if your tooling supports it. The audit is cheap; the drift it catches is not. A codebase that gets doctored monthly stays boring — and boring codebases ship.
