Lesson 057: Test-Gated Commits at Scale

Lesson 057: Test-Gated Commits at Scale

The Lesson

Gate every commit on a passing test suite, not on "the feature looks done." With 1,500+ tests across a project, the suite catches regressions that visual inspection misses — wrong column names, broken imports, type mismatches, off-by-one errors. The test suite is the contract for "this commit is safe," and the discipline of never committing red tests prevents cascading failures across phases.

Context

A multi-project development workflow maintained test suites ranging from 131 tests (data validation) to 1,500+ tests (RPG bot) to 5,101 assertions (render equivalence). The developer's rule: never commit unless all tests pass. This discipline was tested repeatedly — buggy generated code was the single largest friction source (59 incidents across 64 sessions), but gating commits on green tests prevented most bugs from persisting past the commit that introduced them.

What Happened

  1. The most common bug pattern: generated code used wrong column names, incorrect regex, or type mismatches. These bugs were syntactically valid — no import errors, no crashes — but produced wrong results. Without tests, they would have been committed and discovered much later.

  2. The commit workflow: implement the change → run pytest → if red, fix and re-run → when green, commit. This added 1-3 minutes per commit (test suite runtime) but saved hours of debugging later. A bug caught at commit time takes seconds to fix; the same bug found 5 commits later requires git bisect and careful untangling.

  3. Equivalence testing was particularly valuable during migrations. When converting XML exam data to JSON, 5,101 individual assertions verified that every question, answer, hint, and metadata field survived the conversion identically. This caught encoding issues (HTML entities, whitespace normalization) that would have been invisible to manual inspection.

  4. The test suite grew alongside the codebase: 168 tests at Phase S3-S4, 259 tests after adding voting blocks and acceptance tests. Each new module added its own test file. The discipline was: if you add a module, you add tests for it. No exceptions.

  5. One key insight: running tests before making changes (baseline) and after each file change (incremental) caught regressions immediately. The alternative — making many changes then running tests — produced compound failures where multiple bugs interacted and root causes were ambiguous.

Key Insights

Applicability

Test-gated commits work for any project with:

Does NOT apply when:

Related Lessons