Lesson 027: Migration Ordering and Apply-on-Use Gaps

Lesson 027: Migration Ordering and Apply-on-Use Gaps

The Lesson

When a database migration creates a table that new code writes to, the migration must be applied before the code runs — not just before the next CLI invocation. If the code path that triggers the write doesn't call apply_migrations(), the table won't exist at runtime, even though the migration file is on disk and other commands apply it correctly.

Context

A DuckDB-based data warehouse used numbered SQL migration files (001–007) applied by an apply_migrations() function that reads the _migrations table, finds unapplied files, and executes them in order. Each CLI command called apply_migrations(conn) at startup. Migration 006 created two scoring mart tables. Migration 007 created two calendar optimization tables. Both migrations existed on disk but had to be applied via code.

What Happened

  1. Added migration 006 (006_create_scoring_tables.sql) to create mart_image_preference_score and mart_inter_rater_reliability. The CLI's compute-scores command called apply_migrations(conn) at the top, so the tables would be created on first run.
  2. Ran compute-scores — it completed silently with no output. Queried the table — Table with name mart_image_preference_score does not exist. The migration hadn't been applied.
  3. Investigated: the _migrations table showed versions 1–5 applied, but not 6. The migrate command also showed "no pending migrations" — which was wrong.
  4. Root cause: the migrate CLI command worked correctly, but compute-scores had a separate connection path that didn't properly trigger migration 006. Running migrate explicitly, then compute-scores, worked.
  5. The same pattern recurred with migration 007 for calendar tables. The optimize command called apply_migrations(conn) but the migration still wasn't applied on the first run.
  6. The fix was to ensure apply_migrations() was called and its result verified before any write operation that depends on new tables. Additionally, running migrate as a standalone step before any new pipeline stage became standard practice.

Key Insights

Applicability

This applies to any system with schema migrations (Django, Alembic, Flyway, Knex, custom):

The pattern is especially treacherous in embedded databases (SQLite, DuckDB) where there's no separate database server with its own migration lifecycle — migrations only run when application code triggers them.

Related Lessons