Bulk Metadata Enrichment Scripts

Bulk Metadata Enrichment Scripts

The Lesson

When hundreds of data records need the same type of update (adding titles, categories, tags, or enriched descriptions), writing a dedicated Python script that reads a manifest and patches the data files is orders of magnitude faster and more reliable than manual editing. The script is disposable, but the pattern is reusable.

Context

The certification quiz had 10 new exam files (7 files in one batch) where each question needed: a title, a scenario, category references, difficulty level, and tags. With 50 questions per file, that's 350 questions × 5 fields = 1,750 individual edits. Manual editing was impractical.

What Happened

  1. A patch_metadata.py script was written to read a JSON manifest specifying per-question metadata
  2. Manifest files were created for each exam (e.g., scripts/manifests/aip-c01.json) with question IDs and their metadata
  3. The script parsed each XML file, located questions by ID, and inserted/updated the specified metadata elements
  4. The process was applied to all 10 files across 3 commits, each covering a batch of 2-3 exams
  5. Post-patch validation confirmed all files remained valid XML and loaded correctly

Key Insights

Related Lessons