DuckDB's executemany with parameterized INSERT statements can hang indefinitely at scale (10K+ rows). Replacing it with ...
When a database migration creates a table that new code writes to, the migration must be applied before the code runs — ...
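A minimal sketch of making that ordering fail loudly: the application checks at startup that the tables its code writes to exist, and refuses to start otherwise. It uses Python's built-in `sqlite3` as a stand-in for the project's database. The table name `thumbnails`, `REQUIRED_TABLES`, and `MigrationNotApplied` are hypothetical.

```python
import sqlite3

# Hypothetical: tables that pending migrations are expected to have created.
REQUIRED_TABLES = ("thumbnails",)


class MigrationNotApplied(RuntimeError):
    """Raised when code starts against a schema missing a migrated table."""


def assert_schema_ready(conn: sqlite3.Connection, required=REQUIRED_TABLES) -> None:
    """Fail fast at startup instead of failing later on the first write."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    missing = [t for t in required if t not in existing]
    if missing:
        raise MigrationNotApplied(
            f"missing tables {missing}; apply pending migrations before starting"
        )
```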
When an embedded database (DuckDB, SQLite) serves both a batch pipeline and an interactive web app, the web layer should...
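The recommendation is cut off in this summary. As an illustration only, not necessarily what this lesson goes on to prescribe, one common way to keep a web layer from interfering with a batch writer is to open the database read-only. With SQLite URI filenames, `mode=ro` makes every write on that connection fail. `open_read_only` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt raises."""
    # as_uri() builds a percent-encoded file: URI; mode=ro rejects writes.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```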
When a CLI pipeline and a web API need the same data, import the query functions directly rather than duplicating SQL. A...
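A sketch of that layout, with three hypothetical modules collapsed into one block: the SQL lives in a single function, and both the CLI formatter and the API handler call it instead of carrying their own copy. Table and function names are assumptions, and `sqlite3` stands in for the real database.

```python
import json
import sqlite3


# --- queries.py (hypothetical): the only place this SQL lives ---
def top_items(conn: sqlite3.Connection, limit: int = 10) -> list[dict]:
    rows = conn.execute(
        "SELECT name, score FROM items ORDER BY score DESC LIMIT ?", (limit,)
    ).fetchall()
    return [{"name": name, "score": score} for name, score in rows]


# --- cli.py (hypothetical): imports top_items and formats plain text ---
def cli_report(conn: sqlite3.Connection, limit: int = 10) -> str:
    return "\n".join(f"{r['name']}\t{r['score']}" for r in top_items(conn, limit))


# --- api.py (hypothetical): imports the same function and returns JSON ---
def api_top_items(conn: sqlite3.Connection, limit: int = 10) -> str:
    return json.dumps(top_items(conn, limit))
```

A schema change then touches one function, and the CLI and API cannot drift apart in filtering or ordering.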
The vision tagging pipeline uses Qwen2.5-VL (a 7B-parameter vision-language model) to classify image attributes. Running...
Multiple pipeline stages in Artemis started with per-row INSERT or UPDATE patterns that worked fine during development (...
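The difference between the two patterns can be sketched with `sqlite3` standing in for the warehouse. Table and column names are hypothetical, and the staged, set-based version is one common fix, not necessarily the one each Artemis stage adopted.

```python
import sqlite3


def update_tags_per_row(conn: sqlite3.Connection, updates) -> None:
    # Fine for a handful of rows: one statement and one commit per row.
    for item_id, tag in updates:
        conn.execute("UPDATE items SET tag = ? WHERE id = ?", (tag, item_id))
        conn.commit()


def update_tags_batched(conn: sqlite3.Connection, updates) -> None:
    # Stage all changes, then apply them with one set-based UPDATE,
    # all inside a single transaction committed by the context manager.
    with conn:
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS staged (id INTEGER PRIMARY KEY, tag TEXT)"
        )
        conn.execute("DELETE FROM staged")
        conn.executemany("INSERT INTO staged (id, tag) VALUES (?, ?)", updates)
        conn.execute(
            "UPDATE items SET tag = (SELECT tag FROM staged WHERE staged.id = items.id) "
            "WHERE id IN (SELECT id FROM staged)"
        )
```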
The original thumbnail downloader processed 12,217 images sequentially. Each download created a new httpx.Client instanc...
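A sketch of the two changes this points to, reusing one client and downloading with bounded concurrency. It is written against a generic `fetch` callable so it runs without network access. `download_all` and `fetch` are hypothetical names, and a thread pool is one possible concurrency model, not necessarily the one the project's downloader uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Download with at most `max_workers` requests in flight.

    `fetch` is expected to wrap one long-lived client created by the caller,
    rather than building a new client (and connection pool) per image.
    """
    results: Dict[str, bytes] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # keep going; report failures at the end
                failures[url] = exc
    return results, failures
```

With httpx, the caller would create one `httpx.Client()` up front and pass something like `lambda url: client.get(url).content`, assuming that client is safe to share across the worker threads.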
When investigating why multimodal clustering produced zero results, the breakthrough came from a simple query: SELEC...
While developing the concurrent thumbnail downloader, the DuckDB warehouse file (warehouse.duckdb) became locked by the ...
Python scripts in the Artemis project span multiple roles: XML/JSON migration, schema validation, metadata enrichment, t...
The thumbnail download process was killed multiple times during development — once to change the rate limit, once to adj...
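A minimal sketch of a download step that tolerates being killed and restarted: finished files are skipped, and each file is written under a temporary name and atomically renamed with `os.replace`, so an interrupted write never looks complete. `download_if_missing` and `fetch` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def download_if_missing(url: str, dest: Path, fetch: Callable[[str], bytes]) -> bool:
    """Return True if a download happened, False if dest already existed."""
    if dest.exists():
        return False  # completed on an earlier run; skip after a restart
    dest.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(url)
    # Temp file in the same directory so os.replace is an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)  # never leave a partial file behind
        raise
    return True
```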
Future lessons identified from the Claude Code usage insights analysis (64 sessions, 150 commits, May 2026). These are p...
Migrating an existing multi-page site to a design system is a page-by-page operation, not a big-bang rewrite. The design...
When migrating a data format (XML to JSON) that feeds a rendering pipeline, the only way to prove the migration is corre...
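The conclusion is truncated here. One concrete check, offered as an assumption rather than as this lesson's exact procedure, is to feed both formats through the same renderer and compare the output. The XML and JSON layouts and the `render` function are hypothetical stand-ins.

```python
import json
import xml.etree.ElementTree as ET


def records_from_xml(xml_text: str) -> list[dict]:
    # Hypothetical layout: <items><item id="..."><title>...</title></item></items>
    root = ET.fromstring(xml_text)
    return [{"id": el.get("id"), "title": el.findtext("title", "")} for el in root.iter("item")]


def records_from_json(json_text: str) -> list[dict]:
    # Hypothetical layout: {"items": [{"id": "...", "title": "..."}]}
    return [{"id": i["id"], "title": i.get("title", "")} for i in json.loads(json_text)["items"]]


def render(records: list[dict]) -> str:
    # Stand-in for the real rendering pipeline.
    return "\n".join(f"<li data-id='{r['id']}'>{r['title']}</li>" for r in records)


def rendered_output_matches(xml_text: str, json_text: str) -> bool:
    """Run both formats through the same renderer and compare the final output."""
    return render(records_from_xml(xml_text)) == render(records_from_json(json_text))
```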
When migrating a live data format (XML to JSON), the key risk is not the conversion itself — it's proving that the new f...
Government data sources contain artifacts of their internal production processes — temp files in archives, renamed colum...
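The specific artifacts listed are truncated, so the patterns below are illustrative assumptions: macOS `__MACOSX/` folders, Office `~$` lock files, and `.tmp`/`.bak` suffixes treated as archive junk, plus a hypothetical alias map that folds renamed columns back to one canonical name.

```python
import zipfile

# Hypothetical map from historical column names to one canonical name.
COLUMN_ALIASES = {"fips": "fips_code", "fips_cd": "fips_code", "st_fips": "state_fips"}


def is_junk(name: str) -> bool:
    base = name.rsplit("/", 1)[-1]
    return (
        name.endswith("/")                # directory entry
        or name.startswith("__MACOSX/")   # macOS resource-fork folder
        or base.startswith("~$")          # Office lock/temp file
        or base.lower().endswith((".tmp", ".bak"))
    )


def data_members(zf: zipfile.ZipFile) -> list[str]:
    """Archive members worth parsing."""
    return [n for n in zf.namelist() if not is_junk(n)]


def normalize_columns(header: list[str]) -> list[str]:
    cleaned = [h.strip().lower() for h in header]
    return [COLUMN_ALIASES.get(h, h) for h in cleaned]
```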
Data pipelines fail — downloads timeout, parsers hit unexpected formats, database connections drop. Idempotency (running...
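A common way to make a load step idempotent, sketched here with `sqlite3` and a hypothetical `facts` table, is delete-then-insert for the unit being loaded inside one transaction. A rerun replaces rather than duplicates, and a failure partway through rolls back to the previous state.

```python
import sqlite3


def load_vintage(conn: sqlite3.Connection, vintage: int, rows) -> None:
    """Replace all rows for one vintage atomically."""
    with conn:  # commit on success, roll back on any exception
        conn.execute("DELETE FROM facts WHERE vintage = ?", (vintage,))
        conn.executemany(
            "INSERT INTO facts (vintage, geo_id, value) VALUES (?, ?, ?)",
            [(vintage, geo, val) for geo, val in rows],
        )
```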
Once a warehouse holds multiple vintages of the same dataset, every query must explicitly decide whether it wants the la...
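One way to force that decision, sketched with `sqlite3` and a hypothetical `facts(vintage, geo_id, value)` table, is to make the vintage a required keyword-only argument with an explicit `LATEST` sentinel, so no query path can fall through to an unfiltered scan across vintages.

```python
import sqlite3
from typing import Union

LATEST = "latest"  # explicit sentinel: newest release present in the table


def values_for(conn: sqlite3.Connection, *, vintage: Union[int, str]) -> list:
    # Keyword-only with no default: a caller cannot forget to choose.
    if vintage == LATEST:
        return conn.execute(
            "SELECT geo_id, value FROM facts "
            "WHERE vintage = (SELECT MAX(vintage) FROM facts) "
            "ORDER BY geo_id"
        ).fetchall()
    return conn.execute(
        "SELECT geo_id, value FROM facts WHERE vintage = ? ORDER BY geo_id",
        (vintage,),
    ).fetchall()
```

Here "latest" means the newest release overall. If a geography can be missing from that release, "latest value per geography" is a different query and would deserve its own explicit option.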
When building an analytical warehouse from multiple federal data products, the single most important architectural decis...
When loading multiple vintages of the same dataset, dimension tables must deduplicate on business key alone — not on bus...
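A sketch of that rule with `sqlite3`, assuming hypothetical `staging_geo` and `dim_geo` tables and at most one staging row per key and vintage. The dimension keeps one row per `geo_id`, taking attributes from the newest vintage in which that key appears, and the primary key on `geo_id` rejects any attempt to store one row per key and vintage.

```python
import sqlite3


def rebuild_dim_geo(conn: sqlite3.Connection) -> None:
    """One row per business key (geo_id), attributes from its newest vintage."""
    with conn:
        conn.execute("DELETE FROM dim_geo")
        conn.execute(
            """
            INSERT INTO dim_geo (geo_id, name, last_seen_vintage)
            SELECT s.geo_id, s.name, s.vintage
            FROM staging_geo AS s
            WHERE s.vintage = (
                SELECT MAX(t.vintage) FROM staging_geo AS t WHERE t.geo_id = s.geo_id
            )
            """
        )
```

Keying on `(geo_id, vintage)` instead would give a renamed geography two dimension rows, and every fact join on `geo_id` would double-count it.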
When a web framework dispatches synchronous endpoint handlers to a thread pool, a shared database connection will produc...
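A minimal sketch of the usual fix, giving each worker thread its own connection through `threading.local`, with `sqlite3` as a stand-in. `PerThreadConnections` is a hypothetical helper. By default, `sqlite3` raises `ProgrammingError` when a connection is used from a thread other than the one that created it, which makes the shared-connection mistake easy to see.

```python
import sqlite3
import threading


class PerThreadConnections:
    """Hand each worker thread its own connection instead of sharing one."""

    def __init__(self, db_path: str):
        self._db_path = db_path
        self._local = threading.local()

    def get(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # First call on this thread: open a connection owned by it.
            conn = sqlite3.connect(self._db_path)
            self._local.conn = conn
        return conn
```

Connections stored this way live as long as their worker thread, so a long-running server may want to close them explicitly on shutdown.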