Block 6: Vision, Curation & AI-Assisted Workflow Lessons
19 lessons from biased voting blocks, vision tagging, CLIP classification, image deduplication, interactive selection tooling, and AI-assisted development patterns.
| # | Lesson | Core Teaching |
|---|---|---|
| 039 | Mock Tagger for Vision Pipeline Testing | Hash-based mock tagger provides deterministic, self-scaling attribute generation for CI without GPU |
| 040 | Controlled Vocabulary as Schema Contract | A single YAML vocabulary enforces attribute consistency across VLM prompt, parser, DB, config, and analysis |
| 041 | Utility Function for Synthetic Voting Bias | Additive utility (base appeal + preference × match + randomness × noise) gives orthogonal control over bias strength |
| 042 | Lift as the Primary Bias Detection Metric | Lift (block rate / global rate) is scale-invariant and base-rate-invariant for bias detection |
| 043 | PII Sanitization in Static Exports | Defense in depth — query exclusion, recursive field stripping, and post-export content scanning |
| 044 | Acceptance Tests as Executable Specifications | Assert structural properties for statistical outputs and exact values for deterministic outputs |
| 045 | Embedding Deduplication for Image Curation | CLIP cosine similarity + connected components reduced 12,217 images to 2,163 unique representatives |
| 046 | Lazy Imports for Deployment Compatibility | Move heavy dependencies (numpy, torch) to function scope so lightweight deployments don't break |
| 047 | CLIP Zero-Shot as a Database Column Factory | One CLIP model + descriptive prompts = arbitrarily many structured database columns, no training needed |
| 048 | Greedy Max-Min Diversity Selection | Iteratively pick the item farthest from the already-selected set: O(n*k), near-optimal, 15 lines of code |
| 049 | Drag-and-Drop as Simplest Viable Interaction | Co-visible source pool + target slots with HTML5 DnD beats modals, search, and multi-step wizards |
| 050 | Connected Components for Transitive Dedup | scipy sparse graph components correctly group transitive chains that pair-based merging splits |
| 051 | Sigmoid Calibration for Domain-Specific CLIP | CLIP logits are domain-specific (16-32 for space photos); sigmoid center and scale must be empirically tuned |
| 052 | Incremental Feature Extraction | Delete-and-rewrite only new attribute codes; label_source and attribute_code columns enable surgical updates |
| 053 | Audit-First Design | A 15-minute written audit of existing code prevents hours of reimplementation and identifies extension points |
| 054 | Phased Autonomous Execution | Numbered phases with commit checkpoints turn multi-session projects into a queue of self-contained tasks |
| 055 | Session Continuity via Artifacts | CLAUDE.md + startup docs + plan files eliminate ramp-up overhead across session boundaries |
| 056 | Environment Self-Interference | AI assistants fight their own orphan processes — explicit cleanup before each launch is mandatory |
| 057 | Test-Gated Commits at Scale | Never commit red tests; run incrementally after each change, not in batch at the end |
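
As a rough illustration of lesson 042, lift can be computed as the block's rate divided by the global rate. The function name `lift` and the idea of counting "hits" as votes for images carrying a given attribute are assumptions made for this sketch, not details taken from the lesson itself.

```python
def lift(block_hits: int, block_total: int,
         global_hits: int, global_total: int) -> float:
    """Return block rate / global rate.

    A value near 1.0 means the block votes like the population as a whole;
    values well above 1.0 suggest the block over-favours the attribute.
    "Hits" here are hypothetical counts of votes for a given attribute.
    """
    if block_total == 0 or global_total == 0 or global_hits == 0:
        raise ValueError("lift is undefined when a denominator is zero")
    block_rate = block_hits / block_total
    global_rate = global_hits / global_total
    return block_rate / global_rate


# Illustrative numbers only: a block that picks the attribute 50% of the time
# against a 25% global rate has a lift of 2.0, regardless of sample size.
small_sample = lift(50, 100, 250, 1000)
large_sample = lift(500, 1000, 2500, 10000)
```

Because the metric is a ratio of rates, multiplying every count by ten leaves it unchanged, which is the scale invariance the table row refers to.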
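
The greedy max-min selection in lesson 048 might look something like the sketch below. It assumes items are plain coordinate tuples compared with Euclidean distance and that the first pick is a caller-supplied seed; the lesson's own code may use CLIP embeddings and a different distance, and `greedy_max_min` is a hypothetical name.

```python
import math
from typing import List, Sequence


def greedy_max_min(points: Sequence[Sequence[float]], k: int,
                   seed_index: int = 0) -> List[int]:
    """Pick k indices so that each new pick is the point farthest from
    the set already selected. Runs in O(n * k) distance evaluations."""
    if not points or k <= 0:
        return []
    k = min(k, len(points))
    selected = [seed_index]
    # min_dist[i] = distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    min_dist[seed_index] = -1.0  # mark as taken so it is never re-picked
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, p in enumerate(points):
            if min_dist[i] < 0:
                continue  # already selected
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Four points on a line: after seeding at 0, the far end (index 2) is picked
# first, then index 1, which is farther from both picks than index 3.
picked = greedy_max_min([(0, 0), (1, 0), (10, 0), (0.5, 0)], k=3)
```

Tracking only the distance to the nearest selected item is what keeps each round linear in the pool size.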
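
To see why transitive grouping matters for lesson 050, consider three images where A resembles B and B resembles C, but A and C fall just under the similarity threshold. The sketch below uses a small standard-library union-find as a stand-in for the scipy sparse graph routine the lesson names; `group_duplicates` and the pair-list input are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def group_duplicates(n_items: int,
                     similar_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group item indices into connected components of the similarity graph.

    Union-find is used here in place of a sparse-graph library call; both
    yield the same components for an undirected graph.
    """
    parent = list(range(n_items))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smallest index as root

    groups: Dict[int, List[int]] = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Items 0-1 and 1-2 are similar; 0-2 is not listed, yet all three end up
# in one group because they are connected through item 1.
groups = group_duplicates(5, [(0, 1), (1, 2)])
```

One representative per group (for example, the first index) then gives the deduplicated set.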
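
For lesson 051, a sigmoid calibration of raw CLIP logits could be sketched as follows. The `center=24.0` and `scale=2.0` defaults are placeholders chosen only because 24 is the midpoint of the 16-32 range quoted in the table; the lesson's point is that both values have to be tuned empirically per domain, and `calibrate` is a hypothetical name.

```python
import math


def calibrate(logit: float, center: float = 24.0, scale: float = 2.0) -> float:
    """Map a raw logit to a 0..1 score with a shifted, scaled sigmoid.

    center: logit that should map to 0.5 (placeholder value, tune per domain).
    scale:  how many logit units it takes to move away from 0.5
            (placeholder value, tune per domain).
    """
    return 1.0 / (1.0 + math.exp(-(logit - center) / scale))


# With the placeholder parameters, the ends of the quoted 16-32 range map
# close to 0 and 1, and the midpoint maps to exactly 0.5.
low, mid, high = calibrate(16.0), calibrate(24.0), calibrate(32.0)
```

A standard sigmoid centred at zero would push every logit in that range to roughly 1.0, which is why the centre has to be moved into the observed range before the scores become useful.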