Block 6: Vision, Curation & AI-Assisted Workflow Lessons

19 lessons from biased voting blocks, vision tagging, CLIP classification, image deduplication, interactive selection tooling, and AI-assisted development patterns.

| # | Lesson | Core Teaching |
|---|--------|---------------|
| 039 | Mock Tagger for Vision Pipeline Testing | Hash-based mock tagger provides deterministic, self-scaling attribute generation for CI without a GPU |
| 040 | Controlled Vocabulary as Schema Contract | A single YAML vocabulary enforces attribute consistency across VLM prompt, parser, DB, config, and analysis |
| 041 | Utility Function for Synthetic Voting Bias | Additive utility (base appeal + preference × match + randomness × noise) gives orthogonal control over bias strength and noise level |
| 042 | Lift as the Primary Bias Detection Metric | Lift (block rate / global rate) is scale-invariant and base-rate-invariant for bias detection |
| 043 | PII Sanitization in Static Exports | Defense in depth: query exclusion, recursive field stripping, and post-export content scanning |
| 044 | Acceptance Tests as Executable Specifications | Assert structural properties for statistical outputs and exact values for deterministic outputs |
| 045 | Embedding Deduplication for Image Curation | CLIP cosine similarity + connected components reduced 12,217 images to 2,163 unique representatives |
| 046 | Lazy Imports for Deployment Compatibility | Move heavy dependencies (numpy, torch) to function scope so lightweight deployments don't break |
| 047 | CLIP Zero-Shot as a Database Column Factory | One CLIP model + descriptive prompts = arbitrarily many structured database columns, no training needed |
| 048 | Greedy Max-Min Diversity Selection | Iteratively pick the item most distant from the selected set: O(n·k), near-optimal, 15 lines of code |
| 049 | Drag-and-Drop as Simplest Viable Interaction | Co-visible source pool + target slots with HTML5 DnD beats modals, search, and multi-step wizards |
| 050 | Connected Components for Transitive Dedup | SciPy sparse-graph connected components correctly group transitive chains that pair-based merging splits |
| 051 | Sigmoid Calibration for Domain-Specific CLIP | CLIP logits are domain-specific (16 to 32 for space photos); sigmoid center and scale must be tuned empirically |
| 052 | Incremental Feature Extraction | Delete and rewrite only new attribute codes; `label_source` and `attribute_code` columns enable surgical updates |
| 053 | Audit-First Design | A 15-minute written audit of existing code prevents hours of reimplementation and identifies extension points |
| 054 | Phased Autonomous Execution | Numbered phases with commit checkpoints turn multi-session projects into a queue of self-contained tasks |
| 055 | Session Continuity via Artifacts | CLAUDE.md + startup docs + plan files eliminate ramp-up overhead across session boundaries |
| 056 | Environment Self-Interference | AI assistants fight their own orphan processes, so explicit cleanup before each launch is mandatory |
| 057 | Test-Gated Commits at Scale | Never commit red tests; run tests incrementally after each change, not in a batch at the end |
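
Lesson 039's hash-based mock tagger can be sketched roughly as below. This is an illustration under assumptions, not the project's code: the `mock_tag` name and the vocabulary contents are hypothetical. Hashing the image ID together with the attribute name makes the output identical on every run, and because the tagger only indexes into whatever vocabulary it receives, new attributes are covered automatically.

```python
import hashlib

# Hypothetical vocabulary; the real pipeline would load it from the shared YAML file.
VOCAB = {
    "subject": ["galaxy", "nebula", "planet", "star_cluster"],
    "color": ["red", "blue", "mixed"],
    "has_text": ["yes", "no"],
}

def mock_tag(image_id: str, vocab: dict[str, list[str]] = VOCAB) -> dict[str, str]:
    """Deterministically assign one value per attribute without running a model."""
    tags = {}
    for attribute, values in vocab.items():
        # Hash "image_id:attribute" so attributes vary independently for each image.
        digest = hashlib.sha256(f"{image_id}:{attribute}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(values)
        tags[attribute] = values[index]
    return tags

tags = mock_tag("img_0001")
```

Unlike a seeded random generator, the hash depends only on its inputs, so the mock output does not shift when images are processed in a different order.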
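
One way to enforce lesson 040's contract is to validate every parsed VLM response against the same vocabulary before anything reaches the database. The sketch assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the attribute names and the `validate_tags` helper are hypothetical.

```python
# Hypothetical vocabulary, as it might look after loading the shared YAML file.
VOCABULARY = {
    "subject": {"galaxy", "nebula", "planet"},
    "color": {"red", "blue", "mixed"},
}

def validate_tags(raw: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split a parsed VLM response into accepted tags and human-readable errors."""
    accepted, errors = {}, []
    for attribute, value in raw.items():
        allowed = VOCABULARY.get(attribute)
        normalized = value.strip().lower()
        if allowed is None:
            errors.append(f"unknown attribute: {attribute}")
        elif normalized not in allowed:
            errors.append(f"{attribute}: {value!r} not in vocabulary")
        else:
            accepted[attribute] = normalized
    # Attributes the vocabulary defines but the response omitted.
    for attribute in VOCABULARY.keys() - raw.keys():
        errors.append(f"missing attribute: {attribute}")
    return accepted, errors
```

Because the prompt, parser, and this validator read from one source, adding a value to the vocabulary updates all three at once.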
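
Lesson 041's additive utility can be sketched as follows: each synthetic voter scores every candidate image as base appeal + preference × match + randomness × noise and votes for the highest score. The function name, the 0/1 match indicator, and Gaussian noise are assumptions for illustration.

```python
import random

def cast_vote(candidates, preferred_attr, preference, randomness, rng):
    """Return the index of the candidate with the highest utility.

    candidates: list of dicts with a 'base_appeal' float and an 'attrs' set.
    """
    best_index, best_utility = None, float("-inf")
    for i, cand in enumerate(candidates):
        # 1.0 if the image carries the attribute this voting block favors.
        match = 1.0 if preferred_attr in cand["attrs"] else 0.0
        noise = rng.gauss(0.0, 1.0)
        utility = cand["base_appeal"] + preference * match + randomness * noise
        if utility > best_utility:
            best_index, best_utility = i, utility
    return best_index

rng = random.Random(7)
pair = [
    {"base_appeal": 0.6, "attrs": {"galaxy"}},
    {"base_appeal": 0.5, "attrs": {"nebula"}},
]
winner = cast_vote(pair, "nebula", preference=2.0, randomness=0.3, rng=rng)
```

Setting `randomness` to 0 makes votes deterministic; raising `preference` alone strengthens the bias without changing the noise level, which is what makes the two knobs orthogonal.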
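
A minimal lift computation for lesson 042: lift is the share of a block's winning images that carry an attribute divided by that attribute's share among all winning images, so 1.0 means no bias and values above 1.0 mean over-selection. The input shape (one attribute set per winning image) is an assumption.

```python
from collections import Counter

def attribute_lift(block_winners, all_winners):
    """Lift per attribute: (block's rate for attr) / (global rate for attr).

    Each argument is a list of attribute sets, one per winning image.
    """
    block_counts = Counter(a for attrs in block_winners for a in attrs)
    global_counts = Counter(a for attrs in all_winners for a in attrs)
    lifts = {}
    for attr, count in global_counts.items():
        global_rate = count / len(all_winners)
        block_rate = block_counts.get(attr, 0) / len(block_winners)
        lifts[attr] = block_rate / global_rate
    return lifts
```

Doubling every count leaves each lift unchanged, which is the scale invariance the lesson relies on; dividing by the global rate keeps rare and common attributes comparable.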
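
Two of lesson 043's layers, recursive field stripping and a post-export content scan, might look like the sketch below. The field names and the email regex are illustrative assumptions; a real scanner would likely check more patterns, and the first layer (excluding PII columns from the export query) is not shown.

```python
import json
import re

PII_FIELDS = {"email", "ip_address", "user_name"}  # hypothetical field names
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def strip_pii(obj):
    """Recursively drop PII keys from nested dicts and lists."""
    if isinstance(obj, dict):
        return {k: strip_pii(v) for k, v in obj.items() if k not in PII_FIELDS}
    if isinstance(obj, list):
        return [strip_pii(v) for v in obj]
    return obj

def scan_export(text: str) -> list[str]:
    """Last line of defense: find email-like strings left in the serialized export."""
    return EMAIL_RE.findall(text)

record = {
    "id": 1,
    "email": "a@example.com",
    "votes": [{"user_name": "ann", "choice": 3, "note": "cc b@example.org"}],
}
clean = strip_pii(record)
leaks = scan_export(json.dumps(clean))
```

The free-text `note` field slips past key-based stripping, which is exactly the gap the content scan is there to catch.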
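
Lessons 045 and 050 combine into a short pipeline: threshold the cosine-similarity matrix of image embeddings, treat it as a graph, and take connected components so that a transitive chain (A similar to B, B similar to C, but A not similar to C) lands in one group. A minimal sketch with SciPy, assuming CLIP embeddings are already computed; `dedup_groups`, the default threshold, and keeping the first member as representative are assumptions. The dense n×n matrix grows quadratically, so larger collections may need it computed in chunks.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def dedup_groups(embeddings, threshold=0.95):
    """Group near-duplicate images; returns (component labels, representative indices)."""
    emb = np.asarray(embeddings, dtype=float)
    # L2-normalize rows so the dot product equals cosine similarity.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    # Edge between i and j whenever their similarity clears the threshold.
    graph = csr_matrix((sim >= threshold).astype(np.int8))
    n_groups, labels = connected_components(graph, directed=False)
    # First index at which each component label appears becomes its representative.
    _, first_index = np.unique(labels, return_index=True)
    return labels, first_index.tolist()

# Toy 2-D "embeddings" at 0, 10, 20 and 90 degrees.
angles = np.radians([0, 10, 20, 90])
vecs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
labels, reps = dedup_groups(vecs, threshold=0.97)
```

In the toy data the 0° and 20° vectors fall below the threshold directly, yet they still share a component through the 10° vector; a pair-based merge would have split them.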
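
The lazy-import pattern from lesson 046 amounts to moving the import inside the function that needs it. In this sketch `cosine_similarity_matrix` is a hypothetical helper; the point is that merely importing its module never touches numpy, so a deployment without numpy only fails if this specific function is called.

```python
def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; numpy is imported only when this is called."""
    # Deferred import: importing this module does not require numpy to be installed.
    try:
        import numpy as np
    except ImportError as exc:
        raise RuntimeError(
            "cosine_similarity_matrix requires numpy; install it to use this feature"
        ) from exc
    arr = np.asarray(vectors, dtype=float)
    arr = arr / np.linalg.norm(arr, axis=1, keepdims=True)
    return arr @ arr.T
```

The trade-off is that a missing dependency surfaces at call time rather than at startup, so the error message should name the package explicitly.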
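
Lesson 047's column factory can be sketched on precomputed embeddings: each column is a small set of descriptive prompts, and the column's value is the prompt whose text embedding best matches the image. This assumes image and prompt embeddings come from the same CLIP model; `classify_columns` and the default `logit_scale=100.0` (CLIP scales cosine similarity by a learned temperature, around 100 in the released models) are assumptions.

```python
import numpy as np

def classify_columns(image_emb, prompt_embs, logit_scale=100.0):
    """Fill one database row from CLIP embeddings.

    image_emb: (d,) array. prompt_embs: {column: {value: (d,) text embedding}}.
    Returns {column: (best_value, softmax probability)}.
    """
    img = image_emb / np.linalg.norm(image_emb)
    row = {}
    for column, options in prompt_embs.items():
        values = list(options)
        texts = np.stack([options[v] / np.linalg.norm(options[v]) for v in values])
        logits = logit_scale * (texts @ img)
        # Softmax over this column's prompts only (max subtracted for stability).
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        best = int(np.argmax(probs))
        row[column] = (values[best], float(probs[best]))
    return row
```

Adding a column means writing a few more prompts and rerunning, with no labeled data or training step.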
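
Lesson 048's greedy max-min (farthest-first) selection fits in a few lines. The sketch assumes Euclidean distance on feature vectors and a caller-chosen starting item; keeping each point's distance to its nearest selected item means each round costs one pass over the n points, giving O(n·k) distance computations overall.

```python
import numpy as np

def greedy_max_min(points, k, start=0):
    """Pick k >= 1 indices, each time taking the point farthest from those already chosen."""
    X = np.asarray(points, dtype=float)
    k = min(k, len(X))
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    min_dist[start] = -np.inf  # never re-pick a selected item
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return selected
```

On the 1-D points 0, 1, 2 and 10 starting from index 0, the second pick is 10 (index 3) and the third is 2 (index 2), the point farthest from both.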
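
For lesson 051: with a default sigmoid centered at 0, every logit in the 16 to 32 range maps to a value above 0.9999, so the score cannot separate matches from non-matches. Shifting and scaling the input fixes that. The center of 24 and scale of 2 below are placeholder values, not the tuned values from the project.

```python
import math

def calibrated_score(logit, center, scale):
    """Map a raw CLIP logit to (0, 1); center and scale must be tuned per domain."""
    z = (logit - center) / scale
    # Numerically stable logistic: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Placeholder calibration for a 16 to 32 logit range (assumed, not tuned).
CENTER, SCALE = 24.0, 2.0
low, mid, high = (calibrated_score(x, CENTER, SCALE) for x in (16, 24, 32))
```

With these placeholders the ends of the observed range map to roughly 0.02 and 0.98, leaving room for a threshold in between.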
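
Lesson 052's surgical update might look like this with the standard-library `sqlite3` module. The `label_source` and `attribute_code` columns come from the lesson; the `image_attributes` table, the `image_id` and `value` columns, and the `replace_attribute_codes` helper are hypothetical.

```python
import sqlite3

def replace_attribute_codes(conn, new_rows, source="clip"):
    """Rewrite only the attribute codes present in new_rows for one label source.

    new_rows: list of (image_id, attribute_code, value) tuples.
    """
    codes = sorted({code for _, code, _ in new_rows})
    with conn:  # one transaction: delete and insert commit together or roll back together
        conn.executemany(
            "DELETE FROM image_attributes WHERE label_source = ? AND attribute_code = ?",
            [(source, code) for code in codes],
        )
        conn.executemany(
            "INSERT INTO image_attributes (image_id, attribute_code, value, label_source) "
            "VALUES (?, ?, ?, ?)",
            [(img, code, val, source) for img, code, val in new_rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_attributes (image_id TEXT, attribute_code TEXT, value TEXT, label_source TEXT)"
)
conn.executemany("INSERT INTO image_attributes VALUES (?, ?, ?, ?)", [
    ("img1", "subject", "galaxy", "vlm"),
    ("img1", "is_blurry", "no", "clip"),
    ("img1", "has_stars", "yes", "clip"),
])
replace_attribute_codes(conn, [("img1", "has_stars", "no"), ("img2", "has_stars", "yes")])
```

Rows from other label sources (`vlm`) and other CLIP codes (`is_blurry`) are untouched; only `has_stars` is rewritten.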