GitHub Pages Custom Domain Setup
Configuring a custom domain for GitHub Pages requires coordinating DNS, repo settings, build tool config, and deployment mode — each can silently break the others.
When local services are already running, skip mocks and test the real pipeline end-to-end.
Abstract base classes with minimal interfaces let the same RAG pipeline run on four different cloud providers without conditional logic in business code.
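A minimal sketch of the pattern, assuming an embedding step is what differs between providers. The names EmbeddingProvider, FakeProvider, and index_documents are hypothetical and do not come from the project. Business code depends only on the abstract interface, so adding a provider means adding a subclass rather than another branch.

```python
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Hypothetical minimal interface: the one method the pipeline needs."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one vector per input text."""


class FakeProvider(EmbeddingProvider):
    """Illustrative stand-in; a real provider would wrap a cloud SDK."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Deterministic toy "embedding": text length and vowel count.
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


def index_documents(provider: EmbeddingProvider, docs: list[str]) -> dict[str, list[float]]:
    # No "if provider == 'aws'" logic here: any subclass works unchanged.
    vectors = provider.embed(docs)
    return dict(zip(docs, vectors))
```

Because EmbeddingProvider declares embed as abstract, Python refuses to instantiate a subclass that forgets to implement it, which catches incomplete providers at construction time.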
Systematic triage of code review findings produces a traceable requirements document — turning ad hoc observations into prioritized, implementable work.
A repeatable workflow — Design, PDR, Plan, Execute, Commit — with table-driven task tracking and one-commit-per-phase discipline, applied across 18 project phases.
Deferring cloud SDK imports to runtime lets the same codebase run with or without any given SDK installed, and enables testing without real dependencies.
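A minimal sketch of a deferred import, using the standard importlib.import_module. The function names and the default module name boto3 are illustrative assumptions; the sketch only reports what it would do instead of calling any SDK.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional_sdk(module_name: str) -> Optional[ModuleType]:
    """Import an SDK at call time; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def upload(blob: bytes, sdk_name: str = "boto3") -> str:
    # Importing this module never touches the SDK; only calling upload() does.
    sdk = load_optional_sdk(sdk_name)
    if sdk is None:
        return f"skipped: {sdk_name} not installed"
    # A real implementation would call the SDK here; this sketch only reports.
    return f"would upload {len(blob)} bytes via {sdk.__name__}"
```

The same seam helps testing: passing a module name that is guaranteed to be absent, or patching load_optional_sdk, exercises the fallback path without any cloud SDK installed.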
Three cloud stacks (AWS, Azure, GCP) built in separate phases with OIDC federation, avoiding cross-cloud coupling while sharing a common authentication pattern.
Splitting documents at H2 headings with stable IDs and content hashes produces predictable, debuggable chunks that support incremental re-indexing.
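A minimal sketch of H2-based chunking, assuming Markdown input, an ID built from a document ID plus a slug of the heading, and a SHA-256 hash of each chunk's text. The ID scheme and field names are assumptions, not the project's format. Duplicate H2 headings within one document would collide under this scheme, so a real implementation needs a tiebreaker.

```python
import hashlib
import re


def chunk_by_h2(doc_id: str, markdown: str) -> list[dict]:
    """Split Markdown at '## ' headings into chunks with stable IDs and hashes.

    Text before the first H2 becomes an 'intro' chunk. The ID survives edits
    that don't rename the heading; the hash changes whenever the text does.
    """
    chunks = []
    heading, lines = "intro", []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")
            chunks.append({
                "id": f"{doc_id}#{slug}",
                "heading": heading,
                "text": body,
                "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
            })

    for line in markdown.splitlines():
        if line.startswith("## "):  # H3+ ('### ') does not match this prefix
            flush()
            heading, lines = line[3:].strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

For incremental re-indexing, an indexer can compare each chunk's hash with the stored hash for the same ID and re-embed only the chunks that changed.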
Seven heuristic rules detect when a RAG corpus can't answer a query, without training data or a classifier — transparent and debuggable, with known trade-offs.
Designing a GitHub Actions workflow that harvests, validates, builds, indexes, and deploys a static site.
Key choices in building the lesson harvester — recursive scanning, path-based slugs, and integrated export generation.
Integrating Pagefind for full-text search on a static site with no backend.
Why the validator uses ERROR/WARNING/INFO levels and why warnings never fail the build.
The Artemis vote system shows 50 random images per ballot and asks voters to pick 5 favorites. With 500 ballots across 1...
The Artemis pairwise voting mode shows two images side by side and asks "which is better?" This produces binary outcomes...
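One standard way to turn binary "which is better?" outcomes into ratings is Elo. A minimal sketch of the textbook update; the K-factor of 32 and the 400-point logistic scale are conventional defaults, not values taken from Artemis.

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Apply one pairwise outcome to two ratings (zero-sum update)."""
    # Probability the winner was expected to win, on the 400-point scale.
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    # An upset (low expected_win) moves ratings more than an expected result.
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta
```

Two images starting at 1500 move to 1516 and 1484 after one vote; a favorite that wins gains less than 16 points because the outcome was already expected.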
We have pairwise comparison data (image A beats image B) and want the best possible strength estimates. Bradley-Terry-Lu...
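A compact sketch of fitting Bradley-Terry strengths with the classic iterative (MM, minorize-maximize) update p_i = W_i / sum_j n_ij / (p_i + p_j), where W_i is item i's win count and n_ij the number of comparisons between i and j. This is a textbook estimator, not necessarily the one the project uses. It assumes every item wins at least once and the comparison graph is connected; otherwise the maximum-likelihood estimate does not exist.

```python
from collections import defaultdict


def bradley_terry(comparisons: list[tuple[str, str]], iters: int = 200) -> dict[str, float]:
    """Fit strengths from (winner, loser) pairs; result is normalized to sum to 1."""
    wins = defaultdict(int)   # W_i: total wins for item i
    games = defaultdict(int)  # n_ij: comparisons between i and j (unordered pair)
    items = set()
    for w, l in comparisons:
        wins[w] += 1
        games[frozenset((w, l))] += 1
        items.update((w, l))

    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new_p = {}
        for i in items:
            denom = sum(
                n / (p[i] + p[j])
                for pair, n in games.items() if i in pair
                for j in pair - {i}
            )
            new_p[i] = wins[i] / denom
        total = sum(new_p.values())  # strengths are only defined up to scale
        p = {i: v / total for i, v in new_p.items()}
    return p
```

For two images where A beats B three times out of four, the fitted strengths are 0.75 and 0.25, so the model's win probability 0.75 / (0.75 + 0.25) reproduces the observed rate.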
The Artemis category voting mode asks voters to rank their top 3 images within a category. We need to convert these part...
We want to measure whether voters agree on which images are good. With 100 voters and 12,217 images, the voter-image mat...
We have three different types of preference data — batch selection rates, Elo ratings from pairwise comparisons, and Bor...
The scoring pipeline will be re-run as new vote data arrives, as scoring methods are tuned, or as bugs are fixed. Each r...
Several columns in the scoring output have no meaningful value for most images. Only 200 images have Elo scores (from 2,...
DuckDB's executemany with parameterized INSERT statements can hang indefinitely at scale (10K+ rows). Replacing it with ...
When selecting a fixed-size collection where the items must work together (a calendar, a playlist, a portfolio, a menu),...
When images lack text metadata (titles, descriptions, captions), month or season suitability can still be approximated f...
Greedy Maximum Marginal Relevance (MMR) is the practical default for selecting a diverse, high-quality subset from a lar...
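A minimal sketch of greedy MMR. Each step picks the candidate maximizing lam * quality - (1 - lam) * (max similarity to anything already selected). The trade-off weight lam = 0.7 and the function signature are assumptions for illustration; the similarity function is supplied by the caller (for images, typically cosine similarity of embeddings).

```python
from typing import Callable


def greedy_mmr(
    quality: dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> list[str]:
    """Select up to k items balancing quality against redundancy."""
    selected: list[str] = []
    candidates = sorted(quality)  # sorted for deterministic tie-breaking

    def mmr_score(c: str) -> float:
        # Redundancy is the closest match among items already chosen.
        redundancy = max((similarity(c, s) for s in selected), default=0.0)
        return lam * quality[c] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a near-duplicate pair (quality 1.0 and 0.95) and a distinct item (quality 0.6), MMR picks the best item and then the distinct one; setting lam = 1.0 disables the diversity term and picks the duplicate instead.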
When you need to assign N items to N slots where each item-slot pair has a fitness score, the Hungarian algorithm gives ...
When building an optimizer, always generate multiple candidate solutions using different methods — including at least on...
A dependency that's imported in production code but missing from the package manifest is a time bomb. It works on the de...
When a database migration creates a table that new code writes to, the migration must be applied before the code runs — ...
We planted known biases in synthetic vote data — 10% of voters had position bias (preferring earlier-displayed images), ...
We have a calendar optimizer that selects 13 images from 12,217 using a weighted objective function (popularity, diversi...
We know that 20% of synthetic voters are intentionally noisy (10% position-biased, 10% random). We compute Krippendorff'...
When an embedded database (DuckDB, SQLite) serves both a batch pipeline and an interactive web app, the web layer should...
When an interactive web app needs sub-100ms responses from a scoring function that depends on large lookup tables, load ...
A hash-routed single-page application built with vanilla JavaScript, ES modules, and dynamic import() can deliver a func...
When a CLI pipeline and a web API need the same data, import the query functions directly rather than duplicating SQL. A...
A design system built on CSS custom properties (design tokens) can be shared across completely independent frontends — s...
When a linter rule flags code that follows a framework's official pattern, suppress the rule per-line with noqa rather t...
A FastAPI + JavaScript SPA can be deployed to GitHub Pages without rewriting frontend code by using a fetch shim — a sma...
When a project develops on Windows but deploys via CI on Linux, hardcoded paths like D:/artemis/warehouse.duckdb will fa...
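A minimal sketch of resolving the warehouse path portably with pathlib. The environment variable name ARTEMIS_WAREHOUSE and the data/ fallback directory are hypothetical; the point is that no drive letter appears in code, and CI can override the location without edits.

```python
import os
from pathlib import Path


def warehouse_path() -> Path:
    """Resolve the DuckDB warehouse location without a hardcoded drive letter."""
    # Hypothetical override, e.g. set in the CI workflow or a local .env file.
    override = os.environ.get("ARTEMIS_WAREHOUSE")
    if override:
        return Path(override)
    # Fallback relative to the working directory; pathlib handles separators.
    return Path.cwd() / "data" / "warehouse.duckdb"
```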
The vision tagging pipeline uses Qwen2.5-VL (a 7B-parameter vision-language model) to classify image attributes. Running...
The vision tagging pipeline needs a consistent set of image attributes shared across five components: the vision model p...
Synthetic vote generation needs to produce votes that exhibit detectable attribute-based bias while remaining statistica...
Block-aware statistics need a metric that answers: "does this voting block select images with attribute X more than expe...
The static site serves pre-built JSON files from a public URL. The warehouse database contains voter surrogate keys (vot...
The biased voting blocks pipeline spans six components: config validation, vote generation, attribute analysis, cluster ...
When working with a large image collection from an automated source, assume near-duplicates dominate the pool until prov...
Import heavy dependencies inside the function that uses them, not at module scope. A module-level import numpy means eve...
A single CLIP model, used for zero-shot classification against descriptive text prompts, functions as a general-purpose ...
To select k items that maximally represent the diversity within a group, iteratively pick the item most distant from all...
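A minimal sketch of this greedy farthest-first traversal. Two choices here are assumptions because the entry is truncated: the seed is simply the first item in sorted order, and distance is Euclidean (embeddings are often compared with cosine distance instead).

```python
import math


def farthest_point_selection(points: dict[str, tuple[float, ...]], k: int) -> list[str]:
    """Pick up to k items, each maximizing distance to its nearest selected item."""
    names = sorted(points)
    if not names or k <= 0:
        return []
    selected = [names[0]]  # assumed seed; a medoid or top-quality item also works
    # nearest[n] = distance from n to the closest already-selected item
    nearest = {n: math.dist(points[n], points[names[0]]) for n in names}
    while len(selected) < min(k, len(names)):
        best = max((n for n in names if n not in selected), key=lambda n: nearest[n])
        selected.append(best)
        # Only the new pick can shrink each item's nearest-selected distance.
        for n in names:
            nearest[n] = min(nearest[n], math.dist(points[n], points[best]))
    return selected
```

Maintaining the nearest-distance table makes each step linear in the number of items instead of recomputing distances to every selected point.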
When the user's mental model is "put this thing in that slot," drag-and-drop is less code and more intuitive than altern...
When deduplicating by pairwise similarity, use graph connected components to group items — not naive pair-based merging....
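A minimal sketch using union-find to compute connected components over the "similar" pairs. One reason components help: if a~b and b~c, all three land in one group even when a and c were never compared or fell just under the threshold. The function name and input shapes are illustrative assumptions.

```python
def duplicate_groups(items: list[str], similar_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group near-duplicates as connected components of the similarity graph."""
    parent = {x: x for x in items}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    groups: dict[str, set[str]] = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

Items with no similar partner come back as singleton groups, so every item appears in exactly one group and a keeper can be chosen per group.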
CLIP logits have domain-specific distributions. Converting them to meaningful [0,1] confidence scores requires a sigmoid...
When adding new features to an existing collection, delete-and-rewrite only the new columns rather than re-processing ev...
Before writing any code for a new feature, produce a written audit of the existing codebase: what exists, what can be re...
Breaking large projects into numbered, independently shippable phases — each with explicit entry criteria, exit criteria...
When working across multiple AI-assisted sessions, continuity must be encoded in files, not in conversation history. A s...
An AI coding assistant that launches background processes (dev servers, database connections, build watchers) will fight...
Gate every commit on a passing test suite, not on "the feature looks done." With 1,500+ tests across a project, the suit...
The visual feature extraction pipeline ran sklearn's KMeans on every thumbnail to find 5 dominant colors. Each call took...
Multiple pipeline stages in Artemis started with per-row INSERT or UPDATE patterns that worked fine during development (...
The Artemis project needed voter preference data to build its statistical models and calendar optimizer. But real vote d...
We needed to cluster 12,217 Artemis II mission photos into visually distinct groups. The clusters serve a specific downs...
The original thumbnail downloader processed 12,217 images sequentially. Each download created a new httpx.Client instanc...
When investigating why multimodal clustering produced zero results, the breakthrough came from a simple query: SELEC...
Multimodal clustering required images to have both CLIP image embeddings AND text embeddings. The intersection of these ...
While developing the concurrent thumbnail downloader, the DuckDB warehouse file (warehouse.duckdb) became locked by the ...
Python scripts in the Artemis project span multiple roles: XML/JSON migration, schema validation, metadata enrichment, t...
The original thumbnail downloader worked flawlessly on 5 images during development. When scaled to 12,217 images, it was...
The thumbnail download process was killed multiple times during development — once to change the rate limit, once to adj...
Future lessons identified from the Claude Code usage insights analysis (64 sessions, 150 commits, May 2026). These are p...
Large question banks authored by multiple sources (human or AI) accumulate factual errors that are invisible to structur...
When humans author multiple-choice questions, the correct answer tends to cluster in certain positions (often B or C). T...
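One common remedy is to shuffle choices and track where the correct answer lands (the entry is truncated, so this is not necessarily the project's fix). A minimal sketch; the function name and the injected random.Random instance are assumptions, the latter so tests can seed it.

```python
import random


def shuffle_choices(choices: list[str], correct_index: int, rng: random.Random) -> tuple[list[str], int]:
    """Shuffle answer choices and return the correct answer's new position."""
    order = list(range(len(choices)))
    rng.shuffle(order)  # uniform random permutation of the original indices
    shuffled = [choices[i] for i in order]
    return shuffled, order.index(correct_index)
```

Over many questions, the correct answer's position becomes roughly uniform across A to D regardless of where the author put it.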
A structured review skill turns the ad-hoc "look at this code and tell me what's wrong" request into a repeatable, evide...
A Claude Code skill file is a structured prompt that turns a repeatable workflow into a single slash command. The skill'...
A phased plan is only as good as its execution discipline. A /phase skill automates the mechanical parts of plan executi...
When hundreds of data records need the same type of update (adding titles, categories, tags, or enriched descriptions), ...
localStorage can serve as a full persistence layer for client-side applications when the data is user-specific, the data...
A whole-codebase code review is only as valuable as the remediation that follows it. The review itself produces a findin...
When you have hundreds or thousands of content items authored by different sources at different times, quality varies wi...
A Content Security Policy (CSP) is achievable on a static site without server-side headers by using a <meta> tag. The ch...
Migrating an existing multi-page site to a design system is a page-by-page operation, not a big-bang rewrite. The design...
Writing a design document and a Physical Design Requirements (PDR) document before coding catches architectural mistakes...
When migrating a data format (XML to JSON) that feeds a rendering pipeline, the only way to prove the migration is corre...
A progressive hint system (brief nudge → full explanation → deep-dive knowledge) is more pedagogically effective than a ...
A browser-based application that uses DOM APIs (querySelector, innerHTML, addEventListener) can be integration-tested in...
After a migration, the old system's artifacts (files, code, tests, scripts) must be actively removed in a deliberate cle...
Systematically extracting lessons from project work — and writing them as standalone documents — turns ephemeral experie...
Breaking large features into ordered phases — each independently shippable, each ending with a commit — transforms ambit...
When a system needs to support multiple "providers" (vendors, brands, data sources) that share the same behavior but dif...
Adding 50+ exams across 10 providers to a quiz application required zero changes to the core quiz engine, data loader, o...
Adding runtime schema validation to your data loading layer catches entire categories of bugs that would otherwise surfa...
When multiple people or processes author data files for the same system without a shared schema, variant schemas emerge....
A full-featured application (quiz engine, progress persistence, scoring, results dashboards, 10 providers, 50+ exams) ca...
When critical logic is embedded in a class that's hard to test (DOM-coupled UI class), developers sometimes copy the log...
When hints contain the exact text of the correct answer choice, they short-circuit learning. The learner reads the hint,...
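A minimal sketch of an automated check for this leak: normalize case and punctuation, then test whether the correct choice appears as a whole-word run inside the hint. The min_len cutoff of 4 is an assumed threshold that skips very short answers ("Yes", "5") which would otherwise match almost any hint.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase and turn punctuation into spaces, then collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def hint_leaks_answer(hint: str, correct_choice: str, min_len: int = 4) -> bool:
    """Return True if the hint contains the correct choice's text verbatim."""
    answer = _normalize(correct_choice)
    if len(answer) < min_len:
        return False
    # Pad with spaces so the match respects word boundaries.
    return f" {answer} " in f" {_normalize(hint)} "
```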
XML entity encoding bugs (Q&amp;A vs Q&A) are the most common class of data corruption in XML content pipelines. They're...
When migrating a live data format (XML to JSON), the key risk is not the conversion itself — it's proving that the new f...
Using innerHTML to render content from "your own" data files (XML, JSON, markdown) is an XSS vulnerability even when the...
Before choosing a similarity algorithm, understand whether your data uses binary membership (item has feature or doesn't...
Occupation codes are not stable identifiers across taxonomy revisions. The same SOC code can refer to different occupati...
Government data sources contain artifacts of their internal production processes — temp files in archives, renamed colum...
Base observations are source truth; derived values are computed artifacts. Mixing them in the same table creates ambigui...
When two projects share an author, the stronger design system should inform the weaker one — but adopting visual feel is...
A four-layer warehouse architecture (raw, staging, core, marts) with strict separation of concerns at each layer produce...
Federal data sources are not designed for programmatic access. They block bare HTTP requests, publish in heterogeneous f...
A static site can replicate a dynamic API by intercepting JavaScript fetch() calls and redirecting them to pre-built JSO...
Geographic wage comparisons are inherently incomplete: nominal gaps do not account for cost-of-living differences, suppr...
Data pipelines fail — downloads timeout, parsers hit unexpected formats, database connections drop. Idempotency (running...
Comparing nominal wages across years is misleading because the dollar's purchasing power changes over time. Converting t...
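The conversion is a ratio of price indexes: real = nominal * (CPI in the base year / CPI in the amount's year). A minimal sketch; the CPI figures in the usage note are illustrative, not actual index values.

```python
def to_real_dollars(nominal: float, cpi_year: float, cpi_base: float) -> float:
    """Convert a nominal amount to base-year dollars."""
    # Scale by how much prices rose (or fell) between the two years.
    return nominal * (cpi_base / cpi_year)
```

With a CPI of 200 in the wage year and 300 in the base year, a $50,000 wage becomes $75,000 in base-year dollars.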
Once a warehouse holds multiple vintages of the same dataset, every query must explicitly decide whether it wants the la...
Percentage changes on small bases are statistically volatile and can dominate ranked lists even when the absolute econom...
Government data sources change column names, add or remove columns, and retype columns between releases — often without ...
A server-side web application can be deployed to a static hosting platform by pre-rendering every page and API response ...
Separating tests by their infrastructure requirements — fixtures-only, in-memory server, real database — lets CI run fas...
When building an analytical warehouse from multiple federal data products, the single most important architectural decis...
When loading multiple vintages of the same dataset, dimension tables must deduplicate on business key alone — not on bus...
When a web framework dispatches synchronous endpoint handlers to a thread pool, a shared database connection will produc...
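A minimal sketch of one fix: give each worker thread its own connection via threading.local. It uses the standard-library sqlite3 as a stand-in so it runs anywhere; for DuckDB, its Python documentation recommends giving each thread its own cursor (conn.cursor()) rather than sharing one connection object. The function name is hypothetical.

```python
import sqlite3
import threading

_local = threading.local()


def get_connection(db_path: str) -> sqlite3.Connection:
    """Return a connection owned by the calling thread, opening it lazily."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # First call on this thread: open a connection only this thread uses.
        conn = sqlite3.connect(db_path)
        _local.conn = conn
    return conn
```

Repeated calls on the same thread reuse one connection, while two pool threads never share one, so concurrent handlers cannot interleave operations on the same connection object.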
Fact tables store snapshots — single measurements at single points in time. Time-series analysis requires a separate nor...
A web application that shows buttons, links, or filters for data that does not exist creates a worse experience than one...