Lesson 034: Reusing Query Modules Across CLI and Web

Lesson 034: Reusing Query Modules Across CLI and Web

The Lesson

When a CLI pipeline and a web API need the same data, import the query functions directly rather than duplicating SQL. Add the serialization layer (Pydantic models, JSON responses) at the API boundary, not in the query module. The query module returns plain Python objects (dataclasses, dicts, tuples); the web route converts them. This keeps the query logic in one place and testable without a web framework.

Context

A calendar image selection platform had a CLI-driven review package (review/queries.py) with functions like fetch_all_candidates(), fetch_candidate_images(), and fetch_cluster_alternatives(). These returned dataclasses (CandidateScore, MonthImage, ClusterAlternative). A new web API needed the exact same data for its /api/candidates and /api/candidates/{name} endpoints. The question was whether to duplicate the queries in the web layer (risking drift) or import them directly (risking tight coupling).

What Happened

  1. Started by reading the existing review/queries.py. It contained 5 functions and 3 dataclasses, all taking a duckdb.DuckDBPyConnection as the first argument. No framework-specific code — pure SQL + dataclass construction.
  2. Created web/models.py with Pydantic schemas (CandidateResponse, MonthImageResponse, etc.) that mirrored the review dataclasses but with Pydantic's serialization, validation, and OpenAPI integration.
  3. The web route routes/candidates.py imported the query functions directly: from artemis_calendar.review.queries import fetch_all_candidates, fetch_candidate_images. The route calls the function, iterates the returned dataclasses, and converts each to a Pydantic model.
  4. New queries that only the web app needed (paginated image browse, image detail with rank) went into a separate web/queries.py that returns plain dicts — no dataclasses, since the web is their only consumer.
  5. The review module's resolve_run_id() function was also reused to default both CLI and web to the latest optimization run.

Key Insights

Examples

# review/queries.py — framework-agnostic, returns dataclasses
@dataclass
class CandidateScore:
    candidate_name: str
    objective_score: float
    # ...

def fetch_all_candidates(conn, run_id) -> list[CandidateScore]:
    rows = conn.execute("SELECT ... WHERE candidate_run_id = ?", [run_id]).fetchall()
    return [CandidateScore(...) for r in rows]

# web/models.py — Pydantic, API-facing
class CandidateResponse(BaseModel):
    candidate_name: str
    objective_score: float = 0.0

# web/routes/candidates.py — conversion at the boundary
from artemis_calendar.review.queries import fetch_all_candidates

@router.get("/api/candidates")
def list_candidates(conn=Depends(get_db)):
    candidates = fetch_all_candidates(conn, run_id)
    return [CandidateResponse(candidate_name=c.candidate_name, ...) for c in candidates]

Applicability

This pattern applies when: multiple consumers (CLI, web, notebooks) need the same data from the same database, and the query logic is non-trivial (joins, subqueries, aggregation). It does NOT apply when: the web API needs fundamentally different data shapes (GraphQL), when queries need async/await (DuckDB is synchronous), or when the query module has side effects (writes) that the web layer shouldn't trigger.

Related Lessons