Live Infrastructure for Integration Testing

When local services are already running, skip mocks and test the real pipeline end-to-end

The Lesson

When local infrastructure happens to be running — an LLM server, a vector database, a message broker — use it for integration testing instead of defaulting to mocks. Mocks prove that your code calls the right functions; live tests prove that your system actually works.

Context

Lessons Hub V2 has a RAG backend: a FastAPI server that retrieves lesson chunks from ChromaDB (vector store), sends them to Ollama (local LLM), and returns grounded answers with source citations. The pipeline has five stages: corpus build, embedding, retrieval, generation, and gap detection. Each stage was developed with unit tests using mocks — mocked vector adapters, mocked LLM responses, mocked gap stores. After 134 unit tests passed, the question was whether the real pipeline worked end-to-end.
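The retrieval stage of such a pipeline can be sketched as plain cosine-similarity ranking of a query embedding against stored chunk embeddings. The function names and tuple shapes below are illustrative assumptions, not the project's actual API:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], chunks: list[tuple], top_k: int = 5) -> list[tuple]:
    """Rank (chunk_id, text, embedding) tuples against a query embedding.

    Returns the top_k (score, chunk_id, text) triples, best first.
    """
    scored = [(cosine(query_vec, emb), cid, text) for cid, text, emb in chunks]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```

In the real system the embeddings come from Ollama's nomic-embed-text model and the ranking is done inside ChromaDB; the sketch only shows the shape of the operation that the live tests exercise end-to-end.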

What Happened

  1. After completing five phases of code improvements (schema fixes, security hardening, adapter factories, structured logging, caching), all 134 backend unit tests passed and lint was clean.
  2. Before writing integration tests with mocks, a quick check revealed Ollama was already running locally with nomic-embed-text (embeddings) and llama3.1:8b (chat), and ChromaDB's persistent directory existed but was empty.
  3. Instead of building a mock integration test, the real corpus was built from 116 harvested lessons (793 chunks), then embedded through Ollama into ChromaDB — a process that took about 30 seconds across 16 batches.
  4. The backend was started on an alternate port and tested with curl against real HTTP endpoints: /health, /api/retrieve, /api/chat, /api/gaps, /api/v1/retrieve, /metrics.
  5. The retrieve test returned ranked chunks with real similarity scores. The chat test produced a multi-paragraph LLM response citing specific lesson titles. The gap detection test correctly identified "Kubernetes pod autoscaling with KEDA" as a missing_platform gap and generated four GitHub search queries.
  6. The gap was persisted to both the runtime JSON store and the new review markdown artifact — verifying a feature that had only been tested with unit tests minutes earlier.
  7. The entire smoke test took less time than writing equivalent mock-based integration tests would have, and caught zero bugs — which was itself valuable confirmation that the unit tests were testing the right things.

Key Insights

  - Check what is already running before defaulting to mocks: a live smoke test against existing infrastructure can cost less than writing the mock equivalent.
  - Mocks prove that code calls the right functions; live tests prove that the system works. They answer different questions, and both are needed.
  - A live pass that finds zero bugs is still a result: it confirms the unit tests were testing the right things.

Applicability

This pattern works when:

  - The needed services (an LLM server, a vector store, a message broker) are already running locally and reachable.
  - Real data is available to exercise the pipeline, as the 116 harvested lessons were here.
  - A live smoke test costs less to run than an equivalent mock-based integration suite would cost to write.

This pattern does NOT replace:

  - Unit tests with mocks, which stay fast, deterministic, and runnable without any infrastructure.
  - Tests of error paths and failure modes that live services will not reliably produce on demand.

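One way to get both worlds is to gate live integration tests on service availability, so they run when the infrastructure happens to be up and skip cleanly otherwise. The port check and the placeholder test body below are hypothetical, not the project's actual suite:

```python
import socket
import unittest

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default Ollama port; a real suite would also verify the ChromaDB directory.
OLLAMA_UP = port_open("localhost", 11434)

class TestLivePipeline(unittest.TestCase):
    @unittest.skipUnless(OLLAMA_UP, "local infrastructure not running; skipping live test")
    def test_retrieve_returns_ranked_chunks(self):
        # Placeholder body: a real test would call /api/retrieve and
        # assert on chunk count and score ordering.
        self.assertTrue(True)
```

Run with `python -m unittest`: on a developer machine with Ollama up the live test executes; in CI without infrastructure it reports as skipped rather than failed.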
Related Lessons