Phased Multi-Cloud Infrastructure as Code

Three cloud stacks (AWS, Azure, GCP) built in separate phases with OIDC federation, avoiding cross-cloud coupling while sharing a common authentication pattern.

Tags

Phased Multi-Cloud Infrastructure as Code

The Lesson

When building infrastructure for multiple cloud providers, treat each provider as an independent phase with its own IaC tool, deployment workflow, and tests. The only shared pattern should be the authentication mechanism (OIDC federation) and the deployment contract (what the application expects). Attempting to unify IaC across providers with abstraction layers adds complexity without reducing work.

Context

A static site with a RAG chatbot backend needed to deploy to AWS, Azure, and GCP alongside an existing GitHub Pages pipeline. Each cloud deployment runs the same Docker container for the backend and serves the same static assets for the frontend, but uses provider-native services for LLM inference and vector search. CI/CD runs on GitHub Actions. The goal was keyless deployments — no long-lived credentials stored in GitHub Secrets.

What Happened

  1. Built each cloud stack in a separate implementation phase (Phase 6: AWS, Phase 7: Azure, Phase 8: GCP), each with its own commit. This prevented cross-cloud coupling — a bug in the Azure Bicep template couldn't break the AWS CloudFormation stack.
  2. AWS (CloudFormation): Full VPC with 4 subnets across 2 AZs, ALB, ECS Fargate, ECR with lifecycle policy, CloudFront with S3 origin and ALB backend origin (cache bypass for /api/*), IAM roles with Bedrock and OpenSearch policies, CloudWatch Logs. GitHub OIDC provider with subject constraint repo:bonjohen/lessons:*.
  3. Azure (Bicep): Container Apps with Log Analytics, ACR, Static Web Apps (Free tier), Azure AI Search (Free tier), Azure OpenAI with two model deployments, Key Vault. Container Apps scale to zero in staging (minReplicas: 0), minimum 1 in production.
  4. GCP (shell script): Enables APIs, creates Artifact Registry, Cloud Storage bucket, Workload Identity Pool/Provider, and a service account with granular IAM roles. Chose a shell script over Terraform because the resource set is small and the Workload Identity setup is imperative by nature.
  5. All three stacks use environment-based parameters (staging vs. production) that control replica counts, log retention, and scaling behavior. The same template handles both environments.
  6. All three CI/CD workflows (one per cloud) authenticate via OIDC/Workload Identity Federation — GitHub Actions presents a JWT, the cloud provider validates it against the configured trust relationship, and temporary credentials are issued. No static access keys or service account JSON files.
  7. Each cloud's adapter tests run independently using sys.modules.setdefault() mocking — no cloud SDK is required in CI.

Key Insights

Applicability

This approach works when:

It does not apply when:

Related Lessons

Related Lessons