Spec 2a entry point
This is the approved design for the first slice of Spec 2: core app deploy, EU region migration, and the first app (wowjeeez/polymarket-fetch). Spec 2b (app telemetry), 2c (Cloudflare ZTNA), 2d (Cloud Run Jobs), and a hardening pass remain follow-ups under the Spec 2 roadmap umbrella.
Spec 2a layers application deployment on top of the live [[spec-1-deployment-complete|ops-vm]] and migrates the deployment from us-central1 to europe-west3 (Frankfurt) so the whole stack lives in EU. The mechanism is a central manifest in this Terraform repo, Ansible-driven; CI authenticates to GCP via Workload Identity Federation (no long-lived keys). Two runtime shapes — service (daemon) and job (systemd-timer-driven one-shot) — cover the dual nature of the first app and any future shape.
Source spec: docs/superpowers/specs/2026-05-25-app-deploy-design.md in the repo.
For Agents
- Region: europe-west3 (zone europe-west3-a). The existing us-central1 VM will be destroyed and reprovisioned in EU as part of this work.
- First app: wowjeeez/polymarket-fetch, a long-running Telegram agent (kind service) plus a 15-minute snapshot job (kind job).
- Deploy trigger: manual make deploy APP=<name> (auto-deploy deferred).
- CI auth: Workload Identity Federation (OIDC), scoped per repo via IAM bindings on ci-pusher@…iam.gserviceaccount.com.
- App secrets: secrets/<app>.env files, gitignored, mode 0600, copied by Ansible to /opt/apps/<app>/.env.
- Status: approved, ready for implementation planning. Cross-link: spec-2-roadmap.
Context
Spec 1 produced a Docker-ready, Artifact-Registry-ready, Tailscale-connected VM with host metrics shipping to SigNoz Cloud — live as ops-vm in aerobic-tesla-490112-r3 (currently us-central1-a). Spec 2a:
- Adds Artifact Registry + Workload Identity Federation to the Terraform layer.
- Adds an apps Ansible role + central deploys/apps.yml manifest.
- Migrates the VM to europe-west3-a.
- Ships the first app (polymarket-fetch) end-to-end (CI → AR → VM service + job).
Goals
- make deploy APP=<name> deploys an app from Artifact Registry onto the VM.
- The deploy mechanism supports both services (daemon, docker compose up -d) and jobs (one-shot, docker compose run --rm driven by a systemd timer).
- CI in app repos pushes images to AR via Workload Identity Federation, with no long-lived service-account keys anywhere.
- Per-app env/secrets stay in the existing secrets/ pattern from Spec 1.
- All GCP resources live in europe-west3; the existing us-central1 VM is destroyed and reprovisioned in EU.
Non-Goals (deferred)
- Spec 2b: app logs and traces via OTel filelog + OTLP receivers.
- Spec 2c: Cloudflare ZTNA + cloudflared role for public app URLs.
- Spec 2d: Cloud Run Jobs for heavy/spiky batch.
- Hardening: GCP Secret Manager for centralized rotation; no-public-IP + Cloud NAT.
- Auto-deploy: CI triggering a VM-side deploy automatically; manual make deploy for now.
Decisions (locked in brainstorm)
| Decision | Choice | Rationale |
|---|---|---|
| GCP region | europe-west3 (Frankfurt) + zone europe-west3-a | EU data locality; low latency from Central Europe; large/mature region. |
| Existing VM | Destroy + reprovision in EU | Single-region story; VM is 2 days old with no app data — base services only. |
| Deploy mechanism | Central manifest in this terraform repo, Ansible-driven | Solo operator; central audit + rollback; no per-repo deploy keys. |
| Runtime shapes | service and job | Covers polymarket-fetch dual nature and any future app shape. |
| CI → AR auth | Workload Identity Federation | No long-lived keys; OIDC; scoped per repo via IAM bindings, with a provider attribute condition on the repo owner. |
| App env/secrets | Local secrets/<app>.env per app, Ansible-copied | Consistent with Spec 1 pattern; gitignored; mode 0600. |
| AR repo structure | One Docker repo apps in europe-west3, multiple images underneath | Simple for a personal infra repo. |
| Image tagging | Git SHA + :latest floating | Manifest pins :latest; ExecStartPre=docker pull ensures latest on (re)start; SHA available for traceability. |
| Deploy trigger | Manual: make deploy APP=<name> | Auto-deploy is a Spec 2 hardening item. |
| Reusable workflow location | wowjeeez/terraform/.github/workflows/build-push.yml | Single source of truth; app repos call via uses:. Requires pushing this terraform repo to GitHub as wowjeeez/terraform. |
Architecture
Spec 2a build-push-deploy flow
    graph TD
        Push["git push main (app repo)"] --> Auth["WIF auth, id-token write"]
        Auth --> Build["docker buildx build, tag sha + latest"]
        Build --> AR["Artifact Registry: europe-west3-docker.pkg.dev/proj/apps"]
        Op["operator: make deploy APP=name"] --> Ansible["ansible --tags apps --extra-vars deploy_only=name"]
        Ansible --> Compose["render /opt/apps/name/ compose + .env"]
        Ansible --> Unit["render systemd unit (service or job+timer)"]
        Unit --> Pull["ExecStartPre=docker pull"]
        Pull --> AR
        Pull --> Run["docker compose up -d, or docker compose run --rm"]
        style AR fill:#264653,stroke:#2a9d8f,color:#fff
        style Run fill:#2d2d2d,stroke:#888,color:#fff
Plain-text version:
    GitHub Actions (app repo push to main)
      permissions: id-token: write
      google-github-actions/auth@v2 (WIF) then short-lived GCP token
      docker buildx build then tag :SHA + :latest then push to
        europe-west3-docker.pkg.dev/PROJ/apps/IMAGE

    operator: make deploy APP=NAME
      ansible-playbook --tags apps --extra-vars deploy_only=NAME
        copy secrets/NAME.env to /opt/apps/NAME/.env (0600 ops:ops)
        template /opt/apps/NAME/docker-compose.yml
        template /etc/systemd/system/apps-NAME.service|timer
        systemctl daemon-reload + enable + restart
          ExecStartPre=docker pull IMAGE
          ExecStart=docker compose up -d (kind: service)
                    docker compose run --rm SVC (kind: job, triggered by timer)

Repo Layout
    .                                    repo root = Terraform root
      Makefile                           extended: + deploy, + redeploy-all
      README.md                          extended with app-deploy operator UX
      CLAUDE.md                          extended: pointers to new guides
      main.tf                            extended: + module ar_wif + outputs
      variables.tf                       extended: ar_repo_name, ci_github_repos
      outputs.tf                         extended: wif_provider, ci_sa_email, ar_repo_url
      modules/
        vm/                              unchanged
        ar_wif/                          NEW: AR repo + WIF pool + provider + CI SA + IAM
          main.tf
          variables.tf
          outputs.tf
      ansible/
        site.yml                         extended: load deploys/apps.yml, add apps role
        roles/
          base/                          unchanged
          docker/                        extended: + AR credential helper task
          github_keys/                   unchanged
          monitoring/                    unchanged
          apps/                          NEW: renders compose + systemd per app
            tasks/main.yml
            defaults/main.yml
            handlers/main.yml
            templates/
              docker-compose.yml.j2
              service.unit.j2
              job.unit.j2
              job.timer.j2
      deploys/                           NEW
        apps.yml                         the central manifest (apps: list)
      .github/workflows/
        build-push.yml                   NEW: reusable workflow (workflow_call)
      secrets/
        signoz_ingestion_key             existing
        github_deploy_key                existing, optional
        polymarket-telegram-agent.env    NEW pattern, gitignored
        polymarket-snapshot.env          NEW pattern, gitignored

terraform.tfvars is also extended (gitignored). New keys: ci_github_repos (list of owner/repo strings); region/zone updated to EU.
Terraform Layer
Variables (new/changed)
| Variable | Default | Notes |
|---|---|---|
| region | europe-west3 | CHANGED from us-central1. |
| zone | europe-west3-a | CHANGED from us-central1-a. |
| ar_repo_name | apps | Docker-format AR repo name. |
| ci_github_repos | [] | List of owner/repo strings; each gets a WIF binding to impersonate the CI SA. |
modules/ar_wif/
Reusable module creating:
- google_artifact_registry_repository.this: Docker format, region from input.
- google_iam_workload_identity_pool.github: pool github-pool.
- google_iam_workload_identity_pool_provider.github_oidc: OIDC provider with:
  - issuer_uri = "https://token.actions.githubusercontent.com"
  - attribute_mapping:
    - google.subject = assertion.sub
    - attribute.repository = assertion.repository
    - attribute.actor = assertion.actor
  - attribute_condition = "assertion.repository_owner == 'wowjeeez'": restricts the pool to repos owned by the user, as defense in depth on top of the per-repo IAM bindings.
- google_service_account.ci_pusher: ci-pusher@PROJ.iam.gserviceaccount.com.
- google_artifact_registry_repository_iam_member: ci_pusher granted roles/artifactregistry.writer on the apps repo.
- google_service_account_iam_member (for_each over ci_github_repos): grants roles/iam.workloadIdentityUser on ci_pusher to principalSet://iam.googleapis.com/projects/NUM/locations/global/workloadIdentityPools/github-pool/attribute.repository/OWNER/REPO.
- Outputs: wif_provider_resource_name, ci_service_account_email, ar_host (e.g. europe-west3-docker.pkg.dev), ar_repo_url (full path europe-west3-docker.pkg.dev/PROJ/apps).
main.tf wiring
- module "ar_wif" instantiated once with var.ar_repo_name, var.ci_github_repos, var.region.
- depends_on API enablement.
- Outputs re-exported at the root.
API enablement
The existing google_project_service for_each already enables iam.googleapis.com and artifactregistry.googleapis.com. Spec 2a extends the list with sts.googleapis.com (Security Token Service, which exchanges the GitHub OIDC token for a federated token) and iamcredentials.googleapis.com (used to mint the short-lived access token for the impersonated CI service account).
Ansible Layer
docker role (extended)
Append to ansible/roles/docker/tasks/main.yml:
- Add the Google Cloud apt repo + key (keyring/signed-by pattern).
- Install google-cloud-cli.
- Run gcloud auth configure-docker europe-west3-docker.pkg.dev --quiet as the ops user. This writes the credential helper into ~ops/.docker/config.json so that subsequent docker pull AR_HOST/... calls authenticate via the VM's attached service account (roles/artifactregistry.reader from Spec 1).
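The three steps above could be sketched as Ansible tasks along the following lines. This is a rough illustration rather than the role's actual content: the keyring path, the repo file name, and the choice to store the key as an ASCII-armored .asc file are assumptions, and the role is assumed to already run with privilege escalation.

```yaml
# Sketch of tasks appended to ansible/roles/docker/tasks/main.yml.
# Assumptions: the play runs with become, an "ops" user exists, and apt on the
# VM accepts an ASCII-armored key referenced through signed-by.

- name: Download the Google Cloud apt signing key
  ansible.builtin.get_url:
    url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
    dest: /usr/share/keyrings/cloud.google.asc   # assumed keyring path
    mode: "0644"

- name: Add the Google Cloud apt repository (signed-by pattern)
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main"
    filename: google-cloud-sdk                   # assumed file name
    state: present

- name: Install google-cloud-cli
  ansible.builtin.apt:
    name: google-cloud-cli
    update_cache: true
    state: present

- name: Register the Artifact Registry credential helper for the ops user
  ansible.builtin.command:
    cmd: gcloud auth configure-docker europe-west3-docker.pkg.dev --quiet
  become: true
  become_user: ops
  # The command is safe to re-run, so it is reported as unchanged to keep
  # repeated plays quiet. This hides the first-run change by design.
  changed_when: false
```

If the key turns out to need binary form on the target distribution, a dearmor step would replace the first task.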
apps role (new)
ansible/roles/apps/ reads the apps: list (loaded via site.yml’s vars_files: [deploys/apps.yml]).
tasks/main.yml per app entry:
- file: create /opt/apps/NAME/ (mode 0755, owner ops).
- copy: secrets/NAME.env to /opt/apps/NAME/.env (mode 0600, owner ops).
- template: render docker-compose.yml from templates/docker-compose.yml.j2. Compose body contains the image, env_file: .env, restart policy (services), and any compose_extra overrides.
- template: render the systemd units:
  - kind: service produces /etc/systemd/system/apps-NAME.service:
    - Type=oneshot, RemainAfterExit=yes
    - WorkingDirectory=/opt/apps/NAME
    - User=ops
    - ExecStartPre=/usr/bin/docker pull IMAGE
    - ExecStart=/usr/bin/docker compose up -d
    - ExecStop=/usr/bin/docker compose down
    - [Install] WantedBy=multi-user.target
  - kind: job produces /etc/systemd/system/apps-NAME.service:
    - Type=oneshot
    - WorkingDirectory=/opt/apps/NAME
    - User=ops
    - ExecStartPre=/usr/bin/docker pull IMAGE
    - ExecStart=/usr/bin/docker compose run --rm SERVICE_NAME
  - and apps-NAME.timer:
    - [Timer] OnCalendar=SCHEDULE, Persistent=true
    - [Install] WantedBy=timers.target
- systemd: daemon_reload: yes, enabled: yes, state: started/restarted for the unit (and the timer for kind: job).
- Honor deploy_only=NAME extra-var (skip entries not matching).
- Pre-validation: command: docker compose -f /opt/apps/NAME/docker-compose.yml config after templating; fail the play on a bad compose file before activating systemd.
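As an illustration of the task list above, the service path of the role might look roughly like this. It is a sketch under assumptions, not the implemented role: the handler name is hypothetical, env_file is assumed to be relative to the repo root, and the job/timer path (which would follow the same shape with job.unit.j2 and job.timer.j2) is left out for brevity.

```yaml
# Sketch of ansible/roles/apps/tasks/main.yml, service path only.
# Assumptions: `apps` is loaded from deploys/apps.yml, `deploy_only` is the
# optional extra-var set by `make deploy`, and "Reload systemd" is a
# hypothetical handler name.

- name: Create the app directory
  ansible.builtin.file:
    path: "/opt/apps/{{ item.name }}"
    state: directory
    owner: ops
    group: ops
    mode: "0755"
  loop: "{{ apps }}"
  when: deploy_only is not defined or item.name == deploy_only

- name: Copy the env file from the local secrets directory
  ansible.builtin.copy:
    src: "{{ playbook_dir }}/../{{ item.env_file }}"
    dest: "/opt/apps/{{ item.name }}/.env"
    owner: ops
    group: ops
    mode: "0600"
  loop: "{{ apps }}"
  when: deploy_only is not defined or item.name == deploy_only

- name: Render docker-compose.yml
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "/opt/apps/{{ item.name }}/docker-compose.yml"
    owner: ops
    group: ops
    mode: "0644"
  loop: "{{ apps }}"
  when: deploy_only is not defined or item.name == deploy_only

- name: Validate the rendered compose file before touching systemd
  ansible.builtin.command:
    cmd: "docker compose -f /opt/apps/{{ item.name }}/docker-compose.yml config"
  changed_when: false   # validation only, never a change
  loop: "{{ apps }}"
  when: deploy_only is not defined or item.name == deploy_only

- name: Render the systemd unit for service apps
  ansible.builtin.template:
    src: service.unit.j2
    dest: "/etc/systemd/system/apps-{{ item.name }}.service"
    mode: "0644"
  loop: "{{ apps | selectattr('kind', 'equalto', 'service') | list }}"
  when: deploy_only is not defined or item.name == deploy_only
  notify: Reload systemd

- name: Enable and start service apps
  ansible.builtin.systemd:
    name: "apps-{{ item.name }}.service"
    daemon_reload: true
    enabled: true
    state: started
  loop: "{{ apps | selectattr('kind', 'equalto', 'service') | list }}"
  when: deploy_only is not defined or item.name == deploy_only
```

Looping per task rather than including a per-app task file keeps every task directly inside the role, so a role-level apps tag selects all of them when the play is run with --tags apps.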
Handler: daemon-reload triggered by changes to systemd unit files.
defaults/main.yml: empty apps: [] default (the real list is in deploys/apps.yml); defaults for template knobs (e.g. compose default restart: unless-stopped for services).
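For orientation, a rendered compose file for the kind: service app could look something like the following. The service key name (app) is an assumption, since the template itself is not shown in this spec; the image, env_file, and restart values come from the manifest entry and the role defaults described above.

```yaml
# Hypothetical rendered /opt/apps/polymarket-telegram-agent/docker-compose.yml.
# The service key "app" is an assumed name, not taken from the spec.
services:
  app:
    image: europe-west3-docker.pkg.dev/aerobic-tesla-490112-r3/apps/polymarket-telegram-agent:latest
    env_file: .env            # the file Ansible copies from secrets/
    restart: unless-stopped   # role default for kind: service
```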
site.yml change
- vars_files: ["{{ playbook_dir }}/../deploys/apps.yml"] at play level (or include_vars inside the apps role).
- Add apps as the final role, after monitoring.
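A minimal sketch of the resulting play, assuming the existing roles run in the order listed in the repo layout. The host group name and the become setting are placeholders; only the vars_files entry, the position of the apps role, and the apps tag (needed for --tags apps) are taken from this spec.

```yaml
# Sketch of ansible/site.yml after the change.
# Assumptions: host group "ops_vm" and become: true are placeholders, and the
# order of the four existing roles is guessed from the repo layout.
- name: Configure ops-vm
  hosts: ops_vm
  become: true
  vars_files:
    - "{{ playbook_dir }}/../deploys/apps.yml"   # provides the apps list
  roles:
    - role: base
    - role: docker
    - role: github_keys
    - role: monitoring
    - role: apps
      tags: [apps]   # lets make deploy target this role with --tags apps
```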
Makefile additions
    deploy:
    	$(ANSIBLE) ansible/site.yml --tags apps --extra-vars "deploy_only=$(APP)"

    redeploy-all:
    	$(ANSIBLE) ansible/site.yml --tags apps

Usage: make deploy APP=polymarket-telegram-agent.
CI - Reusable Workflow
.github/workflows/build-push.yml (in this terraform repo, called via workflow_call):
    on:
      workflow_call:
        inputs:
          image_name: {required: true, type: string}
          dockerfile: {required: false, type: string, default: Dockerfile}
          context: {required: false, type: string, default: "."}
          wif_provider: {required: true, type: string}
          ci_service_account: {required: true, type: string}
          ar_host: {required: true, type: string}
          ar_project: {required: true, type: string}
          ar_repo: {required: true, type: string}
    jobs:
      build-push:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          id-token: write
        steps:
          - uses: actions/checkout@v4
          - uses: google-github-actions/auth@v2
            with:
              workload_identity_provider: ${{ inputs.wif_provider }}
              service_account: ${{ inputs.ci_service_account }}
          - uses: google-github-actions/setup-gcloud@v2
          - run: gcloud auth configure-docker ${{ inputs.ar_host }} --quiet
          - uses: docker/setup-buildx-action@v3
          - name: Build and push
            run: |
              IMAGE=${{ inputs.ar_host }}/${{ inputs.ar_project }}/${{ inputs.ar_repo }}/${{ inputs.image_name }}
              docker buildx build --push -t $IMAGE:${{ github.sha }} -t $IMAGE:latest -f ${{ inputs.dockerfile }} ${{ inputs.context }}

App repos add .github/workflows/release.yml:
    on:
      push:
        branches: [main]
    jobs:
      telegram-agent:
        uses: wowjeeez/terraform/.github/workflows/build-push.yml@main
        permissions: {contents: read, id-token: write}
        with:
          image_name: polymarket-telegram-agent
          dockerfile: docker/telegram-agent.Dockerfile
          wif_provider: projects/NUM/locations/global/workloadIdentityPools/github-pool/providers/github
          ci_service_account: ci-pusher@aerobic-tesla-490112-r3.iam.gserviceaccount.com
          ar_host: europe-west3-docker.pkg.dev
          ar_project: aerobic-tesla-490112-r3
          ar_repo: apps
      snapshot:
        uses: wowjeeez/terraform/.github/workflows/build-push.yml@main
        permissions: {contents: read, id-token: write}
        with:
          image_name: polymarket-snapshot
          dockerfile: docker/snapshot.Dockerfile
          wif_provider: projects/NUM/locations/global/workloadIdentityPools/github-pool/providers/github
          ci_service_account: ci-pusher@aerobic-tesla-490112-r3.iam.gserviceaccount.com
          ar_host: europe-west3-docker.pkg.dev
          ar_project: aerobic-tesla-490112-r3
          ar_repo: apps

The WIF provider, CI service account, and AR host/project/repo values are taken from terraform output after make vm-up. They are identifiers, not secrets, so they are safe to commit into the app repos.
Migration of the Existing VM (US to EU)
Because the VM moves from us-central1 to europe-west3:
- (If not already done) Revoke the existing Tailscale auth key in the admin console; generate a fresh reusable + ephemeral + pre-authorized + tag:cloud-tagged auth key. Paste it into terraform.tfvars.
- make deprovision: the us-central1 VM is destroyed; the ephemeral Tailscale node auto-removes from the tailnet shortly after.
- Edit terraform.tfvars: set region = "europe-west3", zone = "europe-west3-a". Keep vm_name = "ops-vm" (so MagicDNS resolution stays unchanged: ops-vm still resolves on the tailnet).
- make plan && make vm-up && make configure.
- make verify confirms VM up + all services active. Host metrics resume to SigNoz from the new region.
The new VM has a different external IP (irrelevant — SSH is via Tailscale). MagicDNS resolution of ops-vm automatically points at the new node once it joins.
First App - Operator Steps (end-to-end)
- In wowjeeez/polymarket-fetch: author two Dockerfiles (e.g. docker/telegram-agent.Dockerfile, docker/snapshot.Dockerfile). Author .github/workflows/release.yml calling the reusable workflow.
- In this terraform repo: add "wowjeeez/polymarket-fetch" to ci_github_repos in terraform.tfvars, then run make plan && make vm-up (WIF IAM binding update; no VM downtime).
- Push polymarket-fetch:main; CI then builds and pushes both images to AR.
- Append two entries to deploys/apps.yml:

      apps:
        - name: polymarket-telegram-agent
          kind: service
          image: europe-west3-docker.pkg.dev/aerobic-tesla-490112-r3/apps/polymarket-telegram-agent:latest
          env_file: secrets/polymarket-telegram-agent.env
        - name: polymarket-snapshot
          kind: job
          image: europe-west3-docker.pkg.dev/aerobic-tesla-490112-r3/apps/polymarket-snapshot:latest
          env_file: secrets/polymarket-snapshot.env
          # Passed to OnCalendar=, so this is systemd calendar syntax, not cron
          # ("*/15 * * * *" would not parse). "*:0/15" fires every 15 minutes.
          schedule: "*:0/15"

- Create secrets/polymarket-telegram-agent.env and secrets/polymarket-snapshot.env locally.
- make deploy APP=polymarket-telegram-agent, then make deploy APP=polymarket-snapshot.
- make ssh; verify with systemctl status apps-polymarket-telegram-agent, systemctl list-timers apps-polymarket-snapshot.timer, and docker ps.
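Because the manifest is the single input to every deploy, a small guard task at the top of the apps role could reject malformed entries early. The following is only a sketch of that idea, not part of the approved design; the required fields are inferred from the two example entries above.

```yaml
# Hypothetical guard task for ansible/roles/apps/tasks/main.yml.
# The field requirements are inferred from the example manifest entries.
- name: Check that every manifest entry is well formed
  ansible.builtin.assert:
    that:
      - item.name is defined
      - item.image is defined
      - item.env_file is defined
      - item.kind | default('') in ['service', 'job']
      # A job needs a schedule for its timer; a service does not.
      - item.kind | default('') != 'job' or item.schedule is defined
    fail_msg: "Invalid entry in deploys/apps.yml ({{ item.name | default('unnamed') }})"
  loop: "{{ apps }}"
  loop_control:
    label: "{{ item.name | default('unnamed') }}"
```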
Error Handling & Idempotency
- The apps role is idempotent: re-running with no manifest changes is a no-op. With changes, compose/unit files re-render and the handler restarts the affected unit.
- ExecStartPre=docker pull means every (re)start picks up :latest. There is a brief window where an old container coexists with a new pull, which is acceptable at this scale.
- A failed docker pull (network blip, AR transient) prevents start; systemd reports the failure; journalctl -u apps-NAME shows the error.
- A bad compose file is caught by docker compose config at deploy time, before systemd activates.
- WIF token-exchange failures in CI surface as a clear google-github-actions/auth@v2 step error with the GCP message. Common causes: missing id-token: write, repo not added to ci_github_repos, WIF provider URI mismatch.
- All Ansible role changes are idempotent; re-running make configure is safe.
Verification
- terraform plan after Spec 2a shows: AR repo, WIF pool + provider, CI SA, IAM bindings (per repo), expanded API enablement.
- terraform validate and terraform fmt -check clean.
- ansible-playbook --syntax-check ansible/site.yml clean.
- Post-migration: make verify confirms VM healthy + OTel metrics flowing from europe-west3.
- Post-deploy: systemctl is-active apps-polymarket-telegram-agent returns active; docker ps shows the running container; the agent posts something to Telegram (confirmed in chat).
- Snapshot job: systemctl list-timers apps-polymarket-snapshot.timer shows the next-firing schedule; after the first scheduled run, journalctl -u apps-polymarket-snapshot.service shows the run output.
- WIF end-to-end: pushing a commit to polymarket-fetch:main triggers the workflow; the auth step exchanges the OIDC token; build+push completes; the new image tag is visible in gcloud artifacts docker images list europe-west3-docker.pkg.dev/PROJ/apps.
What’s Deferred (Spec 2b / 2c / 2d / Hardening)
- Spec 2b: app telemetry. Extend the monitoring role with filelog and OTLP receivers; pipelines for logs and traces shipping to SigNoz.
- Spec 2c: cloudflared Ansible role + Cloudflare Access policies; per-app public URL gating.
- Spec 2d: Cloud Run Jobs (Terraform) for heavy/spiky batch.
- Hardening: GCP Secret Manager (rotation + audit); no-public-IP + Cloud NAT (drops the external IPv4 charge, removes the last public-facing surface).
- Auto-deploy: webhook or pull-based reconciler that triggers make deploy on AR push.
Each is additive to this layout — no restructuring required.