Deployed (2026-05-25)
Spec 1 of levandor-infra is complete and live on GCP. The ops-vm is provisioned, configured, joined to the tailnet, and shipping host metrics to SigNoz Cloud with zero export failures. Spec 2 (application-deploy layer) is now the active workstream.
Production state of the GCP VM provisioning effort, plus the one late refinement to the auth model and the telemetry / cost numbers observed in the live environment.
What is Running
- VM: ops-vm on GCE, machine type e2-small, zone us-central1-a, Ubuntu 24.04 LTS.
- Disk: 20 GB pd-balanced boot disk.
- Networking: dedicated custom-mode VPC (no default-allow-ssh), ephemeral external IPv4 for egress only, no public ingress rules apart from inbound udp:41641, which is open for faster direct Tailscale connectivity.
- Tailnet: joined under tag:cloud via ephemeral auth key; reachable on MagicDNS as ops-vm.
- Runtime: Docker CE + compose plugin, fail2ban active, unattended-upgrades configured (no auto-reboot).
- Telemetry agent: otelcol-contrib v0.152.1 running as a systemd service.
- Operator access: make ssh over Tailscale; no public SSH port anywhere.
Late Refinement — Tailscale SSH Replaces Key-Based SSH
The approved Spec 1 design had key-based SSH over Tailscale. During the deployment pass that was switched to Tailscale SSH (tailscale up --ssh + an SSH ACL rule).
Why: drops the macOS Keychain prereq entirely. The VM-side tailscaled mints short-lived SSH credentials from the tailnet identity, so the operator workstation no longer needs an SSH keypair, an ssh-agent, or the one-time ssh-add --apple-use-keychain ceremony. One less moving part in the passwordless-operation chain.
ACL rule (lives in the Tailscale admin policy file):
    "ssh": [
      {
        "action": "accept",
        "src": ["autogroup:member"],
        "dst": ["tag:cloud"],
        "users": ["ops", "root"],
      },
    ],
ACL dst must match the device tag
The VM is joined with an auth key carrying tag:cloud, so the SSH ACL dst must be tag:cloud, not autogroup:self. autogroup:self does not match tagged devices; see the gotcha.
The monitoring role and all other Ansible behavior are unchanged — only the transport for make ssh and Ansible’s SSH connection moved from key-based to Tailscale SSH. Passwordless sudo via the google-sudoers group still covers become.
Telemetry Status — Flowing to SigNoz Cloud
- Endpoint: ingest.eu2.signoz.cloud:443 (OTLP/gRPC, TLS on, signoz-ingestion-key header).
- Pipeline: host_metrics receiver → resourcedetection (system + GCP) → batch → otlp_grpc exporter.
- Throughput observed: ~25,000 host-metric data points exported so far.
- Failures: zero. otelcol_exporter_send_failed_metric_points is absent from :8888/metrics (i.e. the failure counter has never incremented).
How to verify live:
    make ssh
    curl -s localhost:8888/metrics | grep -E 'otelcol_exporter_(sent|send_failed)_metric_points'
otelcol_exporter_sent_metric_points increases over time (good); the send_failed counter should be absent or zero. See the gotcha for why this is the canonical signal.
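The manual check can also be scripted so that it returns a pass/fail exit status. What follows is a minimal sketch, not something taken from the repo: the function name check_export_health is hypothetical, and it assumes the collector serves Prometheus text format on :8888/metrics with one sample per line and the value as the last field. It matches the same metric-name prefixes as the grep, so a suffixed variant of either counter name would still be counted.

```shell
# check_export_health: hypothetical helper, not part of levandor-infra.
# Reads Prometheus text-format metrics on stdin, sums both exporter counters
# across all label sets, prints the totals, and exits 0 only when at least
# one point was sent and none failed.
#
# Usage on the VM: curl -s localhost:8888/metrics | check_export_health
check_export_health() {
  awk '
    /^#/ { next }                                              # skip HELP/TYPE lines
    /^otelcol_exporter_sent_metric_points/        { sent   += $NF }
    /^otelcol_exporter_send_failed_metric_points/ { failed += $NF }
    END {
      printf "sent=%d failed=%d\n", sent, failed
      exit !(sent > 0 && failed == 0)                          # 0 = healthy, 1 = not
    }
  '
}
```

An absent send_failed series sums to zero, which matches the "absent or zero" rule. An empty response (for example, when the collector is not running) reports sent=0 and fails the check.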
Cost — ~$18 / month
Rough monthly cost at the current config:
| Item | Cost |
|---|---|
| e2-small (2 vCPU burstable, 2 GB RAM, 24/7) | ~$13 / mo |
| 20 GB pd-balanced boot disk | ~$2 / mo |
| Ephemeral external IPv4 (for egress only) | ~$3 / mo |
| Total | ~$18 / mo |
Network egress and SigNoz Cloud usage are inside their respective free tiers at this volume.
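The total is just the sum of the three line items. As a small sketch for re-running the arithmetic when an item changes (the helper name monthly_total is hypothetical, and the inputs are the approximate whole-dollar figures from the table, not billing data):

```shell
# monthly_total: hypothetical helper that sums whole-dollar monthly line items.
monthly_total() {
  total=0
  for item in "$@"; do
    total=$(( total + item ))
  done
  echo "$total"
}

# Approximate figures from the table: e2-small, boot disk, external IPv4.
monthly_total 13 2 3   # prints 18
```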
Spec 2 Preview (next workstream)
The application-deploy layer is now the active focus. Planned scope:
- Artifact Registry + per-repo CI workflow template for image pushes; VM pulls via its dedicated SA (already wired via roles/artifactregistry.reader).
- docker compose based app deploys with a small Ansible role to land the compose file and run docker compose up -d.
- App logs and traces to SigNoz via a filelog receiver and OTLP trace intake on the existing otelcol-contrib.
- cloudflared role + Cloudflare ZTNA tunnels for public ingress (no GCP load balancer needed).
- Batch jobs (systemd timers or k8s-cronjob-equivalent on the single VM).
- Hardening pass: drop the external IP in favor of Cloud NAT, migrate Tailscale + SigNoz keys to GCP Secret Manager.
See the Spec 2 stub at the end of Spec 2 Preview (out of scope here).