mando-cli status read-only + bounded (2026-05-06)

mando status could hang for ~20 seconds because it went through the Flyway migration runner’s setup path, which runs DDL and retries the connection in a 10×2s loop. This note pins the rule (status commands must be pure reads: no DDL, no retries, no mutations) and documents the connect_readonly and table_exists helpers in mando-cli’s src/db/flyway.rs that enforce it.

For Agents

  • Commit: 68bcc63 (fix: make mando status read-only and bounded under 2s)
  • Result: mando status runs in ~0.3s with no db setup retries logged.
  • File touched: src/db/flyway.rs

Symptoms

mando status hung for ~20 seconds on cold start, even though all it needs to do is print “what’s currently set up”. The db setup retry log lines that appeared during the wait were a clear signal that status was taking the migration codepath.

Root cause

The status command called FlywayMigrationRunner::status, which internally called Self::connect, which in turn called ensure_database. ensure_database did all of the following:

  • 10×2s retry loop on tokio_postgres::connect (worst case 20s wait)
  • DDL: CREATE DATABASE … (only if the database does not already exist)
  • DDL: CREATE SCHEMA IF NOT EXISTS mando
  • DDL: CREATE TABLE IF NOT EXISTS mando.migration_history (…)
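For contrast, here is a std-only sketch of the retry shape in the first bullet. It is an illustration, not the flyway.rs code: connect_with_retry is a hypothetical name, and the closure stands in for tokio_postgres::connect. Every failed attempt before the last one costs a full delay, so against a server that never answers, the caller sits through roughly max_attempts × delay, plus however long each attempt takes to fail.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical retry helper in the shape of ensure_database's connect loop:
/// call `attempt` up to `max_attempts` times, sleeping `delay` between failures.
/// Returns the first success, or the last error once attempts run out.
pub fn connect_with_retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(err) if tries >= max_attempts => return Err(err),
            // "Postgres not up yet": wait and try again. This sleep is the
            // user-visible stall when the server never answers.
            Err(_) => thread::sleep(delay),
        }
    }
}
```

This shape is fine for apply, where waiting for postgres is part of the job; it is exactly what status must not do.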

All of this is appropriate for migrate apply. None of it is appropriate for status. Status’s job is to report state; it has no business writing schema or waiting 20 seconds for postgres to come up.

Fix

Single-file change in src/db/flyway.rs:

connect_readonly(config) — new helper

A single tokio_postgres::connect call wrapped in a 2s tokio::time::timeout. No retry, no ensure_database. It returns Result<Client, …>: if postgres isn’t reachable within 2 seconds, status fails fast and surfaces “?” (unknown) to the user.
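A std-only analog of the “single attempt, hard deadline” shape, assuming a blocking connect closure. The real helper wraps an async connect in tokio::time::timeout; connect_with_deadline and ConnectError are hypothetical names used only for this sketch. One difference worth knowing: tokio’s timeout drops (cancels) the pending future, while a std thread cannot be cancelled, so here a late attempt keeps running in the background and its result is discarded.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for a bounded, single-attempt connect.
#[derive(Debug, PartialEq)]
pub enum ConnectError<E> {
    /// No answer before the deadline; the caller reports "?" / unknown.
    TimedOut,
    /// The attempt finished in time but failed.
    Failed(E),
}

/// Hypothetical helper: run `connect` exactly once and wait at most `limit`.
/// No retry loop, so an unreachable server becomes `TimedOut`, not a stall.
pub fn connect_with_deadline<T, E, F>(limit: Duration, connect: F) -> Result<T, ConnectError<E>>
where
    T: Send + 'static,
    E: Send + 'static,
    F: FnOnce() -> Result<T, E> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone and send fails;
        // ignoring that error simply discards the late result.
        let _ = tx.send(connect());
    });
    match rx.recv_timeout(limit) {
        Ok(Ok(client)) => Ok(client),
        Ok(Err(e)) => Err(ConnectError::Failed(e)),
        // A timeout, or the worker panicked without sending (channel
        // disconnected). Both are reported as TimedOut for simplicity.
        Err(_) => Err(ConnectError::TimedOut),
    }
}
```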

After the connect succeeds, the very first thing it does is:

client.batch_execute("SET statement_timeout = '2s'").await?;

Why this is critical

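The connect timeout alone does not bound query times. A fast, successful connect can be followed by a query that hangs for minutes (a slow lock, a blocked pid). SET statement_timeout makes the postgres server abort any statement on this session that runs longer than 2s, so every subsequent query is bounded. Without it, a single status call could hang indefinitely after a fast connect. Because the limit is enforced server-side, it bounds work the server is doing (including lock waits); it does not help if the network path itself stalls mid-query.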
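The ordering rule (bound the session before anyone else uses it) can be sketched against a hypothetical Session trait standing in for tokio_postgres’s Client::batch_execute. init_readonly_session and RecordingSession are illustrative names, not from flyway.rs. The point is the `?`: if the SET fails, the whole connect fails, so no caller ever holds an unbounded session.

```rust
/// Hypothetical stand-in for the one client method this step needs.
pub trait Session {
    fn batch_execute(&mut self, sql: &str) -> Result<(), String>;
}

/// Must be the first statement on a read-only session.
pub const STATEMENT_TIMEOUT_SQL: &str = "SET statement_timeout = '2s'";

/// Hypothetical helper: bound the session first and only hand it back
/// once the SET has succeeded.
pub fn init_readonly_session<S: Session>(mut session: S) -> Result<S, String> {
    session.batch_execute(STATEMENT_TIMEOUT_SQL)?;
    Ok(session)
}

/// Test double that records every statement, or rejects all of them.
pub struct RecordingSession {
    pub log: Vec<String>,
    pub reject: bool,
}

impl Session for RecordingSession {
    fn batch_execute(&mut self, sql: &str) -> Result<(), String> {
        if self.reject {
            return Err(format!("rejected: {}", sql));
        }
        self.log.push(sql.to_string());
        Ok(())
    }
}
```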

table_exists(client, schema, name) — new helper

Runs SELECT 1 FROM information_schema.tables WHERE table_schema = $1 AND table_name = $2 LIMIT 1 and checks whether a row came back. Used in three places: status (twice, once per migration table) and check_flyway.
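A minimal sketch of the table_exists logic, reusing the query text above. To keep it runnable without a database, a hypothetical Catalog trait stands in for “run a parameterized query and return its rows”; in flyway.rs the client is a tokio_postgres Client, and FakeCatalog is purely illustrative. Only “did any row come back” matters, which is why the query selects a constant and stops at LIMIT 1.

```rust
use std::collections::HashSet;

pub const TABLE_EXISTS_SQL: &str = "SELECT 1 FROM information_schema.tables \
     WHERE table_schema = $1 AND table_name = $2 LIMIT 1";

/// Hypothetical abstraction over executing a parameterized query.
pub trait Catalog {
    fn query_rows(&self, sql: &str, params: &[&str]) -> Result<Vec<i32>, String>;
}

/// True if `schema.name` exists. Read-only: one SELECT, no DDL.
pub fn table_exists(db: &impl Catalog, schema: &str, name: &str) -> Result<bool, String> {
    let rows = db.query_rows(TABLE_EXISTS_SQL, &[schema, name])?;
    Ok(!rows.is_empty())
}

/// Hypothetical in-memory fake that answers TABLE_EXISTS_SQL from a set of tables.
pub struct FakeCatalog {
    pub tables: HashSet<(String, String)>,
}

impl Catalog for FakeCatalog {
    fn query_rows(&self, sql: &str, params: &[&str]) -> Result<Vec<i32>, String> {
        if sql != TABLE_EXISTS_SQL || params.len() != 2 {
            return Err("unexpected query".to_string());
        }
        let key = (params[0].to_string(), params[1].to_string());
        // One row (the constant 1) if the table exists, no rows otherwise.
        Ok(if self.tables.contains(&key) { vec![1] } else { vec![] })
    }
}
```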

Status path — read-only check before query

status:
    connect_readonly()
    table_exists(client, "mando", "migration_history") ?
        yes -> get_applied_versions(client) -> HashSet<i64>
        no  -> HashSet::new()                 // treat absence as zero-applied
    -> render

The table_exists short-circuit BEFORE get_applied_versions is what lets us stay read-only on a fresh database. Without it, get_applied_versions would error against the missing table, and we’d be tempted to “fix” that by creating the table, which is exactly the trap ensure_database fell into.
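The status path above as a runnable std-only sketch, starting after connect_readonly has returned a client. A hypothetical StatusDb trait stands in for the real client; applied_versions_readonly and FakeDb are illustrative names, and get_applied_versions only mirrors the name used above. The property to check: with the table absent, the function returns an empty set without ever issuing the query that would fail.

```rust
use std::collections::HashSet;

/// Hypothetical read-only surface that status needs.
pub trait StatusDb {
    fn table_exists(&self, schema: &str, name: &str) -> Result<bool, String>;
    fn get_applied_versions(&self) -> Result<HashSet<i64>, String>;
}

/// Status path: check first, query second, never create anything.
pub fn applied_versions_readonly(db: &impl StatusDb) -> Result<HashSet<i64>, String> {
    if db.table_exists("mando", "migration_history")? {
        db.get_applied_versions()
    } else {
        // Fresh database: treat the missing history table as "zero applied".
        Ok(HashSet::new())
    }
}

/// Hypothetical fake that errors if get_applied_versions runs while the table
/// is missing, mimicking postgres rejecting a query on a nonexistent relation.
pub struct FakeDb {
    pub has_history_table: bool,
    pub applied: Vec<i64>,
}

impl StatusDb for FakeDb {
    fn table_exists(&self, schema: &str, name: &str) -> Result<bool, String> {
        Ok(self.has_history_table && schema == "mando" && name == "migration_history")
    }

    fn get_applied_versions(&self) -> Result<HashSet<i64>, String> {
        if !self.has_history_table {
            return Err("relation \"mando.migration_history\" does not exist".to_string());
        }
        Ok(self.applied.iter().copied().collect())
    }
}
```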

apply() and list() keep the old path

These commands are explicitly allowed to mutate, so they continue to use connect + ensure_database (with the 10×2s retry). Don’t conflate them with status.

The rule

Status commands must be pure reads

  • No DDL: no CREATE, ALTER, DROP, no schema-bootstrap-on-the-fly.
  • No mutations: SELECT only.
  • No retries: if postgres is “warming up”, the answer is “?” (unknown). Fail fast and surface that. Don’t paper over it with a 20-second wait that the user has to sit through.
  • Bounded: set statement_timeout at the start of the session so EVERY query you run is bounded, even queries you forget about.

The same rule applies to any future read-only command (e.g. mando config show, mando profile list). If a command’s name doesn’t promise mutation, it must not mutate.

Result

$ time mando status
… (status output) …
mando status  0.30s user 0.04s system 92% cpu 0.367 total

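No db setup retry log lines. No DDL run. Bounded by construction even on a misbehaving postgres: the connect gives up after 2s, and statement_timeout makes the server abort any single statement that runs past 2s.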