BE-1842 Datadog Observability

Branch: feature/BE-1842

Context

The Datadog Log Explorer showed many log lines with blank MDC columns (@SPAN.FLOW.CONTEXT, @SPAN.FLOW.EXEC_ID, @STEPCONNECTION, @STEP.NAME). The gaps were most common on info/warn/error events from scheduled jobs and adapter calls.

Root Cause Analysis

Three independent investigations converged on the same causes:

1. DatadogFormatter only serialized the leaf span (FIXED)

mando-lib/src/app/dd_formatter.rs:177-181: format_event() resolved the current span via event.parent() / ctx.lookup_current() and serialized only that single span’s fields via SerializableSpan. It never walked ctx.event_scope() to collect fields from parent spans.

Fields like flow.exec_id and flow.context are set on parent workflow/flow spans (mando-lib workflow/mod.rs:303, workflow/flow.rs:250). When events fired from child spans (adapters, repos), those parent fields were lost.

Fix: Replaced SerializableSpan with collect_span_fields() that walks ctx.event_scope() root-to-leaf, merging all ancestor span fields. Child fields override parent fields.
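
The merge rule can be modeled without the tracing types. The sketch below is std-only and is not the code in dd_formatter.rs: Fields, merge_scope_fields and fields are hypothetical names, and each span is reduced to a map of string fields. It only shows the ordering behavior described above, where spans are visited root first and a child key overwrites the same key from an ancestor.

```rust
use std::collections::BTreeMap;

/// Fields recorded on one span, as key/value strings.
/// Hypothetical stand-in for whatever the formatter reads from each span.
type Fields = BTreeMap<String, String>;

/// Merge the fields of every span in `scope`, ordered root first, leaf last.
/// Later (child) spans overwrite keys set by earlier (ancestor) spans.
fn merge_scope_fields(scope: &[Fields]) -> Fields {
    let mut merged = Fields::new();
    for span_fields in scope {
        for (key, value) in span_fields {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Small helper to build a field map from string pairs.
fn fields(pairs: &[(&str, &str)]) -> Fields {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With a workflow span carrying flow.exec_id and flow.context and an adapter span carrying step.name, the merged map contains all three keys, so an event fired in the adapter span keeps the workflow fields. In tracing-subscriber the root-to-leaf order comes from calling from_root() on the scope returned by ctx.event_scope().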

2. Scheduler callbacks emit logs outside any span (TODO)

  • flow_scheduler.rs:346: warn!("Data update flow is already running") before span creation
  • flow_scheduler.rs:378: error!("Unexpected flow type...") before span creation
  • mando-bess/src/lib.rs:307: cache cleanup error/info in bare async block
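
Why these lines come out blank can be shown with a std-only model of an entered-span stack. This is an illustration under assumptions, not the scheduler code: SCOPE_STACK, enter_scope, log_line and the scope name scheduler.data_update are all hypothetical. A log call made before any scope is entered sees empty context, while the same call made after entering a scope picks it up.

```rust
use std::cell::RefCell;

thread_local! {
    // Names of the scopes currently entered on this thread, outermost first.
    static SCOPE_STACK: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Guard that pops its scope when dropped, similar to exiting an entered span.
struct ScopeGuard;

impl Drop for ScopeGuard {
    fn drop(&mut self) {
        SCOPE_STACK.with(|s| {
            s.borrow_mut().pop();
        });
    }
}

/// Push a scope and return the guard that removes it again.
fn enter_scope(name: &str) -> ScopeGuard {
    SCOPE_STACK.with(|s| {
        s.borrow_mut().push(name.to_string());
    });
    ScopeGuard
}

/// Render a log line with whatever context is active at the call site.
fn log_line(message: &str) -> String {
    let context = SCOPE_STACK.with(|s| s.borrow().join("/"));
    format!("[{}] {}", context, message)
}

/// Hypothetical scheduler callback that logs with no scope entered.
fn callback_unwrapped() -> String {
    log_line("Data update flow is already running")
}

/// Same callback with the scope entered before the first log call.
fn callback_wrapped() -> String {
    let _guard = enter_scope("scheduler.data_update");
    log_line("Data update flow is already running")
}
```

In the real callbacks the equivalent change would be to create the span before the first warn!/error! call, and for async blocks to attach it with .instrument(span) rather than holding an enter guard across an await.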

3. tokio::spawn breaks span inheritance (TODO)

  • workflow/flow.rs:194: flow execution spawned into new task, parent span context not fully inherited
  • save_data_route.rs:94: tokio::spawn() without .instrument()
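
The same effect can be reproduced with std threads as an analogy. The sketch below does not use tokio or tracing, and CURRENT_CONTEXT, set_context, spawn_bare and spawn_with_context are hypothetical names. Ambient context lives with the execution unit that set it, so newly spawned work starts empty unless the context is captured before the spawn and re-established inside it, which is the role .instrument() plays for a spawned future.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Ambient context for the current thread; a new thread starts with None.
    static CURRENT_CONTEXT: RefCell<Option<String>> = RefCell::new(None);
}

/// Set the ambient context for the calling thread.
fn set_context(value: &str) {
    CURRENT_CONTEXT.with(|c| {
        *c.borrow_mut() = Some(value.to_string());
    });
}

/// Read the ambient context of the calling thread.
fn current_context() -> Option<String> {
    CURRENT_CONTEXT.with(|c| c.borrow().clone())
}

/// Spawn without handing the context over: the worker sees none.
fn spawn_bare() -> Option<String> {
    thread::spawn(current_context)
        .join()
        .expect("worker thread panicked")
}

/// Capture the context before spawning and re-establish it inside the worker.
fn spawn_with_context() -> Option<String> {
    let captured = current_context();
    thread::spawn(move || {
        if let Some(ctx) = captured {
            set_context(&ctx);
        }
        current_context()
    })
    .join()
    .expect("worker thread panicked")
}
```

After set_context("flow.exec_id=42") on the calling thread, spawn_bare() returns None and spawn_with_context() returns the captured value. For the tokio::spawn sites, the corresponding pattern with the tracing crate is to wrap the future, for example future.instrument(tracing::Span::current()), before passing it to tokio::spawn.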

Changes

dd_formatter.rs

  • Removed SerializableSpan struct + Serialize impl (~60 lines)
  • Added collect_span_fields<S, N>() (~23 lines) — walks full scope, merges fields
  • Updated format_event() to use ctx.event_scope() + collect_span_fields

Commits

  • 222efe76: "feat: datadog safe field formatter" (DatadogSafeFieldFormatter for field name remapping)
  • 19d8c236 / 0c9976b7: follow-up fixes
  • 8695973a: "fix: transient flow parsing in dd" (scope traversal fix)

Remaining Work

  • Wrap scheduler/cron callbacks in spans (Fix 2)
  • Add .instrument() to tokio::spawn sites (Fix 3)
  • Consider adding full_id as a stored field on StepMetadata for #[instrument] proc macro compatibility

Related

  • mando-lib: core crate containing the formatter and workflow engine
  • mando-bess: REST API with scheduler and route handlers
  • Agent Context: dense agent reference