BE-1842 Datadog Observability
Branch: feature/BE-1842
Context
The Datadog Log Explorer showed many log lines with blank MDC columns (@SPAN.FLOW.CONTEXT, @SPAN.FLOW.EXEC_ID, @STEPCONNECTION, @STEP.NAME), especially info/warn/error lines from scheduled jobs and adapter calls.
Root Cause Analysis
Three independent investigations converged on the same causes:
1. DatadogFormatter only serialized the leaf span (FIXED)
mando-lib/src/app/dd_formatter.rs:177-181 — format_event() resolved the current span via event.parent() / ctx.lookup_current() and serialized only that single span’s fields via SerializableSpan. It never walked ctx.event_scope() to collect fields from parent spans.
Fields like flow.exec_id and flow.context are set on parent workflow/flow spans (mando-lib workflow/mod.rs:303, workflow/flow.rs:250). When events fired from child spans (adapters, repos), those parent fields were lost.
Fix: Replaced SerializableSpan with collect_span_fields() that walks ctx.event_scope() root-to-leaf, merging all ancestor span fields. Child fields override parent fields.
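The merge rule in the fix can be illustrated with a small std-only model. This is a sketch of the semantics only, not the code in dd_formatter.rs: SpanFields, merge_scope_fields, and leaf_only_fields are hypothetical names, and the scope is modelled as a slice already ordered root to leaf (in tracing-subscriber, Scope::from_root() yields that order).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one span in the scope: a name plus its recorded fields.
pub struct SpanFields {
    pub name: &'static str,
    pub fields: Vec<(&'static str, &'static str)>,
}

/// Merge fields from every span in `scope`, which is ordered root to leaf.
/// Later (child) spans overwrite earlier (parent) values for the same key.
pub fn merge_scope_fields(scope: &[SpanFields]) -> BTreeMap<&'static str, &'static str> {
    let mut merged = BTreeMap::new();
    for span in scope {
        for (key, value) in &span.fields {
            merged.insert(*key, *value);
        }
    }
    merged
}

/// Old behaviour, for comparison: only the leaf span's fields survive.
pub fn leaf_only_fields(scope: &[SpanFields]) -> BTreeMap<&'static str, &'static str> {
    scope
        .last()
        .map(|span| span.fields.iter().copied().collect())
        .unwrap_or_default()
}
```

With a workflow span carrying flow.exec_id and a child adapter span carrying only step.name, leaf_only_fields drops flow.exec_id, while merge_scope_fields keeps it and still lets the child's step.name win.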
2. Scheduler callbacks emit logs outside any span (TODO)
- flow_scheduler.rs:346: warn!("Data update flow is already running") before span creation
- flow_scheduler.rs:378: error!("Unexpected flow type...") before span creation
- mando-bess/src/lib.rs:307: cache cleanup error/info in bare async block
3. tokio::spawn breaks span inheritance (TODO)
- workflow/flow.rs:194: flow execution spawned into new task, parent span context not fully inherited
- save_data_route.rs:94: tokio::spawn() without .instrument()
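Causes 2 and 3 share one mechanism: a log line only picks up span fields if a span is active where the log call runs. The following std-only analogy models that with a per-thread stack and OS threads. It is an assumption-laden sketch, not the tracing or tokio API: SPAN_STACK, in_span, log_line, spawn_bare, and spawn_instrumented are all hypothetical names, and spawn_instrumented only mimics what wrapping a future with .instrument() achieves.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical per-thread stack of active span names.
    static SPAN_STACK: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Run `f` with `span` pushed onto this thread's stack, popping it afterwards.
pub fn in_span<T>(span: &str, f: impl FnOnce() -> T) -> T {
    SPAN_STACK.with(|s| s.borrow_mut().push(span.to_string()));
    let out = f();
    SPAN_STACK.with(|s| {
        s.borrow_mut().pop();
    });
    out
}

/// Render a log line prefixed with whatever spans are active on this thread.
pub fn log_line(msg: &str) -> String {
    let ctx = SPAN_STACK.with(|s| s.borrow().join(">"));
    format!("[{}] {}", ctx, msg)
}

/// Snapshot the current stack so it can be re-entered elsewhere.
pub fn current_context() -> Vec<String> {
    SPAN_STACK.with(|s| s.borrow().clone())
}

/// Spawn without carrying context: the new thread starts with an empty stack.
pub fn spawn_bare(msg: &'static str) -> String {
    thread::spawn(move || log_line(msg)).join().unwrap()
}

/// Spawn with the caller's context re-entered first (the `.instrument()` analogue).
pub fn spawn_instrumented(msg: &'static str) -> String {
    let ctx = current_context();
    thread::spawn(move || {
        SPAN_STACK.with(|s| *s.borrow_mut() = ctx);
        log_line(msg)
    })
    .join()
    .unwrap()
}
```

In this model, log_line called before any in_span (the scheduler case) and spawn_bare called inside in_span("flow", ...) (the tokio::spawn case) both produce an empty context, while spawn_instrumented keeps "flow". The real fixes would use tracing spans around the callbacks and .instrument() at the spawn sites.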
Changes
dd_formatter.rs
- Removed SerializableSpan struct + Serialize impl (~60 lines)
- Added collect_span_fields<S, N>() (~23 lines): walks full scope, merges fields
- Updated format_event() to use ctx.event_scope() + collect_span_fields
Commits
- 222efe76: feat: datadog safe field formatter (DatadogSafeFieldFormatter for field name remapping)
- 19d8c236 / 0c9976b7: follow-up fix
- 8695973a: fix: transient flow parsing in dd (scope traversal fix)
Remaining Work
- Wrap scheduler/cron callbacks in spans (Fix 2)
- Add .instrument() to tokio::spawn sites (Fix 3)
- Consider adding full_id as stored field on StepMetadata for #[instrument] proc macro compatibility
Related
- mando-lib — core crate containing the formatter and workflow engine
- mando-bess — REST API with scheduler and route handlers
- Agent Context — dense agent reference