Memray Memory Debugging

Memory profiling recipes for Alpiq BESS Python services using Memray. Integrated into mando-cli as the debug/ recipe group. Supports eight profiling modes covering container-level profiling, instrumented test scripts, live monitoring, leak detection, and surviving-object tracking.

Status: Implemented (updated 2026-03-15)

Recipe files in poc/recipes/debug/. Accessible from the main menu via “Memory debugging (Memray profiling)” or directly with cargo run -- run debug/memray-container. Debug build includes full DWARF symbols for both Rust and Python. Instrumented test runner at test-data/memray-runner.py provides automatic measurement with structured JSON output.

Profiling Modes

Eight profiling modes are available, selectable via the profile_mode environment variable or the interactive menu.

1. Snapshot

Profiles py_mando import and heavy dependencies (Polars, PyArrow). Generates an allocation flamegraph of the import phase. Useful for measuring baseline memory cost of loading the stack.
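The underlying measurement can be illustrated with stdlib tracemalloc (a stand-in sketch, not the recipe's memray-based implementation):

```python
import importlib
import tracemalloc

def import_cost_kb(module_name: str) -> float:
    """Measure Python-level allocations caused by importing a module."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)  # "py_mando", "polars", ... in the real run
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / 1024

# Stand-in for the real snapshot mode, which imports the heavy stack:
print(f"import cost: {import_cost_kb('decimal'):.1f} KiB")
```

The real snapshot mode does the same thing under a memray Tracker, so the cost shows up as a flamegraph rather than a single number.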

2. Endpoint

Auto-detects the service port based on the target container:

  • Forecast containers: port 5000
  • Optimization containers: port 6000

Profiles a single HTTP request to the service. Captures the full allocation trace from request receipt through response, including any Rust/PyO3 boundary crossings.
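The port auto-detection rule above amounts to a simple match on the container name (an illustrative sketch, not the recipe's actual code):

```python
def detect_port(container_target: str) -> int:
    """Map a container target to its service port, per the rules above."""
    if "forecast" in container_target:
        return 5000
    if "optimization" in container_target:
        return 6000
    raise ValueError(f"unknown container target: {container_target!r}")

print(detect_port("local-forecast-1"))      # 5000
print(detect_port("local-optimization-1"))  # 6000
```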

3. Custom Script

Runs any Python script from the test-cases/ directory through the instrumented test-data/memray-runner.py harness. Full instrumentation is applied automatically (see Instrumented Test Runner below).

4. Live Web

Uses memray’s SocketDestination on port 9001. The profiled process writes allocation data to the socket in real time. Connect a TUI viewer from another shell:

# In another terminal
memray live --port 9001

5. Live Remote

Same as Live Web — server mode for TUI connection. The container exposes a socket that the TUI client connects to for real-time memory inspection.

6. Stats

Generates an allocation histogram showing peak memory usage, total allocation counts, and top allocators by size and frequency. Text-based output suitable for CI or quick terminal inspection.

7. Leak Detection

Always enables trace_python_allocators=True. Performs a GC pass before taking the final snapshot to ensure only genuinely unreleased allocations remain. Generates a --leaks flamegraph showing only allocations that were never freed.
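The same allocate → GC pass → final snapshot shape can be sketched with stdlib tracemalloc (memray's Tracker applies this idea natively; this is not memray's code):

```python
import gc
import tracemalloc

_cache: list = []  # deliberately retained — simulates a leak

def handle_request(n: int) -> None:
    _cache.append(bytearray(n))  # never released
    temp = bytearray(n)          # released normally — will not appear in the diff
    del temp

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(10):
    handle_request(100_000)
gc.collect()  # GC pass so only genuinely unreleased allocations remain
after = tracemalloc.take_snapshot()
tracemalloc.stop()

top = after.compare_to(before, "lineno")[0]
print(top)  # the _cache.append line dominates with ~1 MB still live
```

In the --leaks flamegraph this surviving allocation would appear as a wide frame at the _cache.append call site, while the temp allocation would not appear at all.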

8. Surviving Objects

Uses track_object_lifetimes=True to identify actual Python objects still alive after the tracking window. After the tracker exits, get_surviving_objects() returns a list of live objects grouped by type with counts and sizes. Far more actionable than stack traces — you see the exact objects holding memory.

Output: memray-objects.json with type/count/size breakdown, plus a standard flamegraph.
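The type/count/size grouping can be approximated with stdlib gc — a rough stand-in for illustration, not memray's implementation or API:

```python
import gc
import sys
from collections import Counter

def surviving_objects_by_type(top_n: int = 5) -> list[dict]:
    """Group currently live objects by type, mirroring the shape of memray-objects.json."""
    gc.collect()
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj)
    return [{"type": t, "count": c, "size_bytes": sizes[t]}
            for t, c in counts.most_common(top_n)]

for row in surviving_objects_by_type():
    print(row)
```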

For Agents

The profile_mode variable accepts these exact values: snapshot, endpoint, script, live-web, live-remote, stats, leaks, objects. The trace_allocators variable accepts True or False (Python capitalization) — the recipe normalizes automatically.

Report Types

Every profiling mode that produces a .bin file now generates multiple report types automatically:

| Report | File | Purpose |
| --- | --- | --- |
| Flamegraph | flamegraph.html | Standard allocation flamegraph — width = allocation size |
| Leaks | leaks.html | Only unfreed allocations (--leaks flag) |
| Temporal | temporal.html | Memory over time with selectable time window sliders (--temporal flag). Shows a memory usage chart at the top and a flamegraph for the selected time range below. |
| Temporary allocations | temp-allocs.html | Identifies wasteful alloc/dealloc churn (--temporary-allocations flag). Finds allocations freed after just 1 subsequent allocation — reveals container growth patterns, unnecessary copies, and PyO3 boundary overhead. |
| Table | table.txt | Sorted allocation table — greppable, diffable text output. Good for quick terminal inspection and comparing between runs. |
| Tree | tree.txt | Hierarchical allocation tree — shows call path structure. Good for understanding allocation ownership. |

Most useful for leak investigation

Temporal flamegraph is the most valuable report for leak investigation. It lets you scrub through time to see exactly when memory grew and which allocations were responsible at that moment. Combined with the leaks flamegraph, you can distinguish “allocated early and held” from “growing over time.”

Most useful for performance optimization

Temporary allocations report reveals wasteful churn — allocations that are immediately freed. This is the #1 optimization target for PyO3 boundary code where data copies across the Rust/Python boundary create short-lived allocations.

Instrumented Test Runner

The file test-data/memray-runner.py wraps any Python script with automatic measurement instrumentation.

Capabilities

| Layer | What It Measures |
| --- | --- |
| memray Tracker | Native traces + Python allocator tracing. native_traces=True and trace_python_allocators=True are always enabled when memray is available. |
| tracemalloc | Python-level allocation deltas (current/peak/delta). |
| resource.getrusage | RSS (Resident Set Size) tracking before and after execution. |

Features

  • Per-script requirements: If a requirements.txt file exists alongside the test script, dependencies are installed automatically before execution.
  • Per-request data merging: The target script can write per-request metrics to /tmp/memray-run-requests.json. The runner merges this data into the final summary.json.
  • Leak heuristic: Detects monotonic RSS growth across requests as an indicator of likely memory leaks.
  • Output directory: All artifacts go to /tmp/memray-run/<run_id>/.
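The leak heuristic reduces to a simple predicate over the per-request RSS samples (a minimal sketch of the rule, not the runner's exact code):

```python
def monotonic_growth(rss_samples: list[int]) -> bool:
    """True when RSS grew after every request and never decreased."""
    return len(rss_samples) > 1 and all(
        later > earlier for earlier, later in zip(rss_samples, rss_samples[1:])
    )

print(monotonic_growth([45056, 62000, 70100, 78320]))  # True  — grew every request
print(monotonic_growth([45056, 62000, 61800, 78320]))  # False — dipped once
```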

How It Works

test-cases/my-script.py
        │
        ▼
memray-runner.py
  ├── Install requirements.txt (if present)
  ├── Start tracemalloc
  ├── Record RSS (before)
  ├── Start memray Tracker (native_traces + trace_python_allocators)
  │     ├── exec(script)
  │     └── Script writes /tmp/memray-run-requests.json (optional)
  ├── GC + record RSS (after)
  ├── Generate reports from session.bin:
  │     ├── flamegraph.html        — full allocation flamegraph
  │     ├── leaks.html             — unfreed-only flamegraph
  │     ├── temporal.html          — memory over time (selectable slices)
  │     ├── temp-allocs.html       — wasteful alloc/dealloc churn
  │     ├── table.txt              — sorted allocations (greppable)
  │     └── tree.txt               — hierarchical allocation tree
  ├── Merge per-request data
  ├── Detect leak indicators (monotonic RSS growth)
  └── Write summary.json
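Stripped of the memray-specific steps, the flow above condenses to roughly this (a hypothetical reduction; the real harness is test-data/memray-runner.py and adds the memray Tracker and report generation):

```python
import gc
import resource
import runpy
import tracemalloc

def run_instrumented(script_path: str) -> dict:
    """Execute a script under tracemalloc + RSS instrumentation."""
    tracemalloc.start()
    # ru_maxrss is in KiB on Linux (bytes on macOS)
    rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    runpy.run_path(script_path)  # the exec(script) step
    gc.collect()                 # GC before the final measurements
    rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "memory": {
            "rss_before": rss_before,
            "rss_after": rss_after,
            "rss_delta": rss_after - rss_before,
            "tracemalloc_current": current,
            "tracemalloc_peak": peak,
        }
    }
```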

Stack Signal Collection

Each profiling run collects additional context alongside the memray data:

| Signal | Method | Detail |
| --- | --- | --- |
| WireMock request journal | HTTP API | Reset before run, dump after. Captures all HTTP calls the service made during profiling. |
| Container logs | docker logs --since 5m | Last 5-minute window from all running services. |
| Docker container stats | docker stats --no-stream | Resource usage snapshots taken before and after the profiling run. |

All signals are collected into the run output directory alongside the memray artifacts.

Output Structure

Each profiling run produces a self-contained directory:

runs/<scriptname>-<timestamp>/
├── summary.json              — structured report (see schema below)
├── session.bin               — memray binary capture
├── flamegraph.html           — full allocation flamegraph
├── leaks.html                — unfreed-only flamegraph
├── temporal.html             — temporal flamegraph (memory over time, selectable slices)
├── temp-allocs.html          — temporary allocation flamegraph (wasteful churn)
├── table.txt                 — sorted allocation table (greppable/diffable)
├── tree.txt                  — hierarchical allocation tree
├── wiremock-requests.json    — HTTP calls made during run
├── wiremock-unmatched.json   — requests that hit no WireMock stub
├── logs-local-*.txt          — per-service container logs
└── container-stats-*.json    — resource usage before/after

summary.json Schema

{
  "run_id": "example-sequential-requests-20260315-143022",
  "script": "test-cases/example-sequential-requests.py",
  "timestamp": "2026-03-15T14:30:22Z",
  "exit_code": 0,
  "error": null,
  "duration_seconds": 12.4,
  "memory": {
    "rss_before": 45056,
    "rss_after": 78320,
    "rss_delta": 33264,
    "tracemalloc_current": 12480,
    "tracemalloc_peak": 34200,
    "tracemalloc_delta": 12480
  },
  "top_allocators": [
    { "file": "polars/frame.py:142", "size_kb": 8400, "count": 23 }
  ],
  "requests": [
    {
      "request_num": 1,
      "status": 200,
      "duration_s": 3.2,
      "rss_before": 45056,
      "rss_after": 62000,
      "rss_delta": 16944
    }
  ],
  "leak_indicators": {
    "rss_growth_mb": 32.5,
    "monotonic_growth": true,
    "likely_leak": true
  },
  "profiling": {
    "memray_enabled": true,
    "native_traces": true,
    "trace_python_allocators": true
  },
  "artifacts": {
    "session_bin": "runs/example-20260315-143022/session.bin",
    "flamegraph": "runs/example-20260315-143022/flamegraph.html",
    "leaks_flamegraph": "runs/example-20260315-143022/leaks.html",
    "temporal_flamegraph": "runs/example-20260315-143022/temporal.html",
    "temp_allocs_flamegraph": "runs/example-20260315-143022/temp-allocs.html",
    "table_report": "runs/example-20260315-143022/table.txt",
    "tree_report": "runs/example-20260315-143022/tree.txt"
  }
}
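An agent consuming this schema might gate on the leak indicators like so (a sketch; the 10 MB threshold is an assumption, not part of the schema):

```python
import json

def flag_likely_leak(summary_path: str, growth_threshold_mb: float = 10.0) -> bool:
    """True when the run shows monotonic RSS growth above the threshold."""
    with open(summary_path) as f:
        summary = json.load(f)
    leak = summary.get("leak_indicators", {})
    return bool(leak.get("monotonic_growth")) and \
        leak.get("rss_growth_mb", 0.0) > growth_threshold_mb
```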

For Agents

The requests array is only populated when the target script writes per-request data to /tmp/memray-run-requests.json. The leak_indicators.monotonic_growth boolean is the primary signal — if true across multiple requests, RSS grew on every single request without ever decreasing.

Debug Build (py-mando with Rust Symbols)

The debug build recipe produces a py-mando wheel with full DWARF debug symbols so that memray can resolve Rust function names in flamegraphs.

Build Environment

| Component | Detail |
| --- | --- |
| Base image | rust:1.88.0-bookworm |
| Platform | --platform linux/amd64 |
| Rust debug | CARGO_PROFILE_RELEASE_DEBUG=2 (full DWARF symbols in .so) |
| Python debug | python3-dbg + libpython3-dbg for CPython frame visibility |
| Build deps | cmake, golang, pkg-config, libsmbclient-dev, libduckdb.so |
| Maturin flag | --skip-auditwheel |
| Cache | Built wheel cached at /tmp/pymando-debug/ — reused on subsequent installs |

Why Debug Symbols Matter

CARGO_PROFILE_RELEASE_DEBUG=2 embeds full DWARF debug info in the py-mando .so file. This lets memray resolve Rust function names in flamegraphs instead of showing opaque <native> frames. The python3-dbg package provides Python interpreter debug symbols for the same reason.

--skip-auditwheel

Required because libsmbclient (used by py-mando for EBS/Samba access) has system dependencies that auditwheel cannot bundle into the wheel. The wheel is only used inside the container, so portability is not a concern.

CLI Usage

Non-Interactive (with -e flags)

# Profile a custom script with full allocator tracing
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=script \
  -e trace_allocators=True \
  -e script_path=test-cases/example-sequential-requests.py
 
# Snapshot mode — import profiling
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=snapshot
 
# Endpoint mode — single request profile
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=endpoint
 
# Leak detection
cargo run -- run debug/memray-container \
  -e container_target=local-optimization-1 \
  -e profile_mode=leaks
 
# Stats only
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=stats

Compare Two Runs

# Compare two profiling runs — flags regressions, exits 1 if >20% growth
python3 test-data/memray-compare.py runs/run-A/ runs/run-B/
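The >20% rule amounts to a relative-growth check on the two runs' RSS deltas (illustrative; memray-compare's exact inputs may differ):

```python
def regressed(rss_delta_a: int, rss_delta_b: int, threshold: float = 0.20) -> bool:
    """Flag run B as a regression when its RSS delta exceeds run A's by >20%."""
    if rss_delta_a <= 0:
        return rss_delta_b > 0  # any growth from a flat baseline counts
    return (rss_delta_b - rss_delta_a) / rss_delta_a > threshold

print(regressed(33264, 45000))  # True  — roughly 35% growth
print(regressed(33264, 35000))  # False — roughly 5% growth
```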

Interactive (via menu)

cargo run -- run menu
# → Memory debugging → choose target → choose mode

Quick Reference

trace_allocators must be True or False (Python capitalization). The recipe normalizes the value automatically, but the input must use this casing.

Profiling Targets

| Target | Container | Port | Notes |
| --- | --- | --- | --- |
| Forecast (bess-forecast-day-ahead) | local-forecast-1 | 5000 | Flask/Waitress |
| Optimization (bess-optimization) | local-optimization-1 | 6000 | Flask/Gunicorn, requires Gurobi license |
| py-mando (PyO3 bindings) | — | — | Tracks Rust-to-Python boundary allocations |
| Custom script | Any | — | Any .py file via test-cases/ |

Typical Workflows

Debug a Memory Leak in Forecast

1. Start full stack:     mando run menu → "Mando + Real algos"
2. Install memray:       mando run debug/memray-install → forecast
3. Run leak detection:   cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=leaks
4. Review leaks.html in runs/<name>-<timestamp>/
5. Check summary.json → leak_indicators.monotonic_growth

Profile a Custom Test Script

1. Write script:         test-cases/my-test.py
2. (Optional) Add:       test-cases/requirements.txt
3. Run profiling:        cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=script \
                           -e script_path=test-cases/my-test.py
4. Artifacts appear in:  runs/my-test-<timestamp>/
5. Open flamegraph.html in browser

Live Monitor During Load Test

1. Start live web mode:  cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=live-web
2. In another terminal:  memray live --port 9001
3. Send requests to the service — watch allocations in real time

Detect Memory Leaks Over Multiple Requests

1. Start full stack:     mando run menu → "Mando + Real algos"
2. Install memray:       mando run debug/memray-install → target service
3. Run sequence:         cargo run -- run debug/memray-sequence \
                           -e container_target=local-optimization-1 \
                           -e endpoint_path=/run \
                           -e request_count=20 \
                           -e delay_seconds=2
4. Review temporal.html — scrub timeline to see memory growth per request
5. Check summary.json → leak_indicators.monotonic_growth
6. Compare with baseline: python3 test-data/memray-compare.py runs/baseline/ runs/sequence-<timestamp>/

Profile py-mando PyO3 Boundary

1. Build pymando:        cd ../../mando/py-mando && maturin develop
2. Profile:              mando run debug/memray-local → pymando → flamegraph
3. (Optional) Custom:    Enter your own script path instead of default smoke test

PyO3 Considerations

Memray + PyO3 Boundary

Memray tracks Python-level allocations. For Rust allocations inside py-mando PyO3 extensions, memray will show the Python call site but not the Rust internals unless the debug build is used. With CARGO_PROFILE_RELEASE_DEBUG=2 and native_traces=True, Rust frames become visible in flamegraphs.

Memray is valuable for py-mando because:

  • It tracks allocations at the Python-to-Rust boundary (Arrow/Polars DataFrames)
  • It reveals Python-side memory patterns around PyO3 calls
  • Leak detection shows if Python references to Rust objects are being properly released
  • With the debug build, native Rust allocation frames are fully resolved

Known Limitations

| Issue | Detail | Workaround |
| --- | --- | --- |
| memray attach fails under OrbStack/Rosetta | gdb/ptrace does not work under emulation | Use the Tracker API (all modes except attach use this) |
| Polars AVX warning under Rosetta | Cosmetic warning about missing AVX instructions | Safe to ignore — not a crash, just slower SIMD fallback |
| Missing debug symbols after container restart | python3-dbg not persisted across restarts | Run the Setup/Install recipe again after each container restart |
| trace_allocators capitalization | Must be True or False (Python style) | Recipe normalizes automatically, but input must match |
| track_object_lifetimes overhead | Enabling object lifetime tracking adds memory and CPU overhead | Only use for targeted debugging, not routine profiling |

Architecture

graph TD
    A[mando run debug/menu] --> B{Target?}
    B --> C[Container Profiling]
    B --> D[Local Profiling]
    B --> E[Install Memray + Debug Build]
    B --> F[View Profiles]

    C --> C1[Select service: forecast / optimization]
    C1 --> C2{Profile Mode?}
    C2 --> M1[snapshot]
    C2 --> M2[endpoint]
    C2 --> M3[script]
    C2 --> M4[live-web]
    C2 --> M5[live-remote]
    C2 --> M6[stats]
    C2 --> M7[leaks]
    C2 --> M8[objects]
    C2 --> M9[sequence]

    M3 --> R[memray-runner.py harness]
    R --> R1[memray Tracker]
    R --> R2[tracemalloc]
    R --> R3[resource.getrusage]

    M1 --> G[runs/name-timestamp/]
    M2 --> G
    R --> G
    M4 --> H[Port 9001 SocketDestination]
    M5 --> H
    M6 --> G
    M7 --> G

    G --> G1[summary.json]
    G --> G2[session.bin]
    G --> G3[flamegraph.html]
    G --> G4[leaks.html]
    G --> G7[temporal.html]
    G --> G8[temp-allocs.html]
    G --> G9[table.txt + tree.txt]
    G --> G5[wiremock-requests.json]
    G --> G6[container logs + stats]