Memray Memory Debugging
Memory profiling recipes for Alpiq BESS Python services using Memray. Integrated into mando-cli as the debug/ recipe group. Supports eight profiling modes covering container-level profiling, instrumented test scripts, live monitoring, and leak detection.
Status: Implemented (updated 2026-03-15)
Recipe files live in poc/recipes/debug/. Accessible from the main menu via "Memory debugging (Memray profiling)" or directly with `cargo run -- run debug/memray-container`. The debug build includes full DWARF symbols for both Rust and Python. The instrumented test runner at `test-data/memray-runner.py` provides automatic measurement with structured JSON output.
Profiling Modes
Eight profiling modes are available, selectable via the profile_mode environment variable or the interactive menu.
1. Snapshot
Profiles py_mando import and heavy dependencies (Polars, PyArrow). Generates an allocation flamegraph of the import phase. Useful for measuring baseline memory cost of loading the stack.
2. Endpoint
Auto-detects the service port based on the target container:
- Forecast containers: port 5000
- Optimization containers: port 6000
Profiles a single HTTP request to the service. Captures the full allocation trace from request receipt through response, including any Rust/PyO3 boundary crossings.
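The auto-detection rule can be sketched as a simple name check — a hypothetical illustration only, since the recipe's actual detection logic is not shown in this document:

```python
def service_port(container_target: str) -> int:
    """Map a container target name to its service port (illustrative sketch)."""
    if "forecast" in container_target:
        return 5000       # Forecast containers (Flask/Waitress)
    if "optimization" in container_target:
        return 6000       # Optimization containers (Flask/Gunicorn)
    raise ValueError(f"unknown container target: {container_target!r}")

print(service_port("local-forecast-1"))       # → 5000
print(service_port("local-optimization-1"))   # → 6000
```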
3. Custom Script
Runs any Python script from the test-cases/ directory through the instrumented test-data/memray-runner.py harness. Full instrumentation is applied automatically (see Instrumented Test Runner below).
4. Live Web
Uses memray's SocketDestination on port 9001. The profiled process writes allocation data to the socket in real time. Connect a TUI viewer from another shell:

```shell
# In another terminal
memray live --port 9001
```

5. Live Remote
Same as Live Web — server mode for TUI connection. The container exposes a socket that the TUI client connects to for real-time memory inspection.
6. Stats
Generates an allocation histogram showing peak memory usage, total allocation counts, and top allocators by size and frequency. Text-based output suitable for CI or quick terminal inspection.
7. Leak Detection
Always enables trace_python_allocators=True. Performs a GC pass before taking the final snapshot to ensure only genuinely unreleased allocations remain. Generates a --leaks flamegraph showing only allocations that were never freed.
8. Surviving Objects
Uses track_object_lifetimes=True to identify actual Python objects still alive after the tracking window. After the tracker exits, get_surviving_objects() returns a list of live objects grouped by type with counts and sizes. Far more actionable than stack traces — you see the exact objects holding memory.
Output: memray-objects.json with type/count/size breakdown, plus a standard flamegraph.
For Agents
The `profile_mode` variable accepts these exact values: `snapshot`, `endpoint`, `script`, `live-web`, `live-remote`, `stats`, `leaks`, `objects`. The `trace_allocators` variable accepts `True` or `False` (Python capitalization) — the recipe normalizes automatically.
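The kind of normalization the recipe performs on the boolean input can be sketched as follows (hypothetical — the actual recipe logic lives in the recipe files and is not shown here):

```python
def normalize_bool(value: str) -> bool:
    """Accept common spellings of a boolean flag and return a real bool."""
    v = value.strip().lower()
    if v in ("true", "1", "yes"):
        return True
    if v in ("false", "0", "no"):
        return False
    raise ValueError(f"expected True or False, got {value!r}")

print(normalize_bool("True"), normalize_bool("False"))  # → True False
```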
Report Types
Every profiling mode that produces a .bin file now generates multiple report types automatically:
| Report | File | Purpose |
|---|---|---|
| Flamegraph | flamegraph.html | Standard allocation flamegraph — width = allocation size |
| Leaks | leaks.html | Only unfreed allocations (--leaks flag) |
| Temporal | temporal.html | Memory over time with selectable time window sliders (--temporal flag). Shows a memory usage chart at the top and a flamegraph for the selected time range below. |
| Temporary allocations | temp-allocs.html | Identifies wasteful alloc/dealloc churn (--temporary-allocations flag). Finds allocations freed after just 1 subsequent allocation — reveals container growth patterns, unnecessary copies, and PyO3 boundary overhead. |
| Table | table.txt | Sorted allocation table — greppable, diffable text output. Good for quick terminal inspection and comparing between runs. |
| Tree | tree.txt | Hierarchical allocation tree — shows call path structure. Good for understanding allocation ownership. |
Most useful for leak investigation
Temporal flamegraph is the most valuable report for leak investigation. It lets you scrub through time to see exactly when memory grew and which allocations were responsible at that moment. Combined with the leaks flamegraph, you can distinguish “allocated early and held” from “growing over time.”
Most useful for performance optimization
Temporary allocations report reveals wasteful churn — allocations that are immediately freed. This is the #1 optimization target for PyO3 boundary code where data copies across the Rust/Python boundary create short-lived allocations.
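To illustrate the churn pattern this report flags, here is a pure-Python analogue: the loop materializes a throwaway list on every iteration (allocated and freed immediately), while the churn-free version computes the same total without the copy:

```python
rows = list(range(10_000))

# Churn pattern: each iteration builds a short-lived copy that is
# freed right after use — exactly what the temporary-allocations
# report surfaces as wasteful alloc/dealloc churn.
total = 0
for _ in range(100):
    chunk = [x * 2 for x in rows]   # short-lived allocation
    total += sum(chunk)             # chunk is garbage immediately after

# Churn-free version: a generator avoids materializing the copy.
total2 = sum(x * 2 for x in rows) * 100

print(total == total2)  # → True
```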
Instrumented Test Runner
The file test-data/memray-runner.py wraps any Python script with automatic measurement instrumentation.
Capabilities
| Layer | What It Measures |
|---|---|
| memray Tracker | Native traces + Python allocator tracing. native_traces=True and trace_python_allocators=True are always enabled when memray is available. |
| tracemalloc | Python-level allocation deltas (current/peak/delta). |
| resource.getrusage | RSS (Resident Set Size) tracking before and after execution. |
Features
- Per-script requirements: If a `requirements.txt` file exists alongside the test script, dependencies are installed automatically before execution.
- Per-request data merging: The target script can write per-request metrics to `/tmp/memray-run-requests.json`. The runner merges this data into the final `summary.json`.
- Leak heuristic: Detects monotonic RSS growth across requests as an indicator of likely memory leaks.
- Output directory: All artifacts go to `/tmp/memray-run/<run_id>/`.
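The leak heuristic reduces to a one-line check over per-request RSS samples; a minimal sketch (the function name is illustrative, not the runner's actual API):

```python
def monotonic_growth(rss_samples: list[int]) -> bool:
    """True if RSS grew on every request without ever decreasing."""
    return all(b > a for a, b in zip(rss_samples, rss_samples[1:]))

print(monotonic_growth([45056, 62000, 70000, 78320]))  # → True
print(monotonic_growth([45056, 62000, 61000, 78320]))  # → False
```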
How It Works
```
test-cases/my-script.py
        │
        ▼
memray-runner.py
 ├── Install requirements.txt (if present)
 ├── Start tracemalloc
 ├── Record RSS (before)
 ├── Start memray Tracker (native_traces + trace_python_allocators)
 │    ├── exec(script)
 │    └── Script writes /tmp/memray-run-requests.json (optional)
 ├── GC + record RSS (after)
 ├── Generate reports from session.bin:
 │    ├── flamegraph.html — full allocation flamegraph
 │    ├── leaks.html — unfreed-only flamegraph
 │    ├── temporal.html — memory over time (selectable slices)
 │    ├── temp-allocs.html — wasteful alloc/dealloc churn
 │    ├── table.txt — sorted allocations (greppable)
 │    └── tree.txt — hierarchical allocation tree
 ├── Merge per-request data
 ├── Detect leak indicators (monotonic RSS growth)
 └── Write summary.json
```
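A condensed, stdlib-only sketch of this flow (illustrative; the real `test-data/memray-runner.py` also wraps the `exec` in a memray Tracker and generates the report files):

```python
import gc
import json
import resource
import tracemalloc


def run_instrumented(script_path: str, out_path: str = "summary.json") -> dict:
    """Run a script under tracemalloc + getrusage and write a summary."""
    tracemalloc.start()
    rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux

    # In the real runner this exec happens inside `with memray.Tracker(...)`.
    code = compile(open(script_path).read(), script_path, "exec")
    exec(code, {"__name__": "__main__"})

    gc.collect()  # GC pass so only live allocations remain
    current, peak = tracemalloc.get_traced_memory()
    rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    tracemalloc.stop()

    summary = {
        "memory": {
            "rss_before": rss_before,
            "rss_after": rss_after,
            "rss_delta": rss_after - rss_before,
            "tracemalloc_current": current,
            "tracemalloc_peak": peak,
        },
    }
    with open(out_path, "w") as f:
        json.dump(summary, f)
    return summary
```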
Stack Signal Collection
Each profiling run collects additional context alongside the memray data:
| Signal | Method | Detail |
|---|---|---|
| WireMock request journal | HTTP API | Reset before run, dump after. Captures all HTTP calls the service made during profiling. |
| Container logs | docker logs --since 5m | Last 5-minute window from all running services. |
| Docker container stats | docker stats --no-stream | Resource usage snapshots taken before and after the profiling run. |
All signals are collected into the run output directory alongside the memray artifacts.
Output Structure
Each profiling run produces a self-contained directory:
```
runs/<scriptname>-<timestamp>/
├── summary.json            — structured report (see schema below)
├── session.bin             — memray binary capture
├── flamegraph.html         — full allocation flamegraph
├── leaks.html              — unfreed-only flamegraph
├── temporal.html           — temporal flamegraph (memory over time, selectable slices)
├── temp-allocs.html        — temporary allocation flamegraph (wasteful churn)
├── table.txt               — sorted allocation table (greppable/diffable)
├── tree.txt                — hierarchical allocation tree
├── wiremock-requests.json  — HTTP calls made during run
├── wiremock-unmatched.json — requests that hit no WireMock stub
├── logs-local-*.txt        — per-service container logs
└── container-stats-*.json  — resource usage before/after
```
summary.json Schema
```json
{
  "run_id": "example-sequential-requests-20260315-143022",
  "script": "test-cases/example-sequential-requests.py",
  "timestamp": "2026-03-15T14:30:22Z",
  "exit_code": 0,
  "error": null,
  "duration_seconds": 12.4,
  "memory": {
    "rss_before": 45056,
    "rss_after": 78320,
    "rss_delta": 33264,
    "tracemalloc_current": 12480,
    "tracemalloc_peak": 34200,
    "tracemalloc_delta": 12480
  },
  "top_allocators": [
    { "file": "polars/frame.py:142", "size_kb": 8400, "count": 23 }
  ],
  "requests": [
    {
      "request_num": 1,
      "status": 200,
      "duration_s": 3.2,
      "rss_before": 45056,
      "rss_after": 62000,
      "rss_delta": 16944
    }
  ],
  "leak_indicators": {
    "rss_growth_mb": 32.5,
    "monotonic_growth": true,
    "likely_leak": true
  },
  "profiling": {
    "memray_enabled": true,
    "native_traces": true,
    "trace_python_allocators": true
  },
  "artifacts": {
    "session_bin": "runs/example-20260315-143022/session.bin",
    "flamegraph": "runs/example-20260315-143022/flamegraph.html",
    "leaks_flamegraph": "runs/example-20260315-143022/leaks.html",
    "temporal_flamegraph": "runs/example-20260315-143022/temporal.html",
    "temp_allocs_flamegraph": "runs/example-20260315-143022/temp-allocs.html",
    "table_report": "runs/example-20260315-143022/table.txt",
    "tree_report": "runs/example-20260315-143022/tree.txt"
  }
}
```
For Agents
The `requests` array is only populated when the target script writes per-request data to `/tmp/memray-run-requests.json`. The `leak_indicators.monotonic_growth` boolean is the primary signal — if `true` across multiple requests, RSS grew on every single request without ever decreasing.
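A sketch of how a target script might record per-request metrics for the runner to merge — the file path comes from this document, while the workload here is a stand-in for real HTTP requests:

```python
import json
import resource


def rss_kb() -> int:
    # ru_maxrss is reported in KiB on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


requests = []
payloads = []  # keeps allocations alive, simulating per-request state
for i in range(1, 4):
    before = rss_kb()
    payloads.append(bytearray(256 * 1024))  # stand-in for real request work
    after = rss_kb()
    requests.append({
        "request_num": i,
        "status": 200,
        "rss_before": before,
        "rss_after": after,
        "rss_delta": after - before,
    })

# The runner merges this file into summary.json's `requests` array.
with open("/tmp/memray-run-requests.json", "w") as f:
    json.dump(requests, f)
```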
Debug Build (py-mando with Rust Symbols)
The debug build recipe produces a py-mando wheel with full DWARF debug symbols so that memray can resolve Rust function names in flamegraphs.
Build Environment
| Component | Detail |
|---|---|
| Base image | rust:1.88.0-bookworm |
| Platform | --platform linux/amd64 |
| Rust debug | CARGO_PROFILE_RELEASE_DEBUG=2 (full DWARF symbols in .so) |
| Python debug | python3-dbg + libpython3-dbg for CPython frame visibility |
| Build deps | cmake, golang, pkg-config, libsmbclient-dev, libduckdb.so |
| Maturin flag | --skip-auditwheel |
| Cache | Built wheel cached at /tmp/pymando-debug/ — reused on subsequent installs |
Why Debug Symbols Matter
`CARGO_PROFILE_RELEASE_DEBUG=2` embeds full DWARF debug info in the py-mando `.so` file. This lets memray resolve Rust function names in flamegraphs instead of showing opaque `<native>` frames. The `python3-dbg` package provides Python interpreter debug symbols for the same reason.
`--skip-auditwheel`
Required because `libsmbclient` (used by py-mando for EBS/Samba access) has system dependencies that `auditwheel` cannot bundle into the wheel. The wheel is only used inside the container, so portability is not a concern.
CLI Usage
Non-Interactive (with -e flags)
```shell
# Profile a custom script with full allocator tracing
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=script \
  -e trace_allocators=True \
  -e script_path=test-cases/example-sequential-requests.py

# Snapshot mode — import profiling
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=snapshot

# Endpoint mode — single request profile
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=endpoint

# Leak detection
cargo run -- run debug/memray-container \
  -e container_target=local-optimization-1 \
  -e profile_mode=leaks

# Stats only
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=stats
```
Compare Two Runs
```shell
# Compare two profiling runs — flags regressions, exits 1 if >20% growth
python3 test-data/memray-compare.py runs/run-A/ runs/run-B/
```
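A hypothetical sketch of the >20% regression check such a comparison could perform — the actual memray-compare.py logic is not shown in this document, so field names are taken from the summary.json schema above:

```python
def regression(base_summary: dict, new_summary: dict,
               threshold: float = 0.20) -> bool:
    """True if RSS delta grew by more than `threshold` relative to baseline."""
    base = base_summary["memory"]["rss_delta"]
    new = new_summary["memory"]["rss_delta"]
    growth = (new - base) / base
    return growth > threshold


base = {"memory": {"rss_delta": 33264}}
new = {"memory": {"rss_delta": 45000}}
print(regression(base, new))  # ~35% growth → True
```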
Interactive (via menu)
```shell
cargo run -- run menu
# → Memory debugging → choose target → choose mode
```
Quick Reference
`trace_allocators` must be `True` or `False` (Python capitalization). The recipe normalizes the value automatically, but the input must use this casing.
Profiling Targets
| Target | Container | Port | Notes |
|---|---|---|---|
| Forecast (bess-forecast-day-ahead) | local-forecast-1 | 5000 | Flask/Waitress |
| Optimization (bess-optimization) | local-optimization-1 | 6000 | Flask/Gunicorn, requires Gurobi license |
| py-mando (PyO3 bindings) | — | — | Tracks Rust-to-Python boundary allocations |
| Custom script | Any | — | Any .py file via test-cases/ |
Typical Workflows
Debug a Memory Leak in Forecast
1. Start full stack: mando run menu → "Mando + Real algos"
2. Install memray: mando run debug/memray-install → forecast
3. Run leak detection: cargo run -- run debug/memray-container \
-e container_target=local-forecast-1 \
-e profile_mode=leaks
4. Review leaks.html in runs/<name>-<timestamp>/
5. Check summary.json → leak_indicators.monotonic_growth
Profile a Custom Test Script
1. Write script: test-cases/my-test.py
2. (Optional) Add: test-cases/requirements.txt
3. Run profiling: cargo run -- run debug/memray-container \
-e container_target=local-forecast-1 \
-e profile_mode=script \
-e script_path=test-cases/my-test.py
4. Artifacts appear in: runs/my-test-<timestamp>/
5. Open flamegraph.html in browser
Live Monitor During Load Test
1. Start live web mode: cargo run -- run debug/memray-container \
-e container_target=local-forecast-1 \
-e profile_mode=live-web
2. In another terminal: memray live --port 9001
3. Send requests to the service — watch allocations in real time
Detect Memory Leaks Over Multiple Requests
1. Start full stack: mando run menu → "Mando + Real algos"
2. Install memray: mando run debug/memray-install → target service
3. Run sequence: cargo run -- run debug/memray-sequence \
-e container_target=local-optimization-1 \
-e endpoint_path=/run \
-e request_count=20 \
-e delay_seconds=2
4. Review temporal.html — scrub timeline to see memory growth per request
5. Check summary.json → leak_indicators.monotonic_growth
6. Compare with baseline: python3 test-data/memray-compare.py runs/baseline/ runs/sequence-<timestamp>/
Profile py-mando PyO3 Boundary
1. Build pymando: cd ../../mando/py-mando && maturin develop
2. Profile: mando run debug/memray-local → pymando → flamegraph
3. (Optional) Custom: Enter your own script path instead of default smoke test
PyO3 Considerations
Memray + PyO3 Boundary
Memray tracks Python-level allocations. For Rust allocations inside py-mando PyO3 extensions, memray will show the Python call site but not the Rust internals unless the debug build is used. With `CARGO_PROFILE_RELEASE_DEBUG=2` and `native_traces=True`, Rust frames become visible in flamegraphs.
Memray is valuable for py-mando because:
- It tracks allocations at the Python-to-Rust boundary (Arrow/Polars DataFrames)
- It reveals Python-side memory patterns around PyO3 calls
- Leak detection shows if Python references to Rust objects are being properly released
- With the debug build, native Rust allocation frames are fully resolved
Known Limitations
| Issue | Detail | Workaround |
|---|---|---|
| `memray attach` fails under OrbStack/Rosetta | gdb/ptrace does not work under emulation | Use the Tracker API (all modes except attach use this) |
| Polars AVX warning under Rosetta | Cosmetic warning about missing AVX instructions | Safe to ignore — not a crash, just slower SIMD fallback |
| Missing debug symbols after container restart | python3-dbg not persisted across restarts | Run the Setup/Install recipe again after each container restart |
| `trace_allocators` capitalization | Must be `True` or `False` (Python style) | Recipe normalizes automatically, but input must match |
| `track_object_lifetimes` overhead | Enabling object lifetime tracking adds memory and CPU overhead | Only use for targeted debugging, not routine profiling |
Architecture
```mermaid
graph TD
    A[mando run debug/menu] --> B{Target?}
    B --> C[Container Profiling]
    B --> D[Local Profiling]
    B --> E[Install Memray + Debug Build]
    B --> F[View Profiles]
    C --> C1[Select service: forecast / optimization]
    C1 --> C2{Profile Mode?}
    C2 --> M1[snapshot]
    C2 --> M2[endpoint]
    C2 --> M3[script]
    C2 --> M4[live-web]
    C2 --> M5[live-remote]
    C2 --> M6[stats]
    C2 --> M7[leaks]
    C2 --> M8[objects]
    C2 --> M9[sequence]
    M3 --> R[memray-runner.py harness]
    R --> R1[memray Tracker]
    R --> R2[tracemalloc]
    R --> R3[resource.getrusage]
    M1 --> G[runs/name-timestamp/]
    M2 --> G
    R --> G
    M4 --> H[Port 9001 SocketDestination]
    M5 --> H
    M6 --> G
    M7 --> G
    G --> G1[summary.json]
    G --> G2[session.bin]
    G --> G3[flamegraph.html]
    G --> G4[leaks.html]
    G --> G7[temporal.html]
    G --> G8[temp-allocs.html]
    G --> G9[table.txt + tree.txt]
    G --> G5[wiremock-requests.json]
    G --> G6[container logs + stats]
```
Related
- mando-cli — the recipe engine that runs these workflows
- Local Docker Setup — the Docker Compose stack being profiled
- py-mando — PyO3 bindings (profiling target)
- Alpiq BESS — project overview
- Mando — workspace overview