Memray Memory Debugging

Memory profiling recipes for Alpiq BESS Python services using Memray. Integrated into mando-cli as the debug/ recipe group. Supports eight profiling modes covering container-level profiling, instrumented test scripts, live monitoring, leak detection, and surviving-object tracking.

Status: Implemented (updated 2026-03-15)

Recipe files in poc/recipes/debug/. Accessible from the main menu via “Memory debugging (Memray profiling)” or directly with cargo run -- run debug/memray-container. Debug build includes full DWARF symbols for both Rust and Python. Instrumented test runner at test-data/memray-runner.py provides automatic measurement with structured JSON output.

Profiling Modes

Eight profiling modes are available, selectable via the profile_mode environment variable or the interactive menu.

1. Snapshot

Profiles py_mando import and heavy dependencies (Polars, PyArrow). Generates an allocation flamegraph of the import phase. Useful for measuring baseline memory cost of loading the stack.
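The underlying measurement can be illustrated with stdlib tracemalloc (a stand-in sketch, not the recipe's memray-based implementation):

```python
import importlib
import tracemalloc

def import_cost_kb(module_name: str) -> float:
    """Measure Python-level allocations caused by importing a module."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)  # "py_mando", "polars", ... in the real run
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / 1024

# Stand-in for the real snapshot mode, which imports the heavy stack:
print(f"import cost: {import_cost_kb('decimal'):.1f} KiB")
```

The real snapshot mode does the same thing under a memray Tracker, so the cost shows up as a flamegraph rather than a single number.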

2. Endpoint

Auto-detects the service port based on the target container:

  • Forecast containers: port 5000
  • Optimization containers: port 6000

Profiles a single HTTP request to the service. Captures the full allocation trace from request receipt through response, including any Rust/PyO3 boundary crossings.
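The port auto-detection rule above amounts to a simple match on the container name (an illustrative sketch, not the recipe's actual code):

```python
def detect_port(container_target: str) -> int:
    """Map a container target to its service port, per the rules above."""
    if "forecast" in container_target:
        return 5000
    if "optimization" in container_target:
        return 6000
    raise ValueError(f"unknown container target: {container_target!r}")

print(detect_port("local-forecast-1"))      # 5000
print(detect_port("local-optimization-1"))  # 6000
```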

3. Custom Script

Runs any Python script from the test-cases/ directory through the instrumented test-data/memray-runner.py harness. Full instrumentation is applied automatically (see Instrumented Test Runner below).

4. Live Web

Uses memray’s SocketDestination on port 9001. The profiled process writes allocation data to the socket in real time. Connect a TUI viewer from another shell:

# In another terminal
memray live --port 9001

5. Live Remote

Same as Live Web — server mode for TUI connection. The container exposes a socket that the TUI client connects to for real-time memory inspection.

6. Stats

Generates an allocation histogram showing peak memory usage, total allocation counts, and top allocators by size and frequency. Text-based output suitable for CI or quick terminal inspection.

7. Leak Detection

Always enables trace_python_allocators=True. Performs a GC pass before taking the final snapshot to ensure only genuinely unreleased allocations remain. Generates a --leaks flamegraph showing only allocations that were never freed.
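The same allocate → GC pass → final snapshot shape can be sketched with stdlib tracemalloc (memray's Tracker applies this idea natively; this is not memray's code):

```python
import gc
import tracemalloc

_cache: list = []  # deliberately retained — simulates a leak

def handle_request(n: int) -> None:
    _cache.append(bytearray(n))  # never released
    temp = bytearray(n)          # released normally — will not appear in the diff
    del temp

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(10):
    handle_request(100_000)
gc.collect()  # GC pass so only genuinely unreleased allocations remain
after = tracemalloc.take_snapshot()
tracemalloc.stop()

top = after.compare_to(before, "lineno")[0]
print(top)  # the _cache.append line dominates with ~1 MB still live
```

In the --leaks flamegraph this surviving allocation would appear as a wide frame at the _cache.append call site, while the temp allocation would not appear at all.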

8. Surviving Objects

Uses track_object_lifetimes=True to identify actual Python objects still alive after the tracking window. After the tracker exits, get_surviving_objects() returns a list of live objects grouped by type with counts and sizes. Far more actionable than stack traces — you see the exact objects holding memory.

Output: memray-objects.json with type/count/size breakdown, plus a standard flamegraph.
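The type/count/size grouping can be approximated with stdlib gc — a rough stand-in for illustration, not memray's implementation or API:

```python
import gc
import sys
from collections import Counter

def surviving_objects_by_type(top_n: int = 5) -> list[dict]:
    """Group currently live objects by type, mirroring the shape of memray-objects.json."""
    gc.collect()
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj)
    return [{"type": t, "count": c, "size_bytes": sizes[t]}
            for t, c in counts.most_common(top_n)]

for row in surviving_objects_by_type():
    print(row)
```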

For Agents

The profile_mode variable accepts these exact values: snapshot, endpoint, script, live-web, live-remote, stats, leaks, objects. The trace_allocators variable accepts True or False (Python capitalization) — the recipe normalizes automatically.

Report Types

Every profiling mode that produces a .bin file now generates multiple report types automatically:

| Report | File | Purpose |
| --- | --- | --- |
| Flamegraph | flamegraph.html | Standard allocation flamegraph — width = allocation size |
| Leaks | leaks.html | Only unfreed allocations (--leaks flag) |
| Temporal | temporal.html | Memory over time with selectable time window sliders (--temporal flag). Shows a memory usage chart at the top and a flamegraph for the selected time range below. |
| Temporary allocations | temp-allocs.html | Identifies wasteful alloc/dealloc churn (--temporary-allocations flag). Finds allocations freed after just 1 subsequent allocation — reveals container growth patterns, unnecessary copies, and PyO3 boundary overhead. |
| Table | table.txt | Sorted allocation table — greppable, diffable text output. Good for quick terminal inspection and comparing between runs. |
| Tree | tree.txt | Hierarchical allocation tree — shows call path structure. Good for understanding allocation ownership. |

Most useful for leak investigation

Temporal flamegraph is the most valuable report for leak investigation. It lets you scrub through time to see exactly when memory grew and which allocations were responsible at that moment. Combined with the leaks flamegraph, you can distinguish “allocated early and held” from “growing over time.”

Most useful for performance optimization

Temporary allocations report reveals wasteful churn — allocations that are immediately freed. This is the #1 optimization target for PyO3 boundary code where data copies across the Rust/Python boundary create short-lived allocations.

Instrumented Test Runner

The file test-data/memray-runner.py wraps any Python script with automatic measurement instrumentation.

Capabilities

| Layer | What It Measures |
| --- | --- |
| memray Tracker | Native traces + Python allocator tracing. native_traces=True and trace_python_allocators=True are always enabled when memray is available. |
| tracemalloc | Python-level allocation deltas (current/peak/delta). |
| resource.getrusage | RSS (Resident Set Size) tracking before and after execution. |

Features

  • Per-script requirements: If a requirements.txt file exists alongside the test script, dependencies are installed automatically before execution.
  • Per-request data merging: The target script can write per-request metrics to /tmp/memray-run-requests.json. The runner merges this data into the final summary.json.
  • Leak heuristic: Detects monotonic RSS growth across requests as an indicator of likely memory leaks.
  • Output directory: All artifacts go to /tmp/memray-run/<run_id>/.
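The leak heuristic reduces to a simple predicate over the per-request RSS samples (a minimal sketch of the rule, not the runner's exact code):

```python
def monotonic_growth(rss_samples: list[int]) -> bool:
    """True when RSS grew after every request and never decreased."""
    return len(rss_samples) > 1 and all(
        later > earlier for earlier, later in zip(rss_samples, rss_samples[1:])
    )

print(monotonic_growth([45056, 62000, 70100, 78320]))  # True  — grew every request
print(monotonic_growth([45056, 62000, 61800, 78320]))  # False — dipped once
```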

How It Works

test-cases/my-script.py
        │
        ▼
memray-runner.py
  ├── Install requirements.txt (if present)
  ├── Start tracemalloc
  ├── Record RSS (before)
  ├── Start memray Tracker (native_traces + trace_python_allocators)
  │     ├── exec(script)
  │     └── Script writes /tmp/memray-run-requests.json (optional)
  ├── GC + record RSS (after)
  ├── Generate reports from session.bin:
  │     ├── flamegraph.html        — full allocation flamegraph
  │     ├── leaks.html             — unfreed-only flamegraph
  │     ├── temporal.html          — memory over time (selectable slices)
  │     ├── temp-allocs.html       — wasteful alloc/dealloc churn
  │     ├── table.txt              — sorted allocations (greppable)
  │     └── tree.txt               — hierarchical allocation tree
  ├── Merge per-request data
  ├── Detect leak indicators (monotonic RSS growth)
  └── Write summary.json
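Stripped of the memray-specific steps, the flow above condenses to roughly this (a hypothetical reduction; the real harness is test-data/memray-runner.py and adds the memray Tracker and report generation):

```python
import gc
import resource
import runpy
import tracemalloc

def run_instrumented(script_path: str) -> dict:
    """Execute a script under tracemalloc + RSS instrumentation."""
    tracemalloc.start()
    # ru_maxrss is in KiB on Linux (bytes on macOS)
    rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    runpy.run_path(script_path)  # the exec(script) step
    gc.collect()                 # GC before the final measurements
    rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "memory": {
            "rss_before": rss_before,
            "rss_after": rss_after,
            "rss_delta": rss_after - rss_before,
            "tracemalloc_current": current,
            "tracemalloc_peak": peak,
        }
    }
```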

Stack Signal Collection

Each profiling run collects additional context alongside the memray data:

| Signal | Method | Detail |
| --- | --- | --- |
| WireMock request journal | HTTP API | Reset before run, dump after. Captures all HTTP calls the service made during profiling. |
| Container logs | docker logs --since 5m | Last 5-minute window from all running services. |
| Docker container stats | docker stats --no-stream | Resource usage snapshots taken before and after the profiling run. |

All signals are collected into the run output directory alongside the memray artifacts.

Output Structure

Each profiling run produces a self-contained directory:

runs/<scriptname>-<timestamp>/
├── summary.json              — structured report (see schema below)
├── session.bin               — memray binary capture
├── flamegraph.html           — full allocation flamegraph
├── leaks.html                — unfreed-only flamegraph
├── temporal.html             — temporal flamegraph (memory over time, selectable slices)
├── temp-allocs.html          — temporary allocation flamegraph (wasteful churn)
├── table.txt                 — sorted allocation table (greppable/diffable)
├── tree.txt                  — hierarchical allocation tree
├── wiremock-requests.json    — HTTP calls made during run
├── wiremock-unmatched.json   — requests that hit no WireMock stub
├── logs-local-*.txt          — per-service container logs
└── container-stats-*.json    — resource usage before/after

summary.json Schema

{
  "run_id": "example-sequential-requests-20260315-143022",
  "script": "test-cases/example-sequential-requests.py",
  "timestamp": "2026-03-15T14:30:22Z",
  "exit_code": 0,
  "error": null,
  "duration_seconds": 12.4,
  "memory": {
    "rss_before": 45056,
    "rss_after": 78320,
    "rss_delta": 33264,
    "tracemalloc_current": 12480,
    "tracemalloc_peak": 34200,
    "tracemalloc_delta": 12480
  },
  "top_allocators": [
    { "file": "polars/frame.py:142", "size_kb": 8400, "count": 23 }
  ],
  "requests": [
    {
      "request_num": 1,
      "status": 200,
      "duration_s": 3.2,
      "rss_before": 45056,
      "rss_after": 62000,
      "rss_delta": 16944
    }
  ],
  "leak_indicators": {
    "rss_growth_mb": 32.5,
    "monotonic_growth": true,
    "likely_leak": true
  },
  "profiling": {
    "memray_enabled": true,
    "native_traces": true,
    "trace_python_allocators": true
  },
  "artifacts": {
    "session_bin": "runs/example-20260315-143022/session.bin",
    "flamegraph": "runs/example-20260315-143022/flamegraph.html",
    "leaks_flamegraph": "runs/example-20260315-143022/leaks.html",
    "temporal_flamegraph": "runs/example-20260315-143022/temporal.html",
    "temp_allocs_flamegraph": "runs/example-20260315-143022/temp-allocs.html",
    "table_report": "runs/example-20260315-143022/table.txt",
    "tree_report": "runs/example-20260315-143022/tree.txt"
  }
}
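An agent consuming this schema might gate on the leak indicators like so (a sketch; the 10 MB threshold is an assumption, not part of the schema):

```python
import json

def flag_likely_leak(summary_path: str, growth_threshold_mb: float = 10.0) -> bool:
    """True when the run shows monotonic RSS growth above the threshold."""
    with open(summary_path) as f:
        summary = json.load(f)
    leak = summary.get("leak_indicators", {})
    return bool(leak.get("monotonic_growth")) and \
        leak.get("rss_growth_mb", 0.0) > growth_threshold_mb
```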

For Agents

The requests array is only populated when the target script writes per-request data to /tmp/memray-run-requests.json. The leak_indicators.monotonic_growth boolean is the primary signal — if true across multiple requests, RSS grew on every single request without ever decreasing.

Debug Build (py-mando with Rust Symbols)

The debug build recipe produces a py-mando wheel with full DWARF debug symbols so that memray can resolve Rust function names in flamegraphs.

Build Environment

| Component | Detail |
| --- | --- |
| Base image | rust:1.88.0-bookworm |
| Platform | --platform linux/amd64 |
| Rust debug | CARGO_PROFILE_RELEASE_DEBUG=2 (full DWARF symbols in .so) |
| Python debug | python3-dbg + libpython3-dbg for CPython frame visibility |
| Build deps | cmake, golang, pkg-config, libsmbclient-dev, libduckdb.so |
| Maturin flag | --skip-auditwheel |
| Cache | Built wheel cached at /tmp/pymando-debug/ — reused on subsequent installs |

Why Debug Symbols Matter

CARGO_PROFILE_RELEASE_DEBUG=2 embeds full DWARF debug info in the py-mando .so file. This lets memray resolve Rust function names in flamegraphs instead of showing opaque <native> frames. The python3-dbg package provides Python interpreter debug symbols for the same reason.

--skip-auditwheel

Required because libsmbclient (used by py-mando for EBS/Samba access) has system dependencies that auditwheel cannot bundle into the wheel. The wheel is only used inside the container, so portability is not a concern.

CLI Usage

Non-Interactive (with -e flags)

# Profile a custom script with full allocator tracing
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=script \
  -e trace_allocators=True \
  -e script_path=test-cases/example-sequential-requests.py
 
# Snapshot mode — import profiling
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=snapshot
 
# Endpoint mode — single request profile
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=endpoint
 
# Leak detection
cargo run -- run debug/memray-container \
  -e container_target=local-optimization-1 \
  -e profile_mode=leaks
 
# Stats only
cargo run -- run debug/memray-container \
  -e container_target=local-forecast-1 \
  -e profile_mode=stats

Compare Two Runs

# Compare two profiling runs — flags regressions, exits 1 if >20% growth
python3 test-data/memray-compare.py runs/run-A/ runs/run-B/
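The >20% rule amounts to a relative-growth check on the two runs' RSS deltas (illustrative; memray-compare's exact inputs may differ):

```python
def regressed(rss_delta_a: int, rss_delta_b: int, threshold: float = 0.20) -> bool:
    """Flag run B as a regression when its RSS delta exceeds run A's by >20%."""
    if rss_delta_a <= 0:
        return rss_delta_b > 0  # any growth from a flat baseline counts
    return (rss_delta_b - rss_delta_a) / rss_delta_a > threshold

print(regressed(33264, 45000))  # True  — roughly 35% growth
print(regressed(33264, 35000))  # False — roughly 5% growth
```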

Interactive (via menu)

cargo run -- run menu
# → Memory debugging → choose target → choose mode

Quick Reference

trace_allocators must be True or False (Python capitalization). The recipe normalizes the value automatically, but the input must use this casing.

Profiling Targets

| Target | Container | Port | Notes |
| --- | --- | --- | --- |
| Forecast (bess-forecast-day-ahead) | local-forecast-1 | 5000 | Flask/Waitress |
| Optimization (bess-optimization) | local-optimization-1 | 6000 | Flask/Gunicorn, requires Gurobi license |
| py-mando (PyO3 bindings) | — | — | Tracks Rust-to-Python boundary allocations |
| Custom script | Any | — | Any .py file via test-cases/ |

Typical Workflows

Debug a Memory Leak in Forecast

1. Start full stack:     mando run menu → "Mando + Real algos"
2. Install memray:       mando run debug/memray-install → forecast
3. Run leak detection:   cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=leaks
4. Review leaks.html in runs/<name>-<timestamp>/
5. Check summary.json → leak_indicators.monotonic_growth

Profile a Custom Test Script

1. Write script:         test-cases/my-test.py
2. (Optional) Add:       test-cases/requirements.txt
3. Run profiling:        cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=script \
                           -e script_path=test-cases/my-test.py
4. Artifacts appear in:  runs/my-test-<timestamp>/
5. Open flamegraph.html in browser

Live Monitor During Load Test

1. Start live web mode:  cargo run -- run debug/memray-container \
                           -e container_target=local-forecast-1 \
                           -e profile_mode=live-web
2. In another terminal:  memray live --port 9001
3. Send requests to the service — watch allocations in real time

Detect Memory Leaks Over Multiple Requests

1. Start full stack:     mando run menu → "Mando + Real algos"
2. Install memray:       mando run debug/memray-install → target service
3. Run sequence:         cargo run -- run debug/memray-sequence \
                           -e container_target=local-optimization-1 \
                           -e endpoint_path=/run \
                           -e request_count=20 \
                           -e delay_seconds=2
4. Review temporal.html — scrub timeline to see memory growth per request
5. Check summary.json → leak_indicators.monotonic_growth
6. Compare with baseline: python3 test-data/memray-compare.py runs/baseline/ runs/sequence-<timestamp>/

Profile py-mando PyO3 Boundary

1. Build pymando:        cd ../../mando/py-mando && maturin develop
2. Profile:              mando run debug/memray-local → pymando → flamegraph
3. (Optional) Custom:    Enter your own script path instead of default smoke test

PyO3 Considerations

Memray + PyO3 Boundary

Memray tracks Python-level allocations. For Rust allocations inside py-mando PyO3 extensions, memray will show the Python call site but not the Rust internals unless the debug build is used. With CARGO_PROFILE_RELEASE_DEBUG=2 and native_traces=True, Rust frames become visible in flamegraphs.

Memray is valuable for py-mando because:

  • It tracks allocations at the Python-to-Rust boundary (Arrow/Polars DataFrames)
  • It reveals Python-side memory patterns around PyO3 calls
  • Leak detection shows if Python references to Rust objects are being properly released
  • With the debug build, native Rust allocation frames are fully resolved

Known Limitations

| Issue | Detail | Workaround |
| --- | --- | --- |
| memray attach fails under OrbStack/Rosetta | gdb/ptrace does not work under emulation | Use the Tracker API (all modes except attach use this) |
| Polars AVX warning under Rosetta | Cosmetic warning about missing AVX instructions | Safe to ignore — not a crash, just slower SIMD fallback |
| Missing debug symbols after container restart | python3-dbg not persisted across restarts | Run the Setup/Install recipe again after each container restart |
| trace_allocators capitalization | Must be True or False (Python style) | Recipe normalizes automatically, but input must match |
| track_object_lifetimes overhead | Enabling object lifetime tracking adds memory and CPU overhead | Only use for targeted debugging, not routine profiling |

Architecture

graph TD
    A[mando run debug/menu] --> B{Target?}
    B --> C[Container Profiling]
    B --> D[Local Profiling]
    B --> E[Install Memray + Debug Build]
    B --> F[View Profiles]

    C --> C1[Select service: forecast / optimization]
    C1 --> C2{Profile Mode?}
    C2 --> M1[snapshot]
    C2 --> M2[endpoint]
    C2 --> M3[script]
    C2 --> M4[live-web]
    C2 --> M5[live-remote]
    C2 --> M6[stats]
    C2 --> M7[leaks]
    C2 --> M8[objects]
    C2 --> M9[sequence]

    M3 --> R[memray-runner.py harness]
    R --> R1[memray Tracker]
    R --> R2[tracemalloc]
    R --> R3[resource.getrusage]

    M1 --> G[runs/name-timestamp/]
    M2 --> G
    R --> G
    M4 --> H[Port 9001 SocketDestination]
    M5 --> H
    M6 --> G
    M7 --> G

    G --> G1[summary.json]
    G --> G2[session.bin]
    G --> G3[flamegraph.html]
    G --> G4[leaks.html]
    G --> G7[temporal.html]
    G --> G8[temp-allocs.html]
    G --> G9[table.txt + tree.txt]
    G --> G5[wiremock-requests.json]
    G --> G6[container logs + stats]