copy-trade-algorithm
Ranking traders by actual edge (not just lifetime PnL) and (eventually) turning live trades from the watchlist into copy signals. Replaces the “top-50 intersected three ways” wallet picker — that one over-indexes on whales and survivorship.
Layers 1-4 (per-trade edge → per-trader aggregates → recency weighting → composite copy_score) are shipped in the Rust binary. Layer 5 (live trade-signal scoring) is next. Schema and CLI live in polymarket-fetch.
verdict — backtested, no demonstrated edge
Read this before treating the design below as a money-maker.
Backtested the whole thing end to end. As a copy-trade strategy it doesn’t beat picking wallets by raw lifetime PnL. It’s a solid observation tool — not a trading edge. The design below is still valid as a tracker; just don’t copy it expecting profit.
The first backtest looked catastrophic — −78% everywhere. That turned out to be a data bug, not the algorithm: market outcomes were inferred from position snapshots, but winning positions get redeemed and disappear, so the snapshots over-counted losers. Fixed by pulling authoritative resolutions from Polymarket’s Gamma API (61K markets backfilled).
With clean resolution data:
- The gate — the binary filter (not a market-maker, ≥30 resolved trades, positive recent edge) — does beat a random baseline. That part holds up.
- The copy_score magnitude does not. Ranking wallets 1st/2nd/3rd by a precise number doesn’t predict returns.
- A persistence diagnostic explains why: a wallet’s edge in one period barely correlates with the next (Spearman ~0.21 vs a luck baseline of ~0.23). Past performance level just doesn’t carry forward. Consistent with the finance literature (Carhart on fund persistence; Akey et al. on Polymarket specifically, finding profit is luck-dominated and only weakly persistent).
- Multi-window backtest (9 cutoffs + walk-forward): the gate’s ROI swings wildly and doesn’t reliably beat the naive “copy the 20 highest all-time-PnL wallets” heuristic, which is far more stable. Most recent forward window was −9%.
| strategy | ROI range across windows |
|---|---|
| gated copy-trade | −22.8% to +61.9% |
| naive top-20 by all-time PnL | +7% to +9% |
The high-return windows are longshot-luck spikes, not skill. So: no edge as a copy strategy. Kept as a tracking tool — it shows what reputable wallets are doing, tracks your own positions, alerts on consensus and changes. The “safe bets” are information, not instructions.
In response, copy_score magnitude was removed from all ranking/ordering. The system now uses a binary recommended/not flag plus a consensus count (how many wallets agree). Every number it surfaces is now backed by evidence or gone.
Worth noting the process more than the result: build → backtest → catch the data bug → fix it → research → adversarially validate the proposed fix (it got rejected) → run a cheap diagnostic before sinking 2 days into a full re-backtest. Every step could have rubber-stamped; none did. The honest negative result is the deliverable.
what it does
Two things, layered:
- Score every watched trader on calibrated, time-decayed performance → copy_score.
- When one of them trades, score the trade itself → emit a signal if it clears a threshold.
how it works
Four layers. Each feeds the next.
Layer 1 — per-trade edge
For each row in pm_trades, once the market resolves:
edge = (exit_value - entry_value) / entry_value
exit_value = the sell price they actually got, or the resolution payout (1.0 for winner, 0 for loser if they held). The point: did they beat the market’s implied probability at the time they entered? Picking a 90c YES that resolves YES isn’t impressive. Picking a 30c YES that resolves YES is.
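A minimal Rust sketch of the Layer 1 formula, for illustration only. The `Exit` enum and `trade_edge` function are hypothetical names, not taken from the shipped binary, and the sketch assumes prices are per-share values between 0 and 1.

```rust
/// How a position was closed. Illustrative type, not from the codebase.
#[derive(Debug, Clone, Copy)]
pub enum Exit {
    /// Sold before resolution at this price per share.
    Sold(f64),
    /// Held to resolution and the outcome won: pays 1.0 per share.
    ResolvedWin,
    /// Held to resolution and the outcome lost: pays 0.0 per share.
    ResolvedLoss,
}

/// Per-trade edge: (exit_value - entry_value) / entry_value.
/// Returns None when the entry price is not positive (or is NaN),
/// because the ratio is undefined there.
pub fn trade_edge(entry_price: f64, exit: Exit) -> Option<f64> {
    if !(entry_price > 0.0) {
        return None;
    }
    let exit_value = match exit {
        Exit::Sold(price) => price,
        Exit::ResolvedWin => 1.0,
        Exit::ResolvedLoss => 0.0,
    };
    Some((exit_value - entry_price) / entry_price)
}
```

Under this definition a 30c YES that resolves YES scores about 2.33, while a 90c YES that resolves YES scores about 0.11, which matches the intuition above that the cheap pick is the impressive one.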
Layer 2 — per-trader aggregates
Require >=30 resolved trades before a wallet is rankable. Below that it’s noise.
| metric | what |
|---|---|
| hit_rate | % of resolved trades that paid more than entry |
| mean_edge | avg per-trade outperformance |
| edge_sharpe | mean_edge / stddev_edge: main signal, rewards consistency over jackpot hunters |
| brier_score | calibration, lower is better, penalizes overconfidence |
| median_holding_days | how late we can copy and still catch them |
| trades_per_week | sample-size health |
edge_sharpe is the load-bearing one. Brier is the tiebreaker — kills wallets that win loud and lose louder.
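A rough sketch of three of these aggregates, given a wallet's per-trade edges. The names `EdgeStats` and `edge_stats` are hypothetical. The sample (n-1) standard deviation and the choice to report hit_rate as a fraction rather than a percentage are assumptions. Brier is left out because the forecast probability it would be computed against is not specified here.

```rust
/// Minimum resolved trades before a wallet is rankable (from Layer 2).
pub const MIN_RESOLVED_TRADES: usize = 30;

/// Illustrative container for a subset of the Layer 2 metrics.
#[derive(Debug, Clone, PartialEq)]
pub struct EdgeStats {
    /// Fraction of trades with positive edge, in 0.0..=1.0.
    pub hit_rate: f64,
    /// Arithmetic mean of per-trade edge.
    pub mean_edge: f64,
    /// mean_edge / stddev_edge, or None when the stddev is zero.
    pub edge_sharpe: Option<f64>,
}

/// Aggregates per-trade edges for one wallet.
/// Returns None below the sample-size threshold.
pub fn edge_stats(edges: &[f64]) -> Option<EdgeStats> {
    if edges.len() < MIN_RESOLVED_TRADES {
        return None;
    }
    let n = edges.len() as f64;
    let hit_rate = edges.iter().filter(|&&e| e > 0.0).count() as f64 / n;
    let mean_edge = edges.iter().sum::<f64>() / n;

    // Sample variance with an n-1 denominator (an assumption of this sketch).
    let variance = edges
        .iter()
        .map(|e| (e - mean_edge).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    let stddev = variance.sqrt();

    // A wallet with zero spread has no meaningful Sharpe-style ratio.
    let edge_sharpe = if stddev > 0.0 {
        Some(mean_edge / stddev)
    } else {
        None
    };

    Some(EdgeStats { hit_rate, mean_edge, edge_sharpe })
}
```

Returning `Option` for both the sample-size check and the zero-variance case keeps "not enough data" distinct from "measured and bad", which matters once a gate is applied downstream.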
Layer 3 — weighting
- Time decay: exponential, half-life 45 days. Sharp 6 months ago is not the same as sharp now.
- Per-category: politics specialist != sports specialist. Compute metrics per category, not just overall.
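The time decay can be sketched as below, assuming the standard exponential half-life form weight = 0.5^(age_days / 45) and a weighted mean over per-trade edges. The function names and the `(age_days, edge)` tuple layout are hypothetical; how the shipped code normalizes the weights is not documented here.

```rust
/// Half-life of the exponential decay, in days (from Layer 3).
pub const HALF_LIFE_DAYS: f64 = 45.0;

/// Weight of a trade that resolved `age_days` ago.
/// Negative ages are clamped to zero so future-dated rows never exceed weight 1.
pub fn decay_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / HALF_LIFE_DAYS)
}

/// Decay-weighted mean edge over `(age_days, edge)` pairs.
/// Returns None for an empty slice.
pub fn recency_weighted_edge(trades: &[(f64, f64)]) -> Option<f64> {
    let (weighted_sum, weight_total) = trades.iter().fold(
        (0.0_f64, 0.0_f64),
        |(sum, total), &(age_days, edge)| {
            let w = decay_weight(age_days);
            (sum + w * edge, total + w)
        },
    );
    if weight_total > 0.0 {
        Some(weighted_sum / weight_total)
    } else {
        None
    }
}
```

With a 45-day half-life, a trade from 90 days ago counts a quarter as much as one from today, and one from 6 months ago counts for roughly 6%. Per-category metrics would apply the same function to each category's trades separately.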
Layer 4 — composite
copy_score = sample_size_confidence
* edge_sharpe
* calibration_factor (1 / brier)
* recency_weighted_edge
* feasibility_factor (latency + liquidity)
feasibility_factor punishes traders we can’t actually copy — sub-hour holds or markets too thin for our size.
Gate fix shipped: the composite is multiplicative, so negative × negative = positive, and a consistently bad trader was ranking high. Now if edge_sharpe <= 0 OR recency_weighted_edge <= 0 the row gets copy_score = NULL, not_recommended = true, and a reason string (negative_sharpe / negative_recency_edge / negative_both). By default ranked-traders filters these rows out; pass --include-bad (off by default) to show them. See polymarket-fetch for the migration + column shapes.
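The gate and composite can be sketched as follows. The struct and function names are hypothetical; only the factor names, the gate conditions, and the three reason strings come from the description above. The sketch assumes brier is strictly positive and that the inputs were computed upstream.

```rust
/// Inputs to the Layer 4 composite. Illustrative layout, not the real schema.
#[derive(Debug, Clone, Copy)]
pub struct CompositeInputs {
    pub sample_size_confidence: f64,
    pub edge_sharpe: f64,
    /// Brier score; assumed > 0 so that 1 / brier is finite.
    pub brier: f64,
    pub recency_weighted_edge: f64,
    pub feasibility_factor: f64,
}

/// Result row: copy_score is None (NULL) when the gate rejects the wallet.
#[derive(Debug, Clone, PartialEq)]
pub struct GatedScore {
    pub copy_score: Option<f64>,
    pub not_recommended: bool,
    pub reason: Option<&'static str>,
}

/// Applies the gate first, then the multiplicative composite.
pub fn gated_copy_score(m: &CompositeInputs) -> GatedScore {
    let bad_sharpe = m.edge_sharpe <= 0.0;
    let bad_recency = m.recency_weighted_edge <= 0.0;

    let reason = match (bad_sharpe, bad_recency) {
        (true, true) => Some("negative_both"),
        (true, false) => Some("negative_sharpe"),
        (false, true) => Some("negative_recency_edge"),
        (false, false) => None,
    };

    // Gate before multiplying: two negative factors would otherwise
    // produce a positive product and rank a bad trader highly.
    if reason.is_some() {
        return GatedScore { copy_score: None, not_recommended: true, reason };
    }

    let calibration_factor = 1.0 / m.brier;
    let score = m.sample_size_confidence
        * m.edge_sharpe
        * calibration_factor
        * m.recency_weighted_edge
        * m.feasibility_factor;

    GatedScore { copy_score: Some(score), not_recommended: false, reason: None }
}
```

Checking the gate before the product is the point of the design: with edge_sharpe = -0.5 and recency_weighted_edge = -0.1 the ungated product would be positive, whereas here the row comes back with no score and reason negative_both.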
outputs
- Watchlist: top-N by copy_score per category. Replaces the current top-50 intersection logic in the default selector.
- Live signal: when a watchlist trader places a trade, score it:
signal_score = trader.copy_score
* conviction (size relative to their typical)
* market_liquidity
* freshness (seconds since their fill)
* in_specialty_category
Above threshold → Telegram alert, eventually auto-execute.
pitfalls
Market makers will dominate the ranking if not filtered out.
Symptoms: short hold times, balanced buy/sell flow on the same asset, very high trade frequency. They look “sharp” by Brier and hit rate but you can’t copy them — they’re earning spread, not edge. Filter at Layer 2 ingestion.
- Survivorship bias — blown-up wallets stop showing up in leaderboards, so historical aggregates skew rosy. Mitigation: pull from historical leaderboard snapshots, not just current.
- Reflexivity: if we follow them and act, we move the market. The freshness factor partly handles it; sizing discipline does the rest.
build phases
| phase | what | status |
|---|---|---|
| 1 | pm_trades fetcher | done |
| 2 | Backfill 6mo trade history for watched wallets | done (capped by Polymarket’s 3000-offset hard limit — see polymarket-fetch) |
| 3 | pm_trader_metrics table + Layer 1+2 metrics | done |
| 4 | ranked-traders CLI using copy_score | done |
| 5 | Real-time trade-signal scoring + Telegram alerts | next |
| 6 | Threshold tuning via historical backtest | ongoing |
what’s already in
Rust modules in crates/polymarket-fetch/src/:
- main.rs: compute_metrics() (FIFO trade pairing, Brier, edge_sharpe, recency weighting, MM filter, composite + gate)
- Telegram delivery via telegram.rs / notifier.rs (fire-and-forget): used for milestone notifications, lays the pipe for phase 5
CLI surface:
| cmd | what |
|---|---|
| compute-metrics --wallet 0xabc | snapshot one wallet, all categories |
| compute-metrics --all | snapshot every wallet × category |
| ranked-traders --category politics --limit 20 | read back, default filters MM / bad / insufficient sample |
| ranked-traders --include-mm --include-bad --include-insufficient | unfiltered |
DB: pm_trades (FIFO source) + pm_trader_metrics (snapshots), both defined in the tables.
related
- polymarket-fetch — schema, CLI, current pipeline this layers onto