copy-trade-algorithm

Ranking traders by actual edge (not just lifetime PnL) and (eventually) turning live trades from the watchlist into copy signals. Replaces the “top-50 intersected three ways” wallet picker — that one over-indexes on whales and survivorship.

Layers 1-4 (per-trade edge → per-trader aggregates → recency weighting → composite copy_score) are shipped in the Rust binary. Layer 5 (live trade-signal scoring) is next. Schema and CLI live in polymarket-fetch.

verdict — backtested, no demonstrated edge

Read this before treating the design below as a money-maker.

Backtested the whole thing end to end. As a copy-trade strategy it doesn’t beat picking wallets by raw lifetime PnL. It’s a solid observation tool — not a trading edge. The design below is still valid as a tracker; just don’t copy it expecting profit.

The first backtest looked catastrophic — −78% everywhere. That turned out to be a data bug, not the algorithm: market outcomes were inferred from position snapshots, but winning positions get redeemed and disappear, so the snapshots over-counted losers. Fixed by pulling authoritative resolutions from Polymarket’s Gamma API (61K markets backfilled).

With clean resolution data:

  • The gate — the binary filter (not a market-maker, ≥30 resolved trades, positive recent edge) — does beat a random baseline. That part holds up.
  • The copy_score magnitude does not. Ranking wallets 1st/2nd/3rd by a precise number doesn’t predict returns.
  • A persistence diagnostic explains why: a wallet’s edge in one period barely correlates with the next — Spearman ~0.21 vs a luck baseline of ~0.23. Past performance level just doesn’t carry forward. Consistent with the finance literature (Carhart on fund persistence; Akey et al. on Polymarket specifically, finding profit is luck-dominated and only weakly persistent).
  • Multi-window backtest (9 cutoffs + walk-forward): the gate’s ROI swings wildly and doesn’t reliably beat the naive “copy the 20 highest all-time-PnL wallets” heuristic, which is far more stable. Most recent forward window was −9%.

strategy                       ROI range across windows
gated copy-trade               −22.8% to +61.9%
naive top-20 by all-time PnL   +7% to +9%

The high-return windows are longshot-luck spikes, not skill. So: no edge as a copy strategy. Kept as a tracking tool — it shows what reputable wallets are doing, tracks your own positions, alerts on consensus and changes. The “safe bets” are information, not instructions.
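For reference, the persistence diagnostic in the list above boils down to a Spearman rank correlation between each wallet's edge in one period and its edge in the next. A minimal sketch, assuming the two slices are aligned per wallet; the function names are illustrative and not taken from the binary, and how the luck baseline was built is not shown here.

```rust
/// Average ranks (1-based). Tied values share the mean of their positions.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| xs[a].total_cmp(&xs[b]));
    let mut out = vec![0.0; xs.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j to the end of the run of values equal to xs[idx[i]].
        let mut j = i;
        while j + 1 < idx.len() && xs[idx[j + 1]] == xs[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            out[idx[k]] = avg;
        }
        i = j + 1;
    }
    out
}

/// Spearman correlation: Pearson correlation computed on the ranks.
/// Returns None for mismatched lengths, fewer than 2 points, or a constant input.
pub fn spearman(period_a: &[f64], period_b: &[f64]) -> Option<f64> {
    if period_a.len() != period_b.len() || period_a.len() < 2 {
        return None;
    }
    let (ra, rb) = (ranks(period_a), ranks(period_b));
    let n = ra.len() as f64;
    let mean_a = ra.iter().sum::<f64>() / n;
    let mean_b = rb.iter().sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (a, b) in ra.iter().zip(rb.iter()) {
        cov += (a - mean_a) * (b - mean_b);
        var_a += (a - mean_a).powi(2);
        var_b += (b - mean_b).powi(2);
    }
    if var_a == 0.0 || var_b == 0.0 {
        return None;
    }
    Some(cov / (var_a * var_b).sqrt())
}
```

A value near 1 would mean last period's ranking carries forward; a value no better than the luck baseline means it doesn't.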

In response, copy_score magnitude was removed from all ranking/ordering. The system now uses a binary recommended/not flag plus a consensus count (how many wallets agree). Every number it surfaces is now backed by evidence or gone.
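A rough sketch of the consensus count, assuming it means "number of distinct recommended wallets holding the same outcome in the same market". The Position struct and its field names are hypothetical, not the actual schema.

```rust
use std::collections::{HashMap, HashSet};

/// A position held by a wallet that passed the gate. Field names are illustrative.
pub struct Position<'a> {
    pub wallet: &'a str,
    pub market: &'a str,
    pub outcome: &'a str,
}

/// Count distinct wallets per (market, outcome).
/// A wallet with several fills on the same side is counted once.
pub fn consensus_counts<'a>(
    positions: &[Position<'a>],
) -> HashMap<(&'a str, &'a str), usize> {
    let mut wallets: HashMap<(&'a str, &'a str), HashSet<&'a str>> = HashMap::new();
    for p in positions {
        wallets
            .entry((p.market, p.outcome))
            .or_default()
            .insert(p.wallet);
    }
    wallets.into_iter().map(|(key, set)| (key, set.len())).collect()
}
```

Counting wallets rather than fills keeps one hyperactive wallet from looking like agreement.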

Worth noting the process more than the result: build → backtest → catch the data bug → fix it → research → adversarially validate the proposed fix (it got rejected) → run a cheap diagnostic before sinking 2 days into a full re-backtest. Every step could have rubber-stamped; none did. The honest negative result is the deliverable.

what it does

Two things, layered:

  1. Score every watched trader on calibrated, time-decayed performance → copy_score.
  2. When one of them trades, score the trade itself → emit a signal if it clears a threshold.

how it works

Four layers. Each feeds the next.

Layer 1 — per-trade edge

For each row in pm_trades, once the market resolves:

edge = (exit_value - entry_value) / entry_value

exit_value = the sell price they actually got, or the resolution payout (1.0 for winner, 0 for loser if they held). The point: did they beat the market’s implied probability at the time they entered? Picking a 90c YES that resolves YES isn’t impressive. Picking a 30c YES that resolves YES is.
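The formula above, sketched per share so that entry_value is just the entry price. The Exit enum and function name are illustrative; the real compute_metrics() works from FIFO-paired rows and may differ.

```rust
/// How a position was closed. Names are illustrative, not the pm_trades schema.
#[derive(Debug, Clone, Copy)]
pub enum Exit {
    /// Sold before resolution at this price per share (0.0 to 1.0).
    Sold(f64),
    /// Held to resolution; true if the held outcome won.
    Resolved(bool),
}

/// Per-trade edge: (exit_value - entry_value) / entry_value, per share.
/// Returns None when the entry price is not positive (ratio undefined).
pub fn trade_edge(entry_price: f64, exit: Exit) -> Option<f64> {
    if !(entry_price > 0.0) {
        return None;
    }
    let exit_value = match exit {
        Exit::Sold(price) => price,
        Exit::Resolved(true) => 1.0,
        Exit::Resolved(false) => 0.0,
    };
    Some((exit_value - entry_price) / entry_price)
}
```

With these definitions a 90c YES that resolves YES has an edge of about 0.11, while a 30c YES that resolves YES has an edge of about 2.33. A held loser is always −1.0.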

Layer 2 — per-trader aggregates

Require >=30 resolved trades before a wallet is rankable. Below that it’s noise.

metric               what
hit_rate             % of resolved trades that paid more than entry
mean_edge            avg per-trade outperformance
edge_sharpe          mean_edge / stddev_edge; main signal, rewards consistency over jackpot hunters
brier_score          calibration, lower is better, penalizes overconfidence
median_holding_days  how late we can copy and still catch them
trades_per_week      sample-size health

edge_sharpe is the load-bearing one. Brier is the tiebreaker — kills wallets that win loud and lose louder.
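A sketch of the core aggregates with the 30-trade floor applied. Two assumptions are mine and not confirmed by the source: stddev_edge is the sample standard deviation, and Brier treats the entry price as the wallet's forecast probability for the outcome it bought. Struct and field names are illustrative.

```rust
/// One resolved trade, reduced to what the aggregates need.
#[derive(Debug, Clone, Copy)]
pub struct ResolvedTrade {
    /// Per-trade edge from Layer 1.
    pub edge: f64,
    /// Entry price, read here as the implied probability of the outcome bought.
    pub entry_price: f64,
    /// Whether the outcome bought won at resolution.
    pub won: bool,
}

#[derive(Debug, Clone, Copy)]
pub struct Aggregates {
    pub hit_rate: f64,
    pub mean_edge: f64,
    pub edge_sharpe: f64,
    pub brier_score: f64,
}

/// Minimum resolved trades before a wallet is rankable.
pub const MIN_RESOLVED_TRADES: usize = 30;

/// Returns None below the sample floor.
pub fn aggregate(trades: &[ResolvedTrade]) -> Option<Aggregates> {
    if trades.len() < MIN_RESOLVED_TRADES {
        return None;
    }
    let n = trades.len() as f64;
    // "Paid more than entry" is the same as a positive edge.
    let hit_rate = trades.iter().filter(|t| t.edge > 0.0).count() as f64 / n;
    let mean_edge = trades.iter().map(|t| t.edge).sum::<f64>() / n;
    // Sample variance (n - 1 denominator); an assumption, see lead-in.
    let variance = trades
        .iter()
        .map(|t| (t.edge - mean_edge).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    let stddev_edge = variance.sqrt();
    // With zero spread the ratio is undefined; report 0.0 rather than infinity.
    let edge_sharpe = if stddev_edge > 0.0 { mean_edge / stddev_edge } else { 0.0 };
    // Brier: mean squared gap between forecast probability and the 0/1 outcome.
    let brier_score = trades
        .iter()
        .map(|t| {
            let outcome = if t.won { 1.0 } else { 0.0 };
            (t.entry_price - outcome).powi(2)
        })
        .sum::<f64>()
        / n;
    Some(Aggregates { hit_rate, mean_edge, edge_sharpe, brier_score })
}
```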

Layer 3 — weighting

  • Time decay: exponential, half-life 45 days. Sharp 6 months ago is not the same as sharp now.
  • Per-category: politics specialist != sports specialist. Compute metrics per category, not just overall.
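The time decay can be sketched as a weight of 0.5^(age / half_life) per trade, then a weighted mean. This assumes recency_weighted_edge is a weighted average of per-trade edge, which the source implies but doesn't spell out. Per-category metrics would run the same function over each category's trades.

```rust
/// Half-life of the exponential decay, in days.
pub const HALF_LIFE_DAYS: f64 = 45.0;

/// Weight of an observation that is age_days old: 0.5^(age / half_life).
/// Negative ages are clamped to 0 so future-dated rows never get weight > 1.
pub fn decay_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days.max(0.0) / HALF_LIFE_DAYS)
}

/// Recency-weighted mean edge over (edge, age_days) pairs.
/// Returns None for an empty slice.
pub fn recency_weighted_edge(samples: &[(f64, f64)]) -> Option<f64> {
    let (weighted_sum, weight_total) = samples.iter().fold(
        (0.0_f64, 0.0_f64),
        |(sum, total), &(edge, age_days)| {
            let w = decay_weight(age_days);
            (sum + w * edge, total + w)
        },
    );
    if weight_total > 0.0 {
        Some(weighted_sum / weight_total)
    } else {
        None
    }
}
```

A trade from 45 days ago counts half as much as one from today, and one from 90 days ago counts a quarter.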

Layer 4 — composite

copy_score = sample_size_confidence
           * edge_sharpe
           * calibration_factor (1 / brier)
           * recency_weighted_edge
           * feasibility_factor (latency + liquidity)

feasibility_factor punishes traders we can’t actually copy — sub-hour holds or markets too thin for our size.

Gate fix shipped: the composite is multiplicative, so negative × negative = positive, and a consistently bad trader was ranking high. Now if edge_sharpe <= 0 OR recency_weighted_edge <= 0 the row gets copy_score = NULL, not_recommended = true, and a reason string (negative_sharpe / negative_recency_edge / negative_both). ranked-traders filters these rows out by default; pass --include-bad to show them. See polymarket-fetch for the migration + column shapes.
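A sketch of the gate plus composite, with the gate checked before any multiplication so two negatives can never produce a positive score. The struct names and the small floor on brier_score (to avoid dividing by zero) are assumptions; the reason strings are the ones listed above.

```rust
/// Inputs to the composite. Field names follow the formula above.
#[derive(Debug, Clone, Copy)]
pub struct Factors {
    pub sample_size_confidence: f64,
    pub edge_sharpe: f64,
    pub brier_score: f64,
    pub recency_weighted_edge: f64,
    pub feasibility_factor: f64,
}

/// Result row: copy_score is None (NULL) when the gate rejects the wallet.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub copy_score: Option<f64>,
    pub not_recommended: bool,
    pub reason: Option<&'static str>,
}

pub fn gate_and_score(f: &Factors) -> Scored {
    // Gate first: any non-positive signal disqualifies the row outright.
    let reason = match (f.edge_sharpe <= 0.0, f.recency_weighted_edge <= 0.0) {
        (true, true) => Some("negative_both"),
        (true, false) => Some("negative_sharpe"),
        (false, true) => Some("negative_recency_edge"),
        (false, false) => None,
    };
    if reason.is_some() {
        return Scored { copy_score: None, not_recommended: true, reason };
    }
    // calibration_factor = 1 / brier; the floor is an assumption to keep it finite.
    let calibration_factor = 1.0 / f.brier_score.max(1e-6);
    let score = f.sample_size_confidence
        * f.edge_sharpe
        * calibration_factor
        * f.recency_weighted_edge
        * f.feasibility_factor;
    Scored { copy_score: Some(score), not_recommended: false, reason: None }
}
```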

outputs

  • Watchlist — top-N by copy_score per category. Replaces the current top-50 intersection logic in the default selector.
  • Live signal — when a watchlist trader places a trade, score it:
signal_score = trader.copy_score
             * conviction (size relative to their typical)
             * market_liquidity
             * freshness (seconds since their fill)
             * in_specialty_category

Above threshold → Telegram alert, eventually auto-execute.

pitfalls

Market makers will dominate the ranking if not filtered out.

Symptoms: short hold times, balanced buy/sell flow on the same asset, very high trade frequency. They look “sharp” by Brier and hit rate but you can’t copy them — they’re earning spread, not edge. Filter at Layer 2 ingestion.
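One way to turn those symptoms into a filter, as a rough sketch. Everything here is hypothetical: the struct, the thresholds, the choice to require all three symptoms at once, and the simplification of measuring buy/sell balance per wallet rather than per asset. The shipped MM filter may look nothing like this.

```rust
/// Summary of a wallet's trading activity. Field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct WalletActivity {
    pub median_holding_hours: f64,
    pub trades_per_week: f64,
    pub buy_volume: f64,
    pub sell_volume: f64,
}

/// Caller-supplied cutoffs; no values here come from the source.
#[derive(Debug, Clone, Copy)]
pub struct MmThresholds {
    pub max_holding_hours: f64,
    pub min_trades_per_week: f64,
    pub max_flow_imbalance: f64,
}

/// |buy - sell| / (buy + sell): 0.0 is perfectly balanced, 1.0 is one-sided.
/// A wallet with no volume is treated as one-sided so it never looks balanced.
pub fn flow_imbalance(a: &WalletActivity) -> f64 {
    let total = a.buy_volume + a.sell_volume;
    if total > 0.0 {
        (a.buy_volume - a.sell_volume).abs() / total
    } else {
        1.0
    }
}

/// Flags a wallet only when all three symptoms are present together.
pub fn looks_like_market_maker(a: &WalletActivity, t: &MmThresholds) -> bool {
    a.median_holding_hours <= t.max_holding_hours
        && a.trades_per_week >= t.min_trades_per_week
        && flow_imbalance(a) <= t.max_flow_imbalance
}
```

Requiring all three keeps a fast directional trader (short holds, one-sided flow) from being flagged by accident.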

  • Survivorship bias — blown-up wallets stop showing up in leaderboards, so historical aggregates skew rosy. Mitigation: pull from historical leaderboard snapshots, not just current.
  • Reflexivity — if we follow them and act, we move the market. freshness factor partly handles it; sizing discipline does the rest.

build phases

phase  what                                              status
1      pm_trades fetcher                                 done
2      Backfill 6mo trade history for watched wallets    done (capped by Polymarket’s 3000-offset hard limit; see polymarket-fetch)
3      pm_trader_metrics table + Layer 1+2 metrics       ✓ done
4      ranked-traders CLI using copy_score               ✓ done
5      Real-time trade-signal scoring + Telegram alerts  next
6      Threshold tuning via historical backtest          ongoing

what’s already in

Rust modules in crates/polymarket-fetch/src/:

  • main.rs: compute_metrics() (FIFO trade pairing, Brier, edge_sharpe, recency weighting, MM filter, composite + gate)
  • Telegram delivery via telegram.rs / notifier.rs (fire-and-forget) — used for milestone notifications, lays the pipe for phase 5

CLI surface:

cmd                                                               what
compute-metrics --wallet 0xabc                                    snapshot one wallet, all categories
compute-metrics --all                                             snapshot every wallet × category
ranked-traders --category politics --limit 20                     read back, default filters MM / bad / insufficient sample
ranked-traders --include-mm --include-bad --include-insufficient  unfiltered

DB: pm_trades (FIFO source) + pm_trader_metrics (snapshots), both defined in the tables.