Cross-exchange arbitrage

Per-minute cross-exchange spread matrix for one market across many venues — the canonical "find the price gap" demo.

Query

WITH
    '2024-12-31' AS day,
    p AS (
        SELECT start, exchange, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1)
        WHERE market = 'BTC-USDT'
          AND exchange IN ('binance', 'okx', 'kucoin', 'gateio')
          AND start >= toDateTime(day)
          AND start <  toDateTime(day) + INTERVAL 1 DAY
    )
SELECT a.start,
       a.exchange AS buy_ex,
       b.exchange AS sell_ex,
       a.close AS buy_price,
       b.close AS sell_price,
       (b.close - a.close) / a.close * 100 AS spread_pct
FROM p a
JOIN p b ON a.start = b.start
WHERE a.exchange < b.exchange
ORDER BY a.start, buy_ex, sell_ex

Two SQL details worth flagging:

  • a.exchange < b.exchange is the standard ordered-pair trick to avoid double-counting (binance, okx) and (okx, binance). With 4 exchanges you get C(4, 2) = 6 ordered pairs, not 16 of the unordered-and-self-joined Cartesian product.

  • No threaded fetcher. api.ohlcv(...) returns simultaneous cross-exchange snapshots by virtue of the WHERE clause — every row for start = T is for the same T. The repo's threading exists only because each REST call is independent and synchronous.

Output (2024-12-31)

8,640 rows = 1,440 minutes × 6 ordered pairs. Top 8 absolute spreads of the day:

start
buy_ex
sell_ex
buy_price
sell_price
spread_pct

2024-12-31 16:54:00

binance

kucoin

94903.65

95000.00

+0.102

2024-12-31 16:54:00

gateio

kucoin

94904.20

95000.00

+0.101

2024-12-31 13:40:00

kucoin

okx

95561.60

95656.00

+0.099

2024-12-31 16:54:00

kucoin

okx

95000.00

94906.60

−0.098

2024-12-31 13:36:00

kucoin

okx

95600.00

95693.60

+0.098

2024-12-31 13:35:00

kucoin

okx

95600.00

95691.80

+0.096

2024-12-31 13:36:00

binance

kucoin

95688.98

95600.00

−0.093

2024-12-31 13:50:00

binance

kucoin

95891.73

95804.20

−0.091

A few clusters of ~0.1 % gaps — well above retail trading-fee breakeven (~0.04 % each side), but small enough that the venues re-arbed within a minute or two. The mean / std distribution per pair tells the same story:

pair
mean
std
min
max

binance ↔ gateio

−0.010

0.010

−0.043

+0.047

binance ↔ kucoin

−0.003

0.012

−0.093

+0.102

binance ↔ okx

0.000

0.008

−0.025

+0.033

gateio ↔ kucoin

+0.007

0.013

−0.085

+0.101

gateio ↔ okx

+0.010

0.011

−0.036

+0.057

kucoin ↔ okx

+0.003

0.012

−0.098

+0.099

kucoin is the noisiest leg — its standard deviation against every other venue is the largest, and it's the buy/sell side of every top-8 spread row.

Extending it

  • More exchanges — append to the IN (...) list. Each addition gives you (N − 1) new pair rows per minute, no extra wrappers.

  • Filter to actionable spreads — append HAVING spread_pct > 0.04 (or whatever your round-trip fee is). The alert-dispatcher in the article's repo collapses to a single WHERE.

  • Different market — the WHERE market = 'BTC-USDT' is the only market constraint. Add OR market IN (...) and a GROUP BY market to scan a basket simultaneously.

  • Higher-resolution timing — 1-minute candles are the smallest exposed via api.ohlcv(...). For sub-minute work, JOIN against api.trade directly with a tolerance window (WHERE abs(a.timestamp - b.timestamp) < INTERVAL 100 MILLISECOND).

Last updated