> For the complete documentation index, see [llms.txt](https://docs.koinju.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.koinju.io/sql-cookbook/cross-exchange-arbitrage.md).

# Cross-exchange arbitrage

Per-minute **cross-exchange spread matrix** for one market across many venues — the canonical "find the price gap" demo.

## Query

```sql
WITH
    '2024-12-31' AS day,
    p AS (
        SELECT start, exchange, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1)
        WHERE market = 'BTC-USDT'
          AND exchange IN ('binance', 'okx', 'kucoin', 'gateio')
          AND start >= toDateTime(day)
          AND start <  toDateTime(day) + INTERVAL 1 DAY
    )
SELECT a.start,
       a.exchange AS buy_ex,
       b.exchange AS sell_ex,
       a.close AS buy_price,
       b.close AS sell_price,
       (b.close - a.close) / a.close * 100 AS spread_pct
FROM p a
JOIN p b ON a.start = b.start
WHERE a.exchange < b.exchange
ORDER BY a.start, buy_ex, sell_ex
```

Functions used: [`toFloat64`](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#tofloat64), [`toDateTime`](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#todatetime)

Two SQL details worth flagging:

* **`a.exchange < b.exchange`** is the standard ordered-pair trick to avoid double-counting `(binance, okx)` and `(okx, binance)`. With 4 exchanges you get C(4, 2) = 6 ordered pairs, not 16 of the unordered-and-self-joined Cartesian product.
* **No threaded fetcher.** `api.ohlcv(...)` returns simultaneous cross-exchange snapshots by virtue of the WHERE clause — every row for `start = T` is for the same `T`. The repo's threading exists only because each REST call is independent and synchronous.

## Output (2024-12-31)

8,640 rows = 1,440 minutes × 6 ordered pairs. Top 8 absolute spreads of the day:

| start               | buy\_ex | sell\_ex | buy\_price | sell\_price | spread\_pct |
| ------------------- | ------- | -------- | ---------- | ----------- | ----------- |
| 2024-12-31 16:54:00 | binance | kucoin   | 94903.65   | 95000.00    | +0.102      |
| 2024-12-31 16:54:00 | gateio  | kucoin   | 94904.20   | 95000.00    | +0.101      |
| 2024-12-31 13:40:00 | kucoin  | okx      | 95561.60   | 95656.00    | +0.099      |
| 2024-12-31 16:54:00 | kucoin  | okx      | 95000.00   | 94906.60    | −0.098      |
| 2024-12-31 13:36:00 | kucoin  | okx      | 95600.00   | 95693.60    | +0.098      |
| 2024-12-31 13:35:00 | kucoin  | okx      | 95600.00   | 95691.80    | +0.096      |
| 2024-12-31 13:36:00 | binance | kucoin   | 95688.98   | 95600.00    | −0.093      |
| 2024-12-31 13:50:00 | binance | kucoin   | 95891.73   | 95804.20    | −0.091      |

A few clusters of \~0.1 % gaps — well above retail trading-fee breakeven (\~0.04 % each side), but small enough that the venues re-arbed within a minute or two. The mean / std distribution per pair tells the same story:

| pair             | mean   | std   | min    | max    |
| ---------------- | ------ | ----- | ------ | ------ |
| binance ↔ gateio | −0.010 | 0.010 | −0.043 | +0.047 |
| binance ↔ kucoin | −0.003 | 0.012 | −0.093 | +0.102 |
| binance ↔ okx    | 0.000  | 0.008 | −0.025 | +0.033 |
| gateio ↔ kucoin  | +0.007 | 0.013 | −0.085 | +0.101 |
| gateio ↔ okx     | +0.010 | 0.011 | −0.036 | +0.057 |
| kucoin ↔ okx     | +0.003 | 0.012 | −0.098 | +0.099 |

`kucoin` is the noisiest leg — its standard deviation against every other venue is the largest, and it's the buy/sell side of every top-8 spread row.

## Extending it

* **More exchanges** — append to the `IN (...)` list. Each addition gives you `(N − 1)` new pair rows per minute, no extra wrappers.
* **Filter to actionable spreads** — append `HAVING spread_pct > 0.04` (or whatever your round-trip fee is). The alert-dispatcher in the article's repo collapses to a single `WHERE`.
* **Different market** — the `WHERE market = 'BTC-USDT'` is the only market constraint. Add `OR market IN (...)` and a `GROUP BY market` to scan a basket simultaneously.
* **Higher-resolution timing** — 1-minute candles are the smallest exposed via `api.ohlcv(...)`. For sub-minute work, JOIN against `api.trade` directly with a tolerance window (`WHERE abs(a.timestamp - b.timestamp) < INTERVAL 100 MILLISECOND`).
