Correlation

A pairwise Pearson correlation matrix of daily simple returns, across a basket of crypto majors, in one query.

Query

WITH
    '2024-01-01' AS start_date,
    '2024-12-31' AS end_date,
    candles AS (
        SELECT toDate(start) AS day, market, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1440)
        WHERE exchange = 'binance'
          AND market IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'ADA-USDT', 'DOGE-USDT')
          AND start BETWEEN start_date AND end_date
    ),
    market_returns AS (
        SELECT day, market,
               close / lagInFrame(toNullable(close), 1) OVER (PARTITION BY market ORDER BY day) - 1 AS ret
        FROM candles
    )
SELECT a.market AS market_a, b.market AS market_b,
       corr(a.ret, b.ret) AS correlation
FROM market_returns a
JOIN market_returns b ON a.day = b.day
WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL
GROUP BY market_a, market_b
ORDER BY market_a, market_b

The JOIN market_returns a JOIN market_returns b ON a.day = b.day plus WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL is exactly pandas' pairwise-complete behaviour — only days where both markets returned a valid number contribute to the correlation. ClickHouse's corr() is the standard Pearson coefficient Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²); the n−1 / n factors cancel out, so sample-vs-population doesn't apply.

Output (2024)

The query above returns the long-form (market_a, market_b, correlation) triple — 25 rows for the 5 × 5 matrix, including the symmetric duplicates and the diagonal 1.0s. For a triangular view, append AND a.market < b.market to the final WHERE clause.

ADA-USDT
BTC-USDT
DOGE-USDT
ETH-USDT
SOL-USDT

ADA

1.00

0.66

0.65

0.66

0.63

BTC

0.66

1.00

0.78

0.79

0.74

DOGE

0.65

0.78

1.00

0.66

0.63

ETH

0.66

0.79

0.66

1.00

0.69

SOL

0.63

0.74

0.63

0.69

1.00

Top large-cap pairs (BTC↔ETH at 0.79, BTC↔DOGE at 0.78, BTC↔SOL at 0.74) confirm crypto's well-known single-factor regime — BTC direction explains most of the daily variance across the rest of the market. ADA is the loosest leader-follower at the bottom of the ranking (~0.63–0.66 with the others).

Pivot to a matrix in SQL

ClickHouse has no native PIVOT clause, but the conditional-aggregate idiom (anyIf, maxIf, sumIf, …) produces a wide matrix in one extra wrapping query — no client-side reshape needed:

Output is one row per market_a carrying a Map(String, Float64) of the other markets to their correlations — the client unfolds it into whatever shape it wants. Useful when the basket is parameterised at query time.

Extending it

  • Different basket — change the IN (...) list. Adding a market costs nothing in query length.

  • Rolling window — replace corr(a.ret, b.ret) aggregate with a windowed corr(...) OVER (... ROWS BETWEEN N PRECEDING AND CURRENT ROW) to track how correlations change over time (e.g., regime detection).

  • Different return typeln(close / lag) - 1 for log returns instead of simple returns. Pearson correlation of log returns vs simple returns is virtually identical for daily crypto, but the switch is one expression.

Last updated