> For the complete documentation index, see [llms.txt](https://docs.koinju.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.koinju.io/sql-cookbook/correlation.md).

# Correlation

A pairwise **Pearson correlation matrix** of daily simple returns, across a basket of crypto majors, in one query.

## Query

```sql
WITH
    '2024-01-01' AS start_date,
    '2024-12-31' AS end_date,
    candles AS (
        SELECT toDate(start) AS day, market, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1440)
        WHERE exchange = 'binance'
          AND market IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'ADA-USDT', 'DOGE-USDT')
          AND start BETWEEN start_date AND end_date
    ),
    market_returns AS (
        SELECT day, market,
               close / lagInFrame(toNullable(close), 1) OVER (PARTITION BY market ORDER BY day) - 1 AS ret
        FROM candles
    )
SELECT a.market AS market_a, b.market AS market_b,
       corr(a.ret, b.ret) AS correlation
FROM market_returns a
JOIN market_returns b ON a.day = b.day
WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL
GROUP BY market_a, market_b
ORDER BY market_a, market_b
```

Functions used: [`toDate`](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#todate), [`toFloat64`](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#tofloat64), [`lagInFrame`](https://clickhouse.com/docs/sql-reference/window-functions#laginframe), [`toNullable`](https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions#tonullable)

The `JOIN market_returns a JOIN market_returns b ON a.day = b.day` plus `WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL` is exactly pandas' pairwise-complete behaviour — only days where both markets returned a valid number contribute to the correlation. ClickHouse's [`corr()`](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/corr) is the standard Pearson coefficient `Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²)`; the `n−1` / `n` factors cancel out, so sample-vs-population doesn't apply.

## Output (2024)

The query above returns the long-form `(market_a, market_b, correlation)` triple — 25 rows for the 5 × 5 matrix, including the symmetric duplicates and the diagonal `1.0`s. For a triangular view, append `AND a.market < b.market` to the final `WHERE` clause.

|          | ADA-USDT | BTC-USDT | DOGE-USDT | ETH-USDT | SOL-USDT |
| -------- | -------- | -------- | --------- | -------- | -------- |
| **ADA**  | 1.00     | 0.66     | 0.65      | 0.66     | 0.63     |
| **BTC**  | 0.66     | 1.00     | 0.78      | 0.79     | 0.74     |
| **DOGE** | 0.65     | 0.78     | 1.00      | 0.66     | 0.63     |
| **ETH**  | 0.66     | 0.79     | 0.66      | 1.00     | 0.69     |
| **SOL**  | 0.63     | 0.74     | 0.63      | 0.69     | 1.00     |

Top large-cap pairs (BTC↔ETH at 0.79, BTC↔DOGE at 0.78, BTC↔SOL at 0.74) confirm crypto's well-known **single-factor regime** — BTC direction explains most of the daily variance across the rest of the market. ADA is the loosest leader-follower at the bottom of the ranking (\~0.63–0.66 with the others).

## Pivot to a matrix in SQL

ClickHouse has no native `PIVOT` clause, but the **conditional-aggregate** idiom (`anyIf`, `maxIf`, `sumIf`, …) produces a wide matrix in one extra wrapping query — no client-side reshape needed:

```sql
WITH
    '2024-01-01' AS start_date,
    '2024-12-31' AS end_date,
    candles AS (
        SELECT toDate(start) AS day, market, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1440)
        WHERE exchange = 'binance'
          AND market IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'ADA-USDT', 'DOGE-USDT')
          AND start BETWEEN start_date AND end_date
    ),
    market_returns AS (
        SELECT day, market,
               close / lagInFrame(toNullable(close), 1) OVER (PARTITION BY market ORDER BY day) - 1 AS ret
        FROM candles
    ),
    pairs AS (
        SELECT a.market AS market_a, b.market AS market_b,
               corr(a.ret, b.ret) AS correlation
        FROM market_returns a
        JOIN market_returns b ON a.day = b.day
        WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL
        GROUP BY market_a, market_b
    )
SELECT market_a,
       mapFromArrays(groupArray(market_b), groupArray(correlation)) AS correlations
FROM pairs
GROUP BY market_a
ORDER BY market_a
```

Functions used: [`mapFromArrays`](https://clickhouse.com/docs/sql-reference/functions/tuple-map-functions#mapfromarrays), [`groupArray`](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/grouparray)

Output is one row per `market_a` carrying a `Map(String, Float64)` of the other markets to their correlations — the client unfolds it into whatever shape it wants. Useful when the basket is parameterised at query time.

## Extending it

* **Different basket** — change the `IN (...)` list. Adding a market costs nothing in query length.
* **Rolling window** — replace `corr(a.ret, b.ret)` aggregate with a windowed `corr(...) OVER (... ROWS BETWEEN N PRECEDING AND CURRENT ROW)` to track how correlations change over time (e.g., regime detection).
* **Different return type** — `ln(close / lag) - 1` for log returns instead of simple returns. Pearson correlation of log returns vs simple returns is virtually identical for daily crypto, but the switch is one expression.
