# Correlation

A pairwise **Pearson correlation matrix** of daily simple returns, across a basket of crypto majors, in one query.

## Query

```sql
WITH
    '2024-01-01' AS start_date,
    '2024-12-31' AS end_date,
    candles AS (
        SELECT toDate(start) AS day, market, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1440)
        WHERE exchange = 'binance'
          AND market IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'ADA-USDT', 'DOGE-USDT')
          AND start BETWEEN start_date AND end_date
    ),
    market_returns AS (
        SELECT day, market,
               close / lagInFrame(toNullable(close), 1) OVER (PARTITION BY market ORDER BY day) - 1 AS ret
        FROM candles
    )
SELECT a.market AS market_a, b.market AS market_b,
       corr(a.ret, b.ret) AS correlation
FROM market_returns a
JOIN market_returns b ON a.day = b.day
WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL
GROUP BY market_a, market_b
ORDER BY market_a, market_b
```

The `JOIN market_returns a JOIN market_returns b ON a.day = b.day` plus `WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL` is exactly pandas' pairwise-complete behaviour — only days where both markets returned a valid number contribute to the correlation. ClickHouse's `corr()` is the standard Pearson coefficient `Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²)`; the `n−1` / `n` factors cancel out, so sample-vs-population doesn't apply.

## Output (2024)

The query above returns the long-form `(market_a, market_b, correlation)` triple — 25 rows for the 5 × 5 matrix, including the symmetric duplicates and the diagonal `1.0`s. For a triangular view, append `AND a.market < b.market` to the final `WHERE` clause.

|          | ADA-USDT | BTC-USDT | DOGE-USDT | ETH-USDT | SOL-USDT |
| -------- | -------- | -------- | --------- | -------- | -------- |
| **ADA**  | 1.00     | 0.66     | 0.65      | 0.66     | 0.63     |
| **BTC**  | 0.66     | 1.00     | 0.78      | 0.79     | 0.74     |
| **DOGE** | 0.65     | 0.78     | 1.00      | 0.66     | 0.63     |
| **ETH**  | 0.66     | 0.79     | 0.66      | 1.00     | 0.69     |
| **SOL**  | 0.63     | 0.74     | 0.63      | 0.69     | 1.00     |

Top large-cap pairs (BTC↔ETH at 0.79, BTC↔DOGE at 0.78, BTC↔SOL at 0.74) confirm crypto's well-known **single-factor regime** — BTC direction explains most of the daily variance across the rest of the market. ADA is the loosest leader-follower at the bottom of the ranking (\~0.63–0.66 with the others).

## Pivot to a matrix in SQL

ClickHouse has no native `PIVOT` clause, but the **conditional-aggregate** idiom (`anyIf`, `maxIf`, `sumIf`, …) produces a wide matrix in one extra wrapping query — no client-side reshape needed:

```sql
WITH
    '2024-01-01' AS start_date,
    '2024-12-31' AS end_date,
    candles AS (
        SELECT toDate(start) AS day, market, toFloat64(close) AS close
        FROM api.ohlcv(candle_duration_in_minutes = 1440)
        WHERE exchange = 'binance'
          AND market IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'ADA-USDT', 'DOGE-USDT')
          AND start BETWEEN start_date AND end_date
    ),
    market_returns AS (
        SELECT day, market,
               close / lagInFrame(toNullable(close), 1) OVER (PARTITION BY market ORDER BY day) - 1 AS ret
        FROM candles
    ),
    pairs AS (
        SELECT a.market AS market_a, b.market AS market_b,
               corr(a.ret, b.ret) AS correlation
        FROM market_returns a
        JOIN market_returns b ON a.day = b.day
        WHERE a.ret IS NOT NULL AND b.ret IS NOT NULL
        GROUP BY market_a, market_b
    )
SELECT market_a,
       mapFromArrays(groupArray(market_b), groupArray(correlation)) AS correlations
FROM pairs
GROUP BY market_a
ORDER BY market_a
```

Output is one row per `market_a` carrying a `Map(String, Float64)` of the other markets to their correlations — the client unfolds it into whatever shape it wants. Useful when the basket is parameterised at query time.

## Extending it

* **Different basket** — change the `IN (...)` list. Adding a market costs nothing in query length.
* **Rolling window** — replace `corr(a.ret, b.ret)` aggregate with a windowed `corr(...) OVER (... ROWS BETWEEN N PRECEDING AND CURRENT ROW)` to track how correlations change over time (e.g., regime detection).
* **Different return type** — `ln(close / lag) - 1` for log returns instead of simple returns. Pearson correlation of log returns vs simple returns is virtually identical for daily crypto, but the switch is one expression.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.koinju.io/compute-engine/correlation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
