# Historical volatility

This page shows how to compute the rolling 30-day **realised volatility** of a crypto pair entirely server-side, using two of the most common estimators:

* **Close-to-close** — the textbook approach: standard deviation of daily log returns, annualised by `√365`.
* **Parkinson** — uses the daily range (high/low) instead of just closes. It is a lower-variance estimator because it sees more of the intraday move.
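
Both estimators can be sketched in plain Python (stdlib only). This is not the server-side implementation — just the same formulas, with the `√365` annualisation used throughout this page:

```python
import math
import statistics

def close_to_close_vol(closes, trading_days=365):
    """Annualised close-to-close volatility in percent: sample stdev of log returns."""
    log_returns = [math.log(c1 / c0) for c0, c1 in zip(closes, closes[1:])]
    return statistics.stdev(log_returns) * math.sqrt(trading_days) * 100

def parkinson_vol(highs, lows, trading_days=365):
    """Annualised Parkinson volatility in percent: sqrt(sum(ln(H/L)^2) / (4*N*ln 2))."""
    n = len(highs)
    hl_log_sq = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(hl_log_sq / (4 * n * math.log(2))) * math.sqrt(trading_days) * 100
```

The SQL below computes exactly these quantities over a rolling 30-row window.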

The full pipeline runs in a single SQL query against `api.ohlcv(...)`. We walk through it CTE by CTE, then provide the full code at the end.

## Select the data

We pull daily candles for `BTC-USDT` on `binance` for the last 13 months — one extra month so that the 30-day window has enough lead-in.

```sql
WITH now() - INTERVAL 13 MONTH AS start_date, toStartOfDay(now()) AS end_date,
    candles AS (SELECT toDate(start) AS day, open, high, low, close
                FROM api.ohlcv(candle_duration_in_minutes = 1440)
                WHERE exchange = 'binance'
                  AND market = 'BTC-USDT'
                  AND start >= start_date
                  AND start <= end_date)
```

`candles` contains:

| day        | open     | high     | low      | close    |
| ---------- | -------- | -------- | -------- | -------- |
| 2024-01-01 | 42283.58 | 44184.10 | 42180.77 | 44179.55 |
| 2024-01-02 | 44179.55 | 45879.63 | 44148.34 | 44946.91 |
| 2024-01-03 | 44946.91 | 45500.00 | 40750.00 | 42845.23 |
| 2024-01-04 | 42845.23 | 44729.58 | 42613.77 | 44151.10 |
| ...        | ...      | ...      | ...      | ...      |

## Compute log returns and intraday range

Two per-row quantities feed the two estimators:

* `log_return = ln(close / close[t-1])` — close-to-close log return.
* `hl_log_sq = ln(high / low)²` — squared log range, the input to Parkinson.

We wrap the lagged close in `nullIf(..., 0)`: `lagInFrame` returns `0` (the column type's default) when there is no predecessor, so the very first row yields `NULL` rather than tripping a division by zero on `Decimal(76, 20)`.

```sql
    --- previous code snippet ---
    returns AS (SELECT day, high, low, close,
                       ln(close / nullIf(lagInFrame(close, 1) OVER (ORDER BY day), 0)) AS log_return,
                       pow(ln(high / low), 2) AS hl_log_sq
                FROM candles),
```

`returns` contains:

| day        | close    | log\_return | hl\_log\_sq |
| ---------- | -------- | ----------- | ----------- |
| 2024-01-01 | 44179.55 | NULL        | 0.002153    |
| 2024-01-02 | 44946.91 | 0.017220    | 0.001480    |
| 2024-01-03 | 42845.23 | -0.047888   | 0.012156    |
| 2024-01-04 | 44151.10 | 0.030024    | 0.002348    |
| ...        | ...      | ...         | ...         |
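
These per-row values can be sanity-checked by hand. A minimal check against the sample candles above:

```python
import math

# Sample rows from the `candles` table above: (day, high, low, close)
candles = [
    ("2024-01-01", 44184.10, 42180.77, 44179.55),
    ("2024-01-02", 45879.63, 44148.34, 44946.91),
]

prev_close = candles[0][3]
_, high, low, close = candles[1]
log_return = math.log(close / prev_close)  # close-to-close log return
hl_log_sq = math.log(high / low) ** 2      # squared log range, the Parkinson input
```

Both values match the `returns` table for 2024-01-02 to six decimal places.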

## Roll the 30-day window

ClickHouse window functions compute both estimators in one pass:

* Close-to-close: `stddevSamp(log_return)` over the 30-row window, annualised by `√365` and reported in percent.
* Parkinson: `√( Σ ln(H/L)² / (4·N·ln 2) )` over the same window, also annualised.

We carry `count(log_return) OVER w` so we can drop early rows where the window isn't yet full (the first 30 days, before we have 30 valid log returns).

```sql
    --- previous code snippet ---
    vols AS (SELECT day,
                    stddevSamp(log_return) OVER w * sqrt(365) * 100 AS vol_30d_cc_pct,
                    sqrt(sum(hl_log_sq) OVER w / (4 * 30 * ln(2))) * sqrt(365) * 100 AS vol_30d_park_pct,
                    count(log_return) OVER w AS valid_returns
             FROM returns
             WINDOW w AS (ORDER BY day ROWS BETWEEN 29 PRECEDING AND CURRENT ROW))
SELECT day, vol_30d_cc_pct, vol_30d_park_pct
FROM vols
WHERE valid_returns = 30
ORDER BY day;
```

Output:

| day        | vol\_30d\_cc\_pct | vol\_30d\_park\_pct |
| ---------- | ----------------- | ------------------- |
| 2024-01-31 | 53.90             | 58.88               |
| 2024-02-01 | 53.70             | 58.73               |
| 2024-02-02 | 51.02             | 54.19               |
| ...        | ...               | ...                 |
| 2024-12-29 | 44.39             | 56.48               |
| 2024-12-30 | 44.37             | 56.94               |
| 2024-12-31 | 44.38             | 57.53               |
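
The frame semantics can be mirrored client-side if you want a parity check. A sketch with hypothetical toy data — `None` stands in for the first row's `NULL` return, and the `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` frame becomes a 30-row slice:

```python
import math
import statistics

WINDOW = 30  # rows per frame: ROWS BETWEEN 29 PRECEDING AND CURRENT ROW

def rolling_vols(log_returns, hl_log_sqs, window=WINDOW, trading_days=365):
    """Mirror of the `vols` CTE: emit a row only once the frame holds `window` valid returns."""
    out = []
    for i in range(len(log_returns)):
        frame_lr = [r for r in log_returns[max(0, i - window + 1): i + 1] if r is not None]
        frame_hl = hl_log_sqs[max(0, i - window + 1): i + 1]
        if len(frame_lr) < window:  # WHERE valid_returns = 30
            continue
        cc = statistics.stdev(frame_lr) * math.sqrt(trading_days) * 100
        park = math.sqrt(sum(frame_hl) / (4 * window * math.log(2))) * math.sqrt(trading_days) * 100
        out.append((i, cc, park))
    return out
```

Note that `count(log_return)` in SQL skips `NULL`s, which is why the Python version filters `None` before counting.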

## Notes on the estimators

* `stddevSamp` (Bessel-corrected, `ddof=1`) matches the convention used by most published HV charts — including Deribit's reference implementation and pandas' `rolling(...).std()` with default arguments. Use `stddevPop` if you prefer the population estimator.
* `√365` annualisation is appropriate for crypto (24/7 trading). For traditional markets, swap in `√252`.
* The Parkinson normaliser `1 / (4·N·ln 2)` works out to ≈ `0.3607 / N`. To change the window length, update it in three places: `29 PRECEDING` in the `ROWS` frame (N − 1), the `30` in the Parkinson normaliser, and the `valid_returns = 30` filter.
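
A quick numeric check of these conventions, using Python's stdlib as a stand-in for the ClickHouse aggregates:

```python
import math
import statistics

# The Parkinson constant: 1 / (4·ln 2) ≈ 0.3607, so the per-window factor is ≈ 0.3607 / N
parkinson_constant = 1 / (4 * math.log(2))

# stddevSamp corresponds to the Bessel-corrected (ddof=1) statistics.stdev;
# stddevPop corresponds to statistics.pstdev, which divides by N instead of N-1
xs = [0.017220, -0.047888, 0.030024]  # the sample log returns from the `returns` table
sample_sd = statistics.stdev(xs)
population_sd = statistics.pstdev(xs)
```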

## Full example

```sql
WITH now() - INTERVAL 13 MONTH AS start_date, toStartOfDay(now()) AS end_date,
    candles AS (SELECT toDate(start) AS day, open, high, low, close
                FROM api.ohlcv(candle_duration_in_minutes = 1440)
                WHERE exchange = 'binance'
                  AND market = 'BTC-USDT'
                  AND start >= start_date
                  AND start <= end_date),
    returns AS (SELECT day, high, low, close,
                       ln(close / nullIf(lagInFrame(close, 1) OVER (ORDER BY day), 0)) AS log_return,
                       pow(ln(high / low), 2) AS hl_log_sq
                FROM candles),
    vols AS (SELECT day,
                    stddevSamp(log_return) OVER w * sqrt(365) * 100 AS vol_30d_cc_pct,
                    sqrt(sum(hl_log_sq) OVER w / (4 * 30 * ln(2))) * sqrt(365) * 100 AS vol_30d_park_pct,
                    count(log_return) OVER w AS valid_returns
             FROM returns
             WINDOW w AS (ORDER BY day ROWS BETWEEN 29 PRECEDING AND CURRENT ROW))
SELECT day, vol_30d_cc_pct, vol_30d_park_pct
FROM vols
WHERE valid_returns = 30
ORDER BY day;
```

The server-side compute for the whole pipeline takes well under a second; the end-to-end timings below are dominated by round-trip latency.

| window length | end-to-end time |
| ------------- | --------------- |
| 1 year        | \~ 5–8 s        |
| 5 years       | \~ 7–10 s       |

If you prefer to compute the volatilities client-side, you can download the raw candles by running the `candles` sub-query on its own (see the companion [Python reference](https://gitlab.com/koinju/connector/exporter/-/blob/master/api/python_vs_sql/historical_volatility/original.py) for a reproducible parity baseline).

## See also

* [Backtesting a simple SMA strategy](/compute-engine/backtesting-simple-sma-strategy.md) — uses the same `api.ohlcv(...)` view and `lagInFrame` window pattern.
* Source article that inspired this example: [Historical volatility calculations (Deribit Insights)](https://insights.deribit.com/dev-hub/historical-volatility-calculations-python-code/).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.koinju.io/compute-engine/historical-volatility.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
