# Backtesting simple SMA strategy

This doc will explain how to run a backtest for a simple SMA (*simple moving average*) crossing strategy between 50 and 200 period SMA's directly inside the Clickhouse instance.\
For simplicity, we assume that there is no transaction cost, slippage or financing cost thus the returns obtained are unrealistic.

We will present it step by step, then we will provide the full code at the end.

## Select the data

We are selecting the 1 minutes candles for `BTC-USD` from `coinbase` for the last 12 months.

```sql
with now() - interval 12 month as start_date, toStartOfMinute(now()) as end_date, 10000 as initial_cash,
    -- Fetching BTC data for the specified period
    btc_data as (SELECT timestamp,
                        argMaxMerge(close) as close
                 FROM public_data.candle_1m
                 WHERE market = 'BTC-USD'
                   AND exchange = 'coinbase'
                   and timestamp > start_date
                   and timestamp < end_date
                 GROUP BY timestamp)
```

`btc_data` contains:

| timestamp         | close    |
| ----------------- | -------- |
| 2025-03-07T08:50Z | 88409.84 |
| 2024-07-07T09:24Z | 57631.88 |
| 2024-10-29T23:04Z | 72649.13 |
| ...               | ...      |

## Compute the SMA

The 50 and 200 period SMA is computed using a window function.

Clickhouse offers a wide range of statistical functions documented [here](https://clickhouse.com/docs/sql-reference/functions).

```sql
--- previous code snippet ---
    sma AS (SELECT timestamp,
                   close,
                   avg(close) OVER w50  AS sma50,
                   avg(close) OVER w200 AS sma200
            FROM btc_data
            -- using -1 to avoid the current row in the moving average calculation and thus avoid lookahead bias
            WINDOW w50 AS (ORDER BY timestamp ROWS BETWEEN 50 PRECEDING AND 1 PRECEDING ),
                   w200 AS (ORDER BY timestamp ROWS BETWEEN 200 PRECEDING AND 1 PRECEDING )),
```

`sma` contains:

| timestamp         | close    | sma50             | sma200            |
| ----------------- | -------- | ----------------- | ----------------- |
| 2024-06-11T15:23Z | 66908.00 | 66918.2           | 66918.2           |
| 2024-06-11T15:24Z | 66870.67 | 66913.1           | 66913.1           |
| 2024-06-11T15:25Z | 66835.35 | 66898.95666666667 | 66898.95666666667 |
| 2024-06-11T15:26Z | 66757.10 | 66883.055         | 66883.055         |
| 2024-06-11T15:27Z | 66779.25 | 66857.864         | 66857.864         |
| 2024-06-11T15:28Z | 66795.82 | 66844.76166666666 | 66844.76166666666 |
| 2024-06-11T15:29Z | 66817.05 | 66837.76999999999 | 66837.76999999999 |
| 2024-06-11T15:30Z | 66776.63 | 66835.18000000001 | 66835.18000000001 |
| 2024-06-11T15:31Z | 66780.80 | 66828.67444444445 | 66828.67444444445 |
| ...               | ...      | ...               | ...               |

## Generate the signals and extract the trades

We detect the crossing of the two SMA's and generate the signals accordingly.

```sql
    --- previous code snippet ---
    signals AS (SELECT timestamp,
                          close,
                          sma50,
                          sma200,
                          CASE
                              WHEN sma50 > sma200 AND lagInFrame(sma50) OVER w < lagInFrame(sma200) OVER w THEN 'long' -- golden cross
                              WHEN sma50 < sma200 AND lagInFrame(sma50) OVER w > lagInFrame(sma200) OVER w THEN 'short' -- death cross
                              ELSE NULL
                              END AS signal
                   FROM sma
                   WINDOW w AS (ORDER BY timestamp)),
    trades AS (SELECT timestamp,
                      close                                  AS price,
                      signal,
                      row_number() OVER (ORDER BY timestamp) AS trade_id
               FROM signals
               WHERE signal is not null)
```

`trades` contains:

| timestamp         | price    | signal | trade\_id |
| ----------------- | -------- | ------ | --------- |
| 2024-06-11T17:29Z | 66258.98 | short  | 1         |
| 2024-06-11T18:03Z | 66784.28 | long   | 2         |
| 2024-06-12T00:10Z | 67354.27 | short  | 3         |
| ...               | ...      | ...    | ...       |

## Compute strategy performance

We regroup the trades by their signal and ids to present the data with entry and exit prices for each trade for the PnL calculation.

```sql
    --- previous code snippet ---
    returns AS (SELECT entry.trade_id,
                           entry.timestamp          AS entry_date,
                           exit.timestamp           AS exit_date,
                           entry.price              AS entry_price,
                           exit.price               AS exit_price,
                           entry.signal             as signal,
                           if(signal = 'long', exit_price / entry_price, entry_price / exit_price) - 1  as trade_return
                    FROM trades entry
                             LEFT JOIN trades exit ON entry.trade_id + 1 = exit.trade_id
                    where exit_price != 0 -- ignoring the last unclosed trade
                    order by trade_id)
```

`returns` contains:

| trade\_id | entry\_date       | exit\_date        | entry\_price | exit\_price | signal | trade\_return           |
| --------- | ----------------- | ----------------- | ------------ | ----------- | ------ | ----------------------- |
| 1         | 2024-06-11T17:22Z | 2024-06-11T18:11Z | 66289.25     | 66695.99    | short  | -0.00609841761101379559 |
| 2         | 2024-06-11T18:11Z | 2024-06-12T00:10Z | 66695.99     | 67354.27    | long   | 0.00986985874263205329  |
| 3         | 2024-06-12T00:10Z | 2024-06-12T02:32Z | 67354.27     | 67455.74    | short  | -0.00150424559866958691 |
| 4         | 2024-06-12T02:32Z | 2024-06-12T05:33Z | 67455.74     | 67362.65    | long   | -0.00138001599270870055 |
| 5         | 2024-06-12T05:33Z | 2024-06-12T07:57Z | 67362.65     | 67455.94    | short  | -0.00138297679937452507 |
| 6         | 2024-06-12T07:57Z | 2024-06-12T16:33Z | 67455.94     | 69424.35    | long   | 0.02918067704638020017  |
| 7         | 2024-06-12T16:33Z | 2024-06-12T21:51Z | 69424.35     | 68534.74    | short  | 0.01298042423448312490  |
| 8         | 2024-06-12T21:51Z | 2024-06-12T23:30Z | 68534.74     | 68173.60    | long   | -0.00526944437229936234 |
| ...       | ...               | ...               | ...          | ...         | ...    | ...                     |

### Sharpe Ratio

We compute the Sharpe ratio of the strategy using the returns from the trades.

```sql
    --- previous code snippet ---
    SELECT  avg(trade_return) / stddevPop(trade_return)   AS sharpe_ratio from returns
```

outputs `0.022227898711626174`

The above results can be also grouped by day and the Sharpe ratio can then be annualised to get a better understanding of the strategy performance over time.

```sql
    --- previous code snippet ---
    daily_retuns AS (
    SELECT  sum( trade_return) AS daily_return
    from returns
    group by toStartOfDay(exit_date)
    )
    SELECT  sqrt(365) *  avg(daily_return) / stddevPop(daily_return)   AS sharpe_ratio from daily_retuns;
```

outputs `1.3117539455990057`

## Full example

```sql
with now() - interval 12 month as start_date, toStartOfMinute(now()) as end_date,
    -- Fetching BTC data for the specified period
    btc_data as (SELECT timestamp,
                        argMaxMerge(close) as close
                 FROM public_data.candle_1m
                 WHERE market = 'BTC-USD'
                   AND exchange = 'coinbase'
                   and timestamp > start_date
                   and timestamp < end_date
                 GROUP BY timestamp),
    -- Calculating 50 and 200 period simple moving averages
    sma AS (SELECT timestamp,
                   close,
                   avg(close) OVER w50  AS sma50,
                   avg(close) OVER w200 AS sma200
            FROM btc_data
            -- using -1 to avoid the current row in the moving average calculation and thus avoid lookahead bias
            WINDOW w50 AS (ORDER BY timestamp ROWS BETWEEN 50 PRECEDING AND 1 PRECEDING ),
                   w200 AS (ORDER BY timestamp ROWS BETWEEN 200 PRECEDING AND 1 PRECEDING )),
    signals AS (SELECT timestamp,
                       close,
                       sma50,
                       sma200,
                       CASE
                           WHEN sma50 > sma200 AND lagInFrame(sma50) OVER w < lagInFrame(sma200) OVER w
                               THEN 'long' -- golden cross
                           WHEN sma50 < sma200 AND lagInFrame(sma50) OVER w > lagInFrame(sma200) OVER w
                               THEN 'short' -- death cross
                           END AS signal
                FROM sma
                WINDOW w AS (ORDER BY timestamp)),
    trades AS (SELECT timestamp,
                      close                                  AS price,
                      signal,
                      row_number() OVER (ORDER BY timestamp) AS trade_id
               FROM signals
               WHERE signal is not null),
    returns AS (SELECT entry.trade_id,
                       entry.timestamp                                                             AS entry_date,
                       exit.timestamp                                                              AS exit_date,
                       entry.price                                                                 AS entry_price,
                       exit.price                                                                  AS exit_price,
                       entry.signal                                                                as signal,
                       if(signal = 'long', exit_price / entry_price, entry_price / exit_price) - 1 as trade_return
                FROM trades entry
                         LEFT JOIN trades exit ON entry.trade_id + 1 = exit.trade_id
                where exit_price != 0 -- ignoring the last unclosed trade
                order by trade_id),
    daily_retuns AS (SELECT sum(trade_return) AS daily_return
                     from returns
                     group by toStartOfDay(exit_date))
SELECT sqrt(365) * avg(daily_return) / stddevPop(daily_return) AS sharpe_ratio
from daily_retuns;
```

The execution speed of this backtest is rapid, even for large datasets.

| number of candles     | Execution time |
| --------------------- | -------------- |
| 525,525 ( 1 Year)     | 1 s 678 ms     |
| 2,628,415 ( 5 Years ) | 7 s 315 ms     |

The dataset can always be downloaded from the table by running the `btc_data` sub-query.
