This repository implements an ensemble strategy for backtesting stock prices that combines Bollinger Bands with an LSTM (neural network) model.
- How it works
- User Manual
- Project Structure
- Our Approach
- Performance Analysis
- Drawbacks & Future Works
- Asset Categories
There are two important files in the repository.
- `run.ipynb` is the main file that the user runs to backtest the ensemble strategy.
- `generate_signals.py` is a module that generates buy/sell/hold signals as 1, -1, 0 respectively and returns them to the notebook. The user can customize the input parameters.
- The notebook generates a full `quantstats` report at the end to evaluate the performance of the strategy.
Follow these steps for a manual installation. (Only steps 1, 2, and 4 are required to run the project.)
1. Clone the Repository: Get a copy of this repository on your local machine with the following command:

   `git clone https://github.com/YeakubSadlil/Ensemble_backtesting_stock_market.git`
2. Install Dependencies: Make sure you have `Python 3.10` installed. Then install the required dependencies with the following command:

   `pip install -r requirements.txt`
3. Data Ingestion: Ingest your custom data with Zipline. A default data ingestion is available in the notebook. Note that the strategy can accept multiple assets.
4. Run the Notebook: Open the `run.ipynb` file in the repository and customize your backtesting parameters, such as stock symbols, time period, and investment settings like the amount and number of stocks to buy at each buy signal.
5. Interpret Results: A `quantstats` report is generated automatically at the end by `gs.plots(results)`. Analyze the generated plots and results to assess the strategy's performance on your selected or default assets.
N.B.
- If you don't want to customize any input parameters or the data ingestion, you can run the notebook `run.ipynb` directly without any changes.
- If you face any issues in step 2 related to `ta-lib`, please install it first (see the ta-lib installation doc).
If you have Docker installed, you can use it to run the project without setting up the environment or installing dependencies:
1. Verify Docker Installation: Make sure Docker is installed and running on your machine.
2. Clone the Repository: Get a copy of this repository on your local machine with the following command:

   `git clone https://github.com/YeakubSadlil/Ensemble_backtesting_stock_market.git`
3. Build the Docker Image: Run the following command to build the Docker image:

   `docker build -t ensemble-backtest-stockprice .`
4. Run the Docker Container: Start a new Docker container from the image with the following command:

   `docker run -d -p 8888:8888 --name my_backtest_container ensemble-backtest-stockprice`
5. Access Jupyter Notebook: Open your web browser and go to `http://localhost:8888`. Paste the token from step 6 when prompted. (If no token is required, you can skip step 6.)
6. Get the Jupyter Notebook Token: Run the following command to open a shell in the container:

   `docker exec -it my_backtest_container /bin/bash`

   Then run the following command inside the container:

   `jupyter server list`

   It shows a link that contains a token. Copy only the token from the link and paste it into the Jupyter Notebook prompt in the browser.

From the Jupyter Notebook, run `run.ipynb` to start the project.
- If you don't want to install anything on your local machine or you don't have enough time to set up the environment, you can run the project on Google Colab.
- Please go to the Colab Notebook and follow the instructions there. After uploading the necessary files, you are ready to go with a single click on 'Run All'.
├── Data                            <- Folder for all the data used for model training
│   └── sp50/daily
│
├── ML Models                       <- Folder for all the machine learning models used for the project
│   └── LSTM_Stock_Price_Prediction.ipynb
│
├── run.ipynb                       <- Jupyter notebook from which the user can run the backtesting
│
├── generate_signal.py              <- Module to generate the buy/sell/hold signals
│
├── requirements.txt                <- List of required Python packages
│
├── Dockerfile                      <- Dockerfile for building the Docker image
│
└── lstm_12_p50_ckp_13_24_e150.h5   <- LSTM model weights file
- Ensemble Strategy: We combined Bollinger Bands and an LSTM model to predict stock prices and generate signals.
  - When the LSTM model predicts that tomorrow's stock price is higher than the current price and the lower Bollinger Band is also higher than the current price, we generate a buy signal (long position).
  - When the LSTM model predicts that tomorrow's stock price is lower than the current price and the upper Bollinger Band is also lower than the current price, we generate a sell signal.
  - Otherwise, we generate a hold signal.
  - We chose to take only long positions because the market is a bull market.
- Tuning Models:
  - We tried trend-following and mean-reversion strategies with different technical indicators such as `MACD`, `RSI`, and `Bollinger Bands`, and checked their individual performance.
  - We then combined the best-performing strategies with `LSTM` to create an ensemble strategy.
  - We found that the ensemble strategy performs better than the individual strategies and the S&P 500 benchmark.
  - We also tested in-sample and out-of-sample performance and found that the ensemble performs better. Check all test notebooks in the `ML Models` folder.
- Bollinger Bands: Used Bollinger Bands to generate buy/sell signals based on the stock's price volatility, with a default window of 20 days.
- LSTM Model: Developed an LSTM model to predict the next day's stock price from the previous 50 days of stock prices.
  - Used it as a filter on top of Bollinger Bands to generate signals, because the predicted stock price tended to be higher than the current price during downtrends and lower during uptrends.
  - Trained the model on S&P 500 data from 2013 to 2020 and tested it on data from 2023 to 2024.
- Asset Categorization: Backtested our strategy on 50 assets from 10 different sectors (2018-2022) to add diversification and evaluate its performance. Check Asset Lists or the Data section to see the list of assets.
- Module Development: Developed a module to generate signals (`generate_signals.py`), which is imported into `run.ipynb`. It returns buy/sell/hold signals as 1, -1, 0 respectively.
- Backtesting: Used the `zipline` library to backtest our strategy and `quantstats` to evaluate the performance.
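The buy/sell/hold rule described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `generate_signals.py`: the function names `bollinger_bands` and `ensemble_signal`, the band width of 2 standard deviations, and the use of the population standard deviation are all assumptions. Only the 20-day window and the 1/-1/0 encoding come from the description above, and `predicted_next` stands in for whatever the LSTM model outputs.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, upper) bands from the last `window` closing prices.

    num_std=2.0 and the population standard deviation are assumptions;
    the README only states the 20-day default window.
    """
    recent = closes[-window:]
    mid = mean(recent)
    spread = num_std * pstdev(recent)
    return mid - spread, mid + spread


def ensemble_signal(closes, predicted_next, window=20, num_std=2.0):
    """Return 1 (buy), -1 (sell) or 0 (hold) for the most recent bar.

    `predicted_next` is a placeholder for the LSTM's next-day prediction.
    """
    if len(closes) < window:
        return 0  # not enough history to form the bands
    current = closes[-1]
    lower, upper = bollinger_bands(closes, window, num_std)
    # Buy: prediction is above the price AND the price sits below the lower band.
    if predicted_next > current and lower > current:
        return 1
    # Sell: prediction is below the price AND the price sits above the upper band.
    if predicted_next < current and upper < current:
        return -1
    return 0


if __name__ == "__main__":
    dip = [100.0] * 19 + [80.0]  # price drops below the lower band
    print(ensemble_signal(dip, predicted_next=85.0))
```

Both conditions must agree before a trade is signalled, so the LSTM prediction acts as a filter on the Bollinger Bands signal rather than as a standalone trigger.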
LSTM Model Architecture:
Our ensemble strategy performs close to the standalone Bollinger Bands strategy, but it outperformed the benchmark (S&P 500) in terms of CAGR, Sharpe ratio, and portfolio value when backtested with 50 assets from 2018 to 2022.
- It couldn't beat the benchmark when backtested on some single assets with out-of-sample data, but it performed well for the `AAPL` stock.
- Although the performance was better than the benchmark when going long only in bull conditions, the strategy suffered during high drawdowns, which indicates that it is not robust enough to handle market downturns.

Ensemble Notebook: We tuned our ensemble model in Google Colab for faster training. The notebook is available here, or check `Test_ensemble_InSample.ipynb` in the `ML Models` folder.
The table below shows the performance comparison on in-sample data:
Metric | Benchmark | Bollinger Bands | LSTM + Bollinger Ensemble |
---|---|---|---|
Start Period | 2018-03-19 | 2018-03-19 | 2018-03-19 |
End Period | 2022-12-30 | 2022-12-30 | 2022-12-30 |
Risk-Free Rate | 0.0% | 0.0% | 0.0% |
Time in Market | 100.0% | 100.0% | 100.0% |
Cumulative Return | 39.52% | 50.92% | 54.21% |
CAGR | 4.92% | 6.12% | 6.45% |
Sharpe | 0.43 | 0.46 | 0.49 |
Max Drawdown | -33.92% | -43.2% | -44.06% |
Avg. Drawdown | -2.18% | -2.79% | -2.79% |
Volatility (ann.) | 22.01% | 25.56% | 25.21% |
Calmar | 0.15 | 0.14 | 0.15 |
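As a quick sanity check on the table, the Calmar row can be reproduced from the CAGR and Max Drawdown rows, assuming the usual definition of the Calmar ratio as CAGR divided by the absolute maximum drawdown. The helper name `calmar_ratio` is made up for this example; the numbers are copied from the table.

```python
def calmar_ratio(cagr_pct, max_drawdown_pct):
    """Calmar ratio, assuming the usual definition: CAGR / |max drawdown|."""
    return cagr_pct / abs(max_drawdown_pct)


# (CAGR %, Max Drawdown %) taken from the table above.
rows = {
    "Benchmark": (4.92, -33.92),
    "Bollinger Bands": (6.12, -43.2),
    "LSTM + Bollinger Ensemble": (6.45, -44.06),
}

for name, (cagr, max_dd) in rows.items():
    print(f"{name}: Calmar = {calmar_ratio(cagr, max_dd):.2f}")
```

The ensemble's higher CAGR is offset by its deeper maximum drawdown, which is why its Calmar ratio ends up equal to the benchmark's after rounding.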
The plot below shows the performance of the ensemble strategy
- Dataset Choice: We trained the LSTM model on S&P 500 data, but a market index could instead be built from the 50 assets we used for backtesting.
- Order Strategy: Because the market is a bull market, we took only long positions; with a proper short-selling strategy, more profit could be generated.
- Fine-Tuning Models: Continuously refine and optimize the Bollinger Bands window size and the LSTM model for better prediction accuracy. The LSTM model underperformed when predicting from the past 100 and 150 days. LSTMs may suffer from vanishing gradients and can be improved with `Attention mechanisms`, `Stacking more layers`, or `Bidirectional LSTMs`.
- Risk Management: Implement risk management strategies, such as stop loss and take profit, to minimize potential losses.
- Meta-Labeling Strategy: In his book Advances in Financial Machine Learning, Dr. Marcos López de Prado describes a meta-labeling technique that uses an array of new ensemble learning techniques to enhance machine learning strategies. Hudson & Thames, a financial research group, expanded on these techniques and showed some implementation ideas in a YouTube video.
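The fine-tuning item above compares lookbacks of 50, 100, and 150 days. A minimal sketch of how a price series might be cut into lookback windows for next-day prediction is shown below. The function name `make_windows` is hypothetical and the repository's actual preprocessing (scaling, array shapes, and so on) may differ; only the idea of predicting the next day from the previous `lookback` days comes from the text.

```python
def make_windows(prices, lookback=50):
    """Split a price series into (window, next_price) training pairs.

    Each input is `lookback` consecutive prices; the target is the price
    that immediately follows that window.
    """
    inputs, targets = [], []
    for end in range(lookback, len(prices)):
        inputs.append(prices[end - lookback:end])
        targets.append(prices[end])
    return inputs, targets


if __name__ == "__main__":
    prices = [float(p) for p in range(200)]  # toy series, not market data
    for lookback in (50, 100, 150):
        X, y = make_windows(prices, lookback)
        print(lookback, len(X), len(X[0]))
```

A longer lookback leaves fewer training pairs from the same history and gives the network a longer sequence to propagate gradients through, which is one plausible reason the 100-day and 150-day variants were harder to train.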
We backtested our strategy on 50 assets from 10 different sectors. If you want to test our model on your own custom data, please choose tickers from here. The list of assets is as follows:
Industrials | Health Care | Information Technology | Financials | Materials | Consumer Staples | Energy | Communication Services | Utilities | Real Estate |
---|---|---|---|---|---|---|---|---|---|
MMM | ABT | ADBE | AFL | FMC | BG | TRGP | DIS | AES | ARE |
AOS | BAX | AMD | BAC | IFF | MO | VLO | WBD | LNT | BXP |
BA | BDX | AAPL | BRK-B | KLAC | CPB | WMB | | AEP | CPT |
AXON | TECH | CDNS | BX | APD | STZ | APA | FOX | AWK | AMT |
CAT | ALGN | NVDA | COF | CE | WMT | BKR | EA | CEG | CCI |
Data dances in time's rapid stream
Patterns prediction, a trader's dream
Bollinger's Bands, our measuring guide
LSTM whispers where profits reside
The ensemble dances with a symphony bright
Forecasting markets, with endless sight
--------------------> An Anonymous Quant