Skip to content

๐Ÿ“ˆ ๐—”๐—ป ๐—ฒ๐—ป๐˜€๐—ฒ๐—บ๐—ฏ๐—น๐—ฒ ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ฒ๐—ด๐˜† ๐—ณ๐—ผ๐—ฟ ๐—ฏ๐—ฎ๐—ฐ๐—ธ๐˜๐—ฒ๐˜€๐˜๐—ถ๐—ป๐—ด ๐˜€๐˜๐—ผ๐—ฐ๐—ธ ๐—ฝ๐—ฟ๐—ถ๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฏ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐—•๐—ผ๐—น๐—น๐—ถ๐—ป๐—ด๐—ฒ๐—ฟ ๐—•๐—ฎ๐—ป๐—ฑ๐˜€ ๐—ฎ๐—ป๐—ฑ ๐—Ÿ๐—ฆ๐—ง๐—  ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€

Notifications You must be signed in to change notification settings

YeakubSadlil/Ensemble_backtesting_stock_market

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

64 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge

๐Ÿ“ˆ Ensemble Strategy for Backtesting Stock Price

This repository implements an ensemble strategy for backtesting stock price, Combining Bollinger Bands and LSTM (Neural Network) Models.

๐Ÿ“š Table of Contents

How it Works

There are two important files in the repository ๐Ÿ“.

  • The run.ipynb is the main file that the user can run ๐Ÿƒโ€โ™‚๏ธ to backtest the ensemble strategy.
  • The generate_signals.py file is a module that generates buy/sell/hold signals ๐Ÿ‘๐Ÿ‘Ž๐Ÿ” as 1,-1,0 respectively and returns them to the notebook. The user can customize the input parameters.
  • The notebook will generate a quantstats full report ๐Ÿ“Š at the end to evaluate the performance of the strategy.

๐Ÿ“– User Manual

1. Manual Installation

Follow the steps for a manual installation: (You can run the project only with the 3 steps (steps 1,2,4) )

  1. Clone the Repository: Get a copy of this repository on your local machine with the following command:
    git clone https://github.com/YeakubSadlil/Ensemble_backtesting_stock_market.git

  2. Install Dependencies: Make sure you have Python 3.10 installed. Then, install the required dependencies using the following command:
    pip install -r requirements.txt

  3. Data Ingestion: Ingest your custom data with Zipline. There is a default data ingestion available on the notebook. Note that the strategy can accept multiple assets.

  4. Run the Notebook: Open the run.ipynb file in the repository and customize your backtesting parameters such as stock symbols, time period, and investment settings like the amount and number of stocks to buy at each buy signal.

  5. Interpret Results: A quantstats report will be generated automatically at the end by gs.plots(results).
    Analyze the generated plots and results to assess the strategy's performance on your selected or default assets.

N.B.

  • If you don't want to customize any input parameters or data ingestion, you can directly run the notebook run.ipynb without any changes.
  • If you face any issues in step 2 associated with ta-lib, please install it first, doc

2. Installation Using Docker

If you have Docker installed, you can use it to run the project to avoid setting the environment or installing dependencies:

  1. Verify Docker Installation: Make sure Docker is installed and running on your machine.

  2. Clone the Repository: Get a copy of this repository on your local machine with the following command:
    git clone https://github.com/YeakubSadlil/Ensemble_backtesting_stock_market.git

  3. Build the Docker Image: Run the following command to build the Docker image:
    docker build -t ensemble-backtest-stockprice .

  4. Run the Docker Container: Start a new Docker container with the image using the following command:
    docker run -d -p 8888:8888 --name my_backtest_container ensemble-backtest-stockprice

  5. Access Jupyter Notebook: Open your web browser and go to http://localhost:8888. Paste the copied token when prompted. (If no token is required, you can skip step 6).

  6. Get the Jupyter Notebook Token: Run the following commands to get the Jupyter Notebook token:
    docker exec -it my_backtest_container /bin/bash
    After that, run the following command inside the container that you run:
    jupyter server list
    It will show you a link with a token. Copy the token only from the link and paste it in the browser jupyter notebook prompt.

From the Jupyter Notebook, run run.ipynb to start the project.

3. Google Colab

  • If you don't want to install anything on your local machine or you haven't have enough time to set up the environment, you can run the project on Google Colab.
  • Please go to the Colab Notebook and follow the instructions there. After uploading the necessary files you are ready to go just with a single click 'Run All'.

๐Ÿ—๏ธ Project Structure

โ”œโ”€โ”€ ๐Ÿ“‚ Data                <- Folder for all the data used for model training
โ”‚ย ย  โ””โ”€โ”€ sp50/daily     
โ”‚ย ย  
โ”œโ”€โ”€ ๐Ÿ“‚ ML Models           <- Folder for all the machine learning models used for the project
โ”‚       โ””โ”€โ”€LSTM_Stock_Price_Prediction.ipynb
โ”‚
โ”œโ”€โ”€ ๐Ÿ““ run.ipynb           <- Jupyter notebook from which the user can run the backtesting
โ”‚
โ”œโ”€โ”€ ๐Ÿ“„ generate_signal.py  <- Module to generate the buy/sell/hold signals
โ”‚
โ”œโ”€โ”€ ๐Ÿ“ requirements.txt    <- List of required python packages
โ”‚
โ”œโ”€โ”€ ๐Ÿณ Dockerfile          <- Dockerfile for building the Docker image
โ”‚
โ””โ”€โ”€ ๐Ÿ“„ lstm_12_p50_ckp_13_24_e150.h5   <- LSTM model weights file

๐Ÿ› ๏ธ Our Approach

  • Ensemble Strategy: We combined Bollinger Bands and LSTM models to predict stock prices and generate signals.
    • When the LSTM model predicts that tomorrow's stock price is higher than the current price and the Bollinger's lower band is also higher than the current price, then we generate a buy signal (Long Position)
    • When the LSTM model predicts that tomorrow's stock price is lower than the current price and Bollinger's upper band is also lower than the current price, then we generate a sell signal.
    • Otherwise, we generate a hold signal.
    • We have chosen to go only for long positions as the market is a bull market.
  • Tuning Models:
    • We have tried trend following and mean reversion strategies with different technical indicators like MACD, RSI, Bollinger Bands, etc. and checked their individual performance.
    • Then we combined the best performing strategies with LSTM to create an ensemble strategy.
    • We have found that the ensemble strategy is performing better than the individual strategies compared to each other and benchmark S&P500.
    • We have also tested the In Sample and Out Sample performance and found that the ensemble is performing better. Check all test notebooks in the ML Models folder.
  • Bollinger Bands: Utilized the Bollinger Bands model to generate buy/sell signals based on the stock's price volatility with a default window of 20 days.
  • LSTM Model: Developed an LSTM model to predict the stock price of the next day based on the previous 50 days of stock prices.
    • Used it as a filter with Bollinger Bands to generate signals. The reason behind that is predicted stock price was higher than the current price during downtrends and lower during uptrends.
    • Trained the model on S&P 500 data from 2013 to 2020 and tested it on data from 2023 to 2024. ๐Ÿ“…
  • Asset Categorization: Backtested our strategy on 50 assets from 10 different sectors (2018-2022) to add diversification and evaluate its performance. Check Asset Lists or the Data section to see the list of assets.
  • Module Development: Developed a module to generate signals (generate_signals.py), which is imported into the run.ipynb. It will return buy/sell/hold signals as 1, -1, 0 respectively.
  • Backtesting: Utilized the zipline library to backtest our strategy and quantstats to evaluate the performance. ๐Ÿงช

LSTM Model Architecture:

LSTM Model

๐Ÿš€ Performance Analysis

Our ensemble strategy is pretty close to the Bollinger Bands individual strategy, but it has outperformed the benchmark (S&P 500) in terms of CAGR, Sharpe Ratio, Portfolio Value while bactested with 50 assets from 2018-22.

  • It couldn't beat the benchmark while backtested with some single assets for out sample data but performed well for the AAPL stock.
  • Although the performance was better than Benchmark when going long only in a bull condition, the strategy was suffering when there were high drawdowns which indicates that the strategy is not robust enough to handle the market downturns.

Ensemble Notebook: We tuned our ensemble model in Google Colab for faster training. The notebook is available here or check the Test_ensemble_InSample.ipynb in the folder ML Models.

The table below shows the performance comparison on in-sample data

Metric Benchmark Bollinger Bands LSTM + Bollinger Ensemble
Start Period 2018-03-19 2018-03-19 2018-03-19
End Period 2022-12-30 2022-12-30 2022-12-30
Risk-Free Rate 0.0% 0.0% 0.0%
Time in Market 100.0% 100.0% 100.0%
Cumulative Return 39.52% 50.92% 54.21%
CAGR 4.92% 6.12% 6.45%
Sharpe 0.43 0.46 0.49
Max Drawdown -33.92% -43.2% -44.06%
Avg. Drawdown -2.18% -2.79% -2.79%
Volatility (ann.) 22.01% 25.56% 25.21%
Calmar 0.15 0.14 0.15

The plot below shows the performance of the ensemble strategy

๐Ÿ”ฎ Drawbacks & Future Works

  • Dataset Choosing: We have trained the LSTM model on S&P 500 data, but a market index can be created with the 50 assets we have used for backtesting.
  • Order Strategy: As the market is a bull market we went only for long positions but with a proper short-selling strategy more profit can be generated.
  • Fine-Tuning Models: Continuously refine and optimize the Bollinger Bands window size and LSTM models for better prediction accuracy. The LSTM model was underperforming while predicting based on the past 100 and 150 days.LSTM may suffer from vanishing gradients and can be improved with Attention mechanisms, Stacking more layers or Bidirectional LSTMs etc.๐Ÿ”ง
  • Risk Management: Implement risk management strategies to minimize potential losses such as stop loss and take profit.
  • Meta-Labeling Strategy: In his book Advances in Financial Machine Learning, Dr.Lopez de Prad describes a Meta-labeling technique that uses an array of new Ensemble learning techniques to enhance machine learning strategies. Hudson & Thames, a financial research group, expanded on these techniques and showed some implementation ideas in a youtube video.

๐Ÿ“‚ Asset Categories

We have backtested our strategy on 50 assets from 10 different sectors. If you want to test our model based on your custom data please choose tickers from here. The list of assets is as follows:

Industrials Health Care Information Technology Financials Materials Consumer Staples Energy Communication Services Utilities Real Estate
MMM ABT ADBE AFL FMC BG TRGP DIS AES ARE
AOS BAX AMD BAC IFF MO VLO WBD LNT BXP
BA BDX AAPL BRK-B KLAC CPB WMB GOOGLE AEP CPT
AXON TECH CDNS BX APD STZ APA FOX AWK AMT
CAT ALGN NVDA COF CE WMT BKR EA CEG CCI

Summary

                                      Data dances in time's rapid stream  ๐Ÿ’ƒ๐Ÿ•บ๐ŸŒŠโณ
                                      Patterns prediction, a trader's dream ๐Ÿ”ฎ๐Ÿ’ฐ๐Ÿ’ค๐Ÿ˜ด
                                      Bollinger's Bands, our measuring guide  ๐Ÿ“๐Ÿ“ˆ๐Ÿ“Š๐Ÿ”
                                      LSTM whispers where profits reside  ๐Ÿคซ๐Ÿ’ฐ๐Ÿ’ต๐Ÿ 
                                      The ensemble dances with a symphony bright ๐ŸŒŸ๐ŸŽญ๐Ÿ’ƒ๐ŸŽถ
                                      Forecasting markets, with endless sight  ๐Ÿง๐Ÿ“‰๐Ÿ“ˆ๐Ÿ‘€
                                      --------------------> An Anonymous Quant

About

๐Ÿ“ˆ ๐—”๐—ป ๐—ฒ๐—ป๐˜€๐—ฒ๐—บ๐—ฏ๐—น๐—ฒ ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ฒ๐—ด๐˜† ๐—ณ๐—ผ๐—ฟ ๐—ฏ๐—ฎ๐—ฐ๐—ธ๐˜๐—ฒ๐˜€๐˜๐—ถ๐—ป๐—ด ๐˜€๐˜๐—ผ๐—ฐ๐—ธ ๐—ฝ๐—ฟ๐—ถ๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฏ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐—•๐—ผ๐—น๐—น๐—ถ๐—ป๐—ด๐—ฒ๐—ฟ ๐—•๐—ฎ๐—ป๐—ฑ๐˜€ ๐—ฎ๐—ป๐—ฑ ๐—Ÿ๐—ฆ๐—ง๐—  ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages