V1.5.0
Here is where you should have the description of the project
- Some things about the project
Use the "Use this template" button to create a new repository from this one.
Tested on:
- Ubuntu 18.04 with Nvidia K80
- Azure Data Science Virtual machine template
- Ubuntu 18.04 with Nvidia V100
- Ubuntu 20.04 with Nvidia GEFORCE RTX 2070 SUPER
- Ubuntu 18.04 with Nvidia GEFORCE 3070
- Ubuntu 20.04 with Nvidia GEFORCE 3070
- Macbook Pro 2015
- Macbook Pro 2017
NOTE: Both setups assume that your CUDA and GPU drivers work. If they don't, check the troubleshooting section below.
- Python3
- Miniconda
- Initialize conda environment
conda create -n <env name>
- Activate conda environment
conda activate <env name>
- To install PyTorch for your OS, follow the instructions here (they change often, so it's hard to standardize)
- Installing packages
conda install -c pytorch -c conda-forge -c bioconda --file=requirements.txt
- If the above doesn't work then try:
pip install --user -r requirements.txt
- Set up DVC and other libraries
chmod 755 setup.sh; ./setup.sh
- Install requirements
pip install --user -r requirements.txt
- Set up DVC and other libraries
chmod 755 setup.sh; ./setup.sh
- Data versioning: DVC
- Model versioning + Training monitoring + Hyperparameter sweeps: Weights and Biases (please sign up first)
- Distributed training: Weights and Biases
- Coding: JupyterLab or, if you want free GPUs and data privacy isn't an issue, Colab
- Training framework: PyTorch + PyTorch Lightning
- Tabular data management: Pandas
- Plotting: Matplotlib + Plotly
- Deployment: FastAPI
Training models
- The train.py script contains an outline for training PyTorch models with PyTorch Lightning. Once you've customized the script, you can run it with
python3 main.py --train
- You can find more tutorials on customization in the PyTorch Lightning section below.
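The `--train` flag above, together with the `--deploy` flag used in the deployment section, suggests that main.py dispatches on command-line flags. A minimal sketch of that pattern follows. The `run_training` and `run_deployment` helpers are hypothetical placeholders, and the template's actual main.py may be organized differently.

```python
import argparse


def run_training() -> str:
    # Hypothetical placeholder: the real project would call into train.py here.
    return "training"


def run_deployment() -> str:
    # Hypothetical placeholder: the real project would start the FastAPI app here.
    return "deployment"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Project entry point (sketch)")
    # Exactly one of --train / --deploy must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="run the training outline")
    mode.add_argument("--deploy", action="store_true", help="serve the model")
    return parser


def main(argv=None) -> str:
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    args = build_parser().parse_args(argv)
    return run_training() if args.train else run_deployment()
```

Making the two flags mutually exclusive and required means a call with no flag, or with both flags, fails early with a usage message instead of silently doing nothing.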
DVC
- DVC will be initialized by default in the root directory by setup.sh
- The rest of the data versioning guide can be found here: https://dvc.org/doc/start/data-versioning
PyTorch Lightning
- PyTorch Lightning speeds up much of your PyTorch workflow, especially in the training phase.
- Overview
- Setting PyTorch Lightning to use in-line arguments e.g.
python main.py --gpus 2 --max_steps 10 --limit_train_batches 10 --any_trainer_arg x
- Turn your existing models into Lightning models
- Setting up mid-training checkpoints
- Auto Learning Rate Finder
- Other advanced guides for things like multi-gpu training
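To illustrate how flags like those in the command above can reach the trainer, here is a framework-independent sketch using only the standard library. It assumes a hypothetical `parse_trainer_args` helper and handles just the three concrete flags from the example command (the `--any_trainer_arg` placeholder is not handled). The resulting dictionary could then be passed on as keyword arguments when constructing the trainer. PyTorch Lightning has its own mechanisms for this, described in the guides listed above.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_trainer_args(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Collect trainer-related flags into a dict of keyword arguments."""
    parser = argparse.ArgumentParser(description="Trainer flags (sketch)")
    # Flag names mirror the example command; all are optional.
    parser.add_argument("--gpus", type=int, default=None)
    parser.add_argument("--max_steps", type=int, default=None)
    parser.add_argument("--limit_train_batches", type=int, default=None)
    args = parser.parse_args(argv)
    # Keep only flags the user actually set, so the trainer's own
    # defaults apply to everything else.
    return {name: value for name, value in vars(args).items() if value is not None}


trainer_kwargs = parse_trainer_args(
    ["--gpus", "2", "--max_steps", "10", "--limit_train_batches", "10"]
)
print(trainer_kwargs)
```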
Weights and Biases
- To set up Weights and Biases, first sign up and create a project, or get added to an existing project
- Log in via the terminal and add Weights and Biases to get started. Instructions here
- How to log via PyTorch lightning to W&B?
- How to integrate Ray Tune with Weights and Biases?
- How to use built-in hyperparameter sweep functionality?
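For the built-in sweep functionality, a sweep is described by a configuration with a search method, a metric to optimize, and a parameter space. The sketch below shows one such configuration as a Python dictionary. The metric name `val_loss`, the parameter names, and their ranges are illustrative assumptions and need to match whatever your training code actually logs and accepts.

```python
# Sketch of a W&B sweep configuration. All names and ranges are assumptions.
sweep_config = {
    "method": "random",  # search strategy: "grid", "random", or "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        # A continuous range, sampled between min and max.
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        # A discrete set of candidate values.
        "batch_size": {"values": [16, 32, 64]},
    },
}

# With wandb installed and logged in, the sweep would be registered and run
# roughly like this (`train` is a hypothetical training function):
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="<project name>")
#   wandb.agent(sweep_id, function=train, count=10)
```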
Jupyter Lab
- Starting JupyterLab
jupyter lab
- User guide, key mapping and shortcut customization
Deploying models with ease
- In the deployment section there is a pre-built FastAPI template that will allow you to import a model, set the correct data type and return predictions
- Once you've customized it according to your needs, you can just run
python3 main.py --deploy
- There's also a customizable Dockerfile to make a Docker image for it
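As a rough illustration of the "import a model, set the correct data type and return predictions" flow, here is a minimal FastAPI sketch. It is not the template shipped in the deployment section. The `predict` function is a hypothetical stand-in for a trained model, and the `PredictRequest` schema and `/predict` route are assumed names.

```python
from typing import List


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    if not features:
        raise ValueError("features must not be empty")
    return sum(features) / len(features)


try:
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
except ImportError:
    # FastAPI is not installed; the plain predict() function above still works.
    app = None
else:
    class PredictRequest(BaseModel):
        # Declares the expected input type; FastAPI validates request bodies against it.
        features: List[float]

    app = FastAPI()

    @app.post("/predict")
    def predict_endpoint(payload: PredictRequest) -> dict:
        try:
            return {"prediction": predict(payload.features)}
        except ValueError as exc:
            # Report invalid input as a client error instead of a server crash.
            raise HTTPException(status_code=422, detail=str(exc))
```

Keeping the prediction logic in a plain function, separate from the route, makes it testable without starting a server. The `app` object can be served with an ASGI server such as uvicorn.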
No Python3 Kernel in Jupyter Lab
Solve by doing the following:
python3 -m pip install ipykernel
python3 -m ipykernel install --user
- Restart Jupyter Lab
DVC/Tensorboard/JupyterLab command not found
python3 -m <command here>
- Issue with Python versions on the machine
Can't access Jupyter from local machine after cloud deployment
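To see why running a tool through `python3 -m` can work when the bare command does not, it helps to check two things separately: whether the command is on your PATH, and whether the package is importable by the current interpreter. The sketch below does both with the standard library. The `diagnose` helper is hypothetical, and the module names paired with each command are assumptions that may not match the name each tool expects after `-m`.

```python
import importlib.util
import shutil


def diagnose(command: str, module: str) -> dict:
    """Report whether `command` is on PATH and whether `module` is importable."""
    return {
        "on_path": shutil.which(command) is not None,
        "importable": importlib.util.find_spec(module) is not None,
    }


# Command names and module names can differ; these pairs are assumptions.
for command, module in [("dvc", "dvc"), ("tensorboard", "tensorboard"), ("jupyter", "jupyterlab")]:
    print(command, diagnose(command, module))
```

If a tool is importable but not on PATH, the package is installed for this interpreter but its launcher script is in a directory your shell does not search, which is common after `pip install --user`. If it is neither, it was probably installed for a different Python version.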
- Run Jupyter Lab
jupyter lab --no-browser --ip=0.0.0.0 --port=8888
GPU driver/tensorflow-gpu issues
- Check which GPU you have on the machine and make sure it's supported
- Install CUDA if you don't have it installed. The best way to do this is to run
conda install tensorflow-gpu
and it should take care of itself.
- If the above doesn't work, check whether you have installed the correct version of tensorflow-gpu for the installed CUDA drivers
nvidia-smi
- Check which driver version your GPU needs, and which CUDA versions those drivers support, from here.
- When you have found the right version, follow this guide and replace the CUDA driver files in the commands with the right versions.
- If nothing works, try a pre-built VM template on the cloud
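The nvidia-smi check can also be scripted, which is handy when comparing several machines. The sketch below assumes a hypothetical `gpu_driver_info` helper that queries the GPU name and driver version through nvidia-smi's CSV output. It only reports what is installed and does not decide whether a given CUDA or tensorflow-gpu version is compatible.

```python
import shutil
import subprocess
from typing import List, Optional


def gpu_driver_info() -> Optional[List[str]]:
    """Return one 'name, driver_version' line per GPU.

    Returns None if nvidia-smi is not on PATH, and an empty list if
    nvidia-smi is present but exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not report on the GPUs
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


info = gpu_driver_info()
if info is None:
    print("nvidia-smi not found: install the GPU drivers first")
elif not info:
    print("nvidia-smi failed: the driver may not be loaded correctly")
else:
    print("\n".join(info))
```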