Let's tackle the comma10k segmentation and have some fun.
First, I will try to implement the RegSeg paper and see how it works out. I chose this one because the model is small (and hopefully fast) and it achieves good results on common semantic segmentation benchmarks like Cityscapes and CamVid.
Moreover, I will compare it to some older approaches such as UNet and DeepLabV3, using the PyTorch Segmentation Models framework.
If you want to train your own models, please create a virtual environment. I used PyTorch 1.8.1 (torchvision 0.9.1, CUDA Toolkit 11.1), which you should install first, before installing everything from requirements.txt. The config.py file contains all hyperparameter settings for the training run; you can change the values if you want to.
python -m venv .env
source .env/bin/activate
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
git clone https://github.com/johanngerberding/comma10k-segmentation-pytorch.git
cd comma10k-segmentation-pytorch
pip install -r requirements.txt
python train.py
Down below you can see a few example predictions of the current RegSeg model, trained for 100 epochs. If you want to try it, you can download the model and the config here.
Here we have two random prediction samples from the DeepLabV3+ model after 50 epochs. You can download the model and the config file here.
- mixed precision training
- evaluation (pixel accuracy, IoU, F1 Score)
- more augmentations
- visualization methods
- add plots of training stats
- Unet++ training
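For the evaluation item above (pixel accuracy, IoU, F1 score), here is a minimal, dependency-free sketch of how these metrics could be computed from flattened label maps. It is not the code from this repo: the function names `confusion_counts` and `segmentation_metrics` are hypothetical, and the inputs are assumed to be flat sequences of integer class ids. In a real training loop you would most likely compute the same counts on tensors instead of Python lists.

```python
def confusion_counts(pred, target, num_classes):
    """Count per-class true positives, false positives and false negatives."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but the pixel is not p
            fn[t] += 1  # the pixel is class t, but t was missed
    return tp, fp, fn


def segmentation_metrics(pred, target, num_classes):
    """Return pixel accuracy, per-class IoU / F1 and mean IoU.

    `pred` and `target` are flat sequences of integer class ids
    (hypothetical input format, chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(target) == 0:
        raise ValueError("inputs must not be empty")

    tp, fp, fn = confusion_counts(pred, target, num_classes)
    pixel_accuracy = sum(tp) / len(target)

    iou, f1 = [], []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union == 0:
            # Class c appears in neither prediction nor target: undefined.
            iou.append(None)
            f1.append(None)
        else:
            iou.append(tp[c] / union)
            f1.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))

    # Average only over classes that actually occur.
    valid = [v for v in iou if v is not None]
    mean_iou = sum(valid) / len(valid) if valid else 0.0

    return {
        "pixel_accuracy": pixel_accuracy,
        "iou": iou,
        "f1": f1,
        "mean_iou": mean_iou,
    }
```

A 2D prediction mask would be flattened before being passed in, for example `segmentation_metrics(pred_flat, target_flat, num_classes=3)`. Classes that appear in neither the prediction nor the target are reported as `None` and left out of the mean, so absent classes do not distort the average.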