This is a PyTorch implementation of the UNet architecture outlined in Ronneberger et al. (2015), adapted to perform semantic image segmentation on the Pascal VOC dataset. For a more detailed introduction, see https://github.com/kevinddchen/Keras-FCN.
To use out-of-the-box, run the following:

```
git clone git@github.com:kevinddchen/Pytorch-UNet.git
cd Pytorch-UNet
git lfs pull
```

Copy the images you want segmented into a new directory called `eval/images/`. Create the conda environment from `environment.yml`. Then run

```
python eval.py
```

The predicted labels will be saved to `eval/labels/`.
The model is defined in `model.py`, and the training details are in `train.py`.
We train on the 11,355 labelled images in the Semantic Boundaries Dataset (SBD) and validate on the 676 labelled images in the original Pascal VOC dataset that are not in the SBD.
If you want to duplicate our dataset, download the `val/` folder of this repository using the command

```
git lfs pull
```

Then, download the SBD dataset from their website and place the contents of `benchmark_RELEASE/dataset/img/` into a new folder called `train/images/`, and the contents of `benchmark_RELEASE/dataset/cls/` into `train/labels/`.
The dataset is augmented by random scaling, rotation, cropping, and jittering of the RGB values. Details are in `utils.py`. To train, run the command

```
python train.py
```
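One subtlety of segmentation augmentation is that geometric transforms must be applied identically to the image and its label, while photometric jitter touches the image only. A toy numpy sketch of that pattern (the real augmentations live in `utils.py`; scaling and rotation are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, label, crop=32):
    """Random crop applied identically to image and label, plus RGB jitter
    on the image only. A simplified stand-in for the repo's utils.py."""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    img = image[top:top + crop, left:left + crop].astype(np.float32)
    lbl = label[top:top + crop, left:left + crop]          # same crop as image
    img += rng.normal(0.0, 5.0, size=(1, 1, 3))            # per-channel RGB jitter
    return np.clip(img, 0, 255), lbl

img = rng.integers(0, 256, size=(64, 64, 3))
lbl = rng.integers(0, 21, size=(64, 64))  # 21 Pascal VOC classes
aug_img, aug_lbl = augment(img, lbl)
print(aug_img.shape, aug_lbl.shape)  # (32, 32, 3) (32, 32)
```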
The encoder weights are initialized with the VGG13 pretrained weights. Training took 6 hours on a T4 GPU. For comparison, we also trained an FCN with the VGG13 backbone, which has 10x the number of parameters. That model can be found on the `fcn` branch.
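Initializing an encoder from a pretrained classifier usually means copying the conv weights over in order. A hedged sketch of that pattern with stand-in tensors (the repo presumably loads the real torchvision VGG13 state dict; only the first two conv layers are shown):

```python
import torch
import torch.nn as nn

# A miniature VGG-style encoder block; conv layers are matched to the
# pretrained weights purely by order.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
)

# Stand-in "pretrained" (weight, bias) pairs with matching shapes.
pretrained = [(torch.randn(64, 3, 3, 3), torch.randn(64)),
              (torch.randn(64, 64, 3, 3), torch.randn(64))]

convs = [m for m in encoder if isinstance(m, nn.Conv2d)]
with torch.no_grad():
    for conv, (w, b) in zip(convs, pretrained):
        conv.weight.copy_(w)
        conv.bias.copy_(b)

print(torch.equal(convs[0].weight, pretrained[0][0]))  # True
```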
Below are some predicted labels in the validation set.
Image | Truth | UNet | FCN |
---|---|---|---|
The performance of these models on the validation set is summarized below.
Model | UNet | FCN |
---|---|---|
Pixel accuracy | 0.878 | 0.903 |
Mean IoU | 0.490 | 0.583 |
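The two metrics in the table have standard definitions: pixel accuracy is the fraction of correctly labelled pixels, and mean IoU averages intersection-over-union across classes. A self-contained sketch of how they can be computed from a confusion matrix (standard formulas, not the repo's exact code):

```python
import numpy as np

def segmentation_metrics(pred, truth, num_classes=21):
    """Pixel accuracy and mean IoU from flat integer label arrays."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (truth, pred), 1)          # rows: truth, cols: prediction
    pixel_acc = np.trace(cm) / cm.sum()      # fraction of correct pixels
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    iou = inter[union > 0] / union[union > 0]  # skip classes absent everywhere
    return pixel_acc, iou.mean()

truth = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
acc, miou = segmentation_metrics(pred, truth, num_classes=2)
print(round(acc, 3), round(miou, 3))  # 0.75 0.583
```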