A PyTorch implementation of EMANet, based on the ICCV 2019 paper *Expectation-Maximization Attention Networks for Semantic Segmentation*.
- Anaconda
- PyTorch: `conda install pytorch torchvision cudatoolkit=10.1 -c pytorch`
- opencv: `pip install opencv-python`
- tensorboard: `pip install tensorboard`
- pycocotools: `pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI`
- fvcore: `pip install git+https://github.com/facebookresearch/fvcore`
- panopticapi: `pip install git+https://github.com/cocodataset/panopticapi.git`
- cityscapesScripts: `pip install git+https://github.com/mcordts/cityscapesScripts.git`
- detectron2: `pip install git+https://github.com/facebookresearch/detectron2.git@master`
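After installing the dependencies, a quick stdlib-only check can confirm that every package is importable. The import-name → pip-name mapping below is an assumption inferred from the install commands above:

```python
import importlib.util

def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip for mod, pip in required.items()
            if importlib.util.find_spec(mod) is None]

# Assumed import names for the dependencies listed above.
required = {
    "torch": "pytorch",
    "torchvision": "torchvision",
    "cv2": "opencv-python",
    "pycocotools": "pycocotools",
    "fvcore": "fvcore",
    "panopticapi": "panopticapi",
    "cityscapesscripts": "cityscapesScripts",
    "detectron2": "detectron2",
}

if __name__ == "__main__":
    print("missing:", missing_packages(required) or "none")
```

An empty list means all dependencies resolved; anything printed under `missing` still needs the corresponding install command.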
For a few datasets that detectron2 natively supports, the datasets are assumed to exist in a directory called `datasets/` under the directory where you launch the program. They need to have the following directory structure:

```
coco/
  annotations/
    panoptic_{train,val}2017.json
  panoptic_{train,val}2017/        # png annotations
  panoptic_stuff_{train,val}2017/  # generated by the script mentioned below
```

Run `./datasets/prepare_coco.py` to extract semantic annotations from the panoptic annotations.
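At its core, extracting semantic annotations from panoptic ones is a segment-id → category-id lookup: panoptic pngs encode a segment id per pixel (as `R + 256*G + 256**2*B` in the COCO panoptic format), and the panoptic json maps each segment id to a category. A minimal sketch of that idea, using plain nested lists rather than the script's actual arrays and API:

```python
def panoptic_to_semantic(pan_ids, segments_info, ignore_label=255):
    """Map a 2D grid of panoptic segment ids to semantic category ids.

    pan_ids: 2D list of segment ids decoded from the panoptic png.
    segments_info: the "segments_info" entries from the panoptic json,
                   each carrying an "id" and a "category_id".
    Pixels whose id has no entry (e.g. the 0 "void" id) get ignore_label.
    """
    id2cat = {seg["id"]: seg["category_id"] for seg in segments_info}
    return [[id2cat.get(v, ignore_label) for v in row] for row in pan_ids]

segs = [{"id": 42, "category_id": 1}, {"id": 7, "category_id": 184}]
print(panoptic_to_semantic([[42, 7], [0, 42]], segs))  # [[1, 184], [255, 1]]
```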
```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
  leftImg8bit/
    train/
    val/
    test/
```

Run `./datasets/prepare_cityscapes.py` to create the `labelTrainIds.png` files.
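The `labelTrainIds.png` files are produced by remapping the raw Cityscapes labelIds to the 19 train ids, with all other ids sent to the 255 ignore label. A stdlib sketch of that remapping; only three entries of the table are shown (road 7→0, sidewalk 8→1, building 11→2, from the official label definitions), and the full table lives in `cityscapesscripts.helpers.labels`:

```python
# Excerpt of the cityscapes labelId -> trainId table; ids absent from the
# mapping (void classes, license plate, ...) fall back to the ignore label.
LABEL_TO_TRAIN = {7: 0, 8: 1, 11: 2}  # road, sidewalk, building (excerpt)

def to_train_ids(label_ids, mapping=LABEL_TO_TRAIN, ignore_label=255):
    """Remap a 2D grid of raw labelIds to trainIds."""
    return [[mapping.get(v, ignore_label) for v in row] for row in label_ids]
```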
Before training, download the pre-trained backbone models (ResNet50, ResNet101, and ResNet152, pre-trained on ImageNet) and unzip them into the `epochs/` directory.
To train a model, run

```
python train_net.py --config-file <config.yaml>
```
For example, to launch end-to-end EMANet training with a ResNet-50 backbone on the `coco` dataset using 8 GPUs, execute:

```
python train_net.py --config-file configs/r50_coco.yaml --num-gpus 8
```
Model evaluation can be done similarly:

```
python train_net.py --config-file configs/r50_coco.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS epochs/model.pth
```
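The trailing `MODEL.WEIGHTS epochs/model.pth` pair works because detectron2 forwards leftover command-line arguments to the config as alternating KEY VALUE overrides (yacs' `merge_from_list`). A minimal stdlib sketch of that override mechanism over a nested dict, not detectron2's actual `CfgNode` class:

```python
def merge_opts(cfg, opts):
    """Apply alternating KEY VALUE pairs to a nested dict config.

    Dotted keys descend into sub-dicts, e.g. "MODEL.WEIGHTS" sets
    cfg["MODEL"]["WEIGHTS"].
    """
    assert len(opts) % 2 == 0, "opts must be KEY VALUE pairs"
    for key, value in zip(opts[::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:
            node = node.setdefault(p, {})
        node[leaf] = value
    return cfg

cfg = {"MODEL": {"WEIGHTS": ""}}
merge_opts(cfg, ["MODEL.WEIGHTS", "epochs/model.pth"])
# cfg["MODEL"]["WEIGHTS"] is now "epochs/model.pth"
```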
There are some differences between this implementation and the official implementation:

- The image sizes for multi-scale training are (640, 672, 704, 736, 768, 800) for the `coco` dataset;
- The image sizes for multi-scale training are (800, 832, 864, 896, 928, 960, 992, 1024) for the `cityscapes` dataset;
- No `RandomCrop` is used;
- The learning rate policy is `WarmupCosineLR`.
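Multi-scale training here follows detectron2's short-edge resizing: each iteration samples one of the sizes above as the target for the image's short edge and rescales while preserving the aspect ratio. A sketch of that sampling, with an assumed long-edge cap of 1333 (detectron2's default `max_size` for coco-style configs):

```python
import random

def resize_shortest_edge(h, w, sizes=(640, 672, 704, 736, 768, 800),
                         max_size=1333):
    """Pick a random target for the short edge and scale (h, w) proportionally,
    shrinking further if the long edge would exceed max_size."""
    size = random.choice(sizes)
    scale = size / min(h, w)
    if max(h, w) * scale > max_size:
        scale = max_size / max(h, w)
    return round(h * scale), round(w * scale)
```

For example, a 400x600 image with a sampled short-edge size of 800 becomes 800x1200, while very wide images are clamped so their long edge stays at 1333.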
Results on the `coco` dataset:

Name | train time (s/iter) | inference time (s/im) | train mem (GB) | PA % | mean PA % | mean IoU % | FW IoU % | download link | code
---|---|---|---|---|---|---|---|---|---
R50 | 1.04 | 0.11 | 11.14 | 80.49 | 53.92 | 42.71 | 68.69 | model | xxi8
R101 | 1.55 | 0.18 | 17.92 | 81.16 | 54.54 | 43.61 | 69.50 | model | 1jhd
R152 | 1.95 | 0.23 | 23.88 | 81.73 | 56.53 | 45.15 | 70.40 | model | wka6
Results on the `cityscapes` dataset:

Name | train time (s/iter) | inference time (s/im) | train mem (GB) | PA % | mean PA % | mean IoU % | FW IoU % | download link | code
---|---|---|---|---|---|---|---|---|---
R50 | 0.81 | 0.11 | 11.22 | 95.13 | 80.01 | 72.28 | 91.09 | model | x2d5
R101 | 1.11 | 0.14 | 14.69 | 95.35 | 81.77 | 74.02 | 91.47 | model | t2m1
R152 | 1.37 | 0.15 | 18.87 | 95.48 | 82.97 | 75.12 | 91.68 | model | vqeq