# Fast-SCNN

A PyTorch implementation of Fast-SCNN based on the BMVC 2019 paper [Fast-SCNN: Fast Semantic Segmentation Network](https://arxiv.org/abs/1902.04502).
## Requirements

- [PyTorch](https://pytorch.org)

```
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
```

- thop

```
pip install thop
```

- opencv

```
pip install opencv-python
```

- cityscapesScripts

```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
## Datasets

The [Cityscapes](https://www.cityscapes-dataset.com/) dataset is used, organized as follows:

```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
  leftImg8bit/
    train/
    val/
```
First set the environment variable `CITYSCAPES_DATASET`, for example:

```
export CITYSCAPES_DATASET=/home/data/cityscapes
```

and then run `createTrainIdLabelImgs.py` to create the `labelTrainIds.png` annotations.
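If you prefer to drive this preparation step from Python, here is a minimal sketch, assuming the `cityscapesscripts` package installed above exposes the preparation script as an importable module:

```python
import os

# Assumed dataset location; adjust to your setup.
os.environ["CITYSCAPES_DATASET"] = "/home/data/cityscapes"

# createTrainIdLabelImgs reads CITYSCAPES_DATASET and writes a
# *_labelTrainIds.png next to each *_labelIds.png annotation.
from cityscapesscripts.preparation.createTrainIdLabelImgs import main

main()
```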
## Usage

### Train

```
python train.py --crop_h 512 --crop_w 1024
```

optional arguments:
```
--data_path                   Data path for the Cityscapes dataset [default value is '/home/data/cityscapes']
--crop_h                      Crop height for training images [default value is 1024]
--crop_w                      Crop width for training images [default value is 2048]
--batch_size                  Number of images in each training batch [default value is 12]
--save_step                   Number of steps between saving predicted results [default value is 5]
--epochs                      Number of sweeps over the dataset to train [default value is 100]
```
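The `--crop_h`/`--crop_w` options control the random crop applied to each training image. A minimal sketch of such a crop, with illustrative names that are not taken from this repo's actual code:

```python
import random

from PIL import Image

def random_crop(image: Image.Image, label: Image.Image, crop_h: int, crop_w: int):
    """Crop the same random (crop_h, crop_w) window from image and label."""
    w, h = image.size
    x = random.randint(0, w - crop_w)
    y = random.randint(0, h - crop_h)
    box = (x, y, x + crop_w, y + crop_h)
    return image.crop(box), label.crop(box)
```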
### Eval

First set the environment variables `CITYSCAPES_DATASET` and `CITYSCAPES_RESULTS`, for example:

```
export CITYSCAPES_DATASET=/home/data/cityscapes
export CITYSCAPES_RESULTS=/home/code/Fast-SCNN/results
```

and then run `evalPixelLevelSemanticLabeling.py` to evaluate the predicted segmentation.
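The evaluator reads prediction images from `CITYSCAPES_RESULTS`, and the official scripts expect PNGs containing full Cityscapes label IDs rather than the 19 train IDs a model typically predicts. A hedged sketch of the conversion needed before saving (function name is illustrative):

```python
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import trainId2label

def save_prediction(train_ids: np.ndarray, out_path: str) -> None:
    """Map per-pixel train IDs (0-18) back to official label IDs and save."""
    label_ids = np.zeros_like(train_ids, dtype=np.uint8)
    for train_id, label in trainId2label.items():
        if 0 <= train_id < 19:  # skip the ignore entries (255 / -1)
            label_ids[train_ids == train_id] = label.id
    Image.fromarray(label_ids).save(out_path)
```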
### Demo

```
python viewer.py --model_weight 512_1024_model.pth
```

optional arguments:
```
--data_path                   Data path for the Cityscapes dataset [default value is '/home/data/cityscapes']
--model_weight                Pretrained model weight [default value is '1024_2048_model.pth']
--input_pic                   Path to the input picture [default value is 'test/berlin/berlin_000000_000019_leftImg8bit.png']
```
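A rough sketch of the inference step `viewer.py` presumably performs; the `FastSCNN` import, its constructor, and the weight layout are assumptions about this repo rather than verified facts:

```python
import torch
from torchvision import transforms
from PIL import Image

from model import FastSCNN  # assumed module/class name in this repo

model = FastSCNN(num_classes=19)  # 19 Cityscapes train classes
model.load_state_dict(torch.load('512_1024_model.pth', map_location='cpu'))
model.eval()

image = Image.open('test/berlin/berlin_000000_000019_leftImg8bit.png').convert('RGB')
x = transforms.ToTensor()(image).unsqueeze(0)  # (1, 3, H, W)
with torch.no_grad():
    logits = model(x)  # assuming the model returns raw logits of shape (1, 19, H, W)
pred = logits.argmax(dim=1).squeeze(0)  # per-pixel train IDs
```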
## Results

The experiment is conducted on one NVIDIA Tesla V100 (32G) GPU, and there are some differences between this implementation and the official implementation:

- The scales of `Multi-Scale Training` are `(0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)` (see the sketch after this list);
- No `color channels noise and brightness` augmentation is used;
- No `auxiliary losses` at the end of the `learning to downsample` and `global feature extraction` modules are used;
- The number of training `epochs` is `100`;
- The `Adam` optimizer with learning rate `1e-3` is used to train this model;
- No `Polynomial Learning Scheduler` is used.
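A minimal sketch of the multi-scale training step described in the first bullet; function and variable names are illustrative, not this repo's actual code:

```python
import random

from PIL import Image

SCALES = (0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)

def multi_scale_resize(image: Image.Image, label: Image.Image):
    """Resize image and label by a scale drawn uniformly from SCALES."""
    scale = random.choice(SCALES)
    w, h = image.size
    size = (int(w * scale), int(h * scale))
    # nearest neighbour for the label so class IDs are not interpolated
    return image.resize(size, Image.BILINEAR), label.resize(size, Image.NEAREST)
```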
| Params (M) | FLOPs (G) | FPS | Pixel Accuracy | Class mIOU | Category mIOU | Download |
| --- | --- | --- | --- | --- | --- | --- |
| 1.14 | 6.92 | 197 | 81.8 | 58.0 | 81.7 | model \| eg6a |
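The Params/FLOPs figures above can in principle be measured with the `thop` package installed earlier; a hedged sketch (the `FastSCNN` import is again an assumption about this repo's layout):

```python
import torch
from thop import profile

from model import FastSCNN  # assumed module/class name in this repo

model = FastSCNN(num_classes=19)
dummy = torch.randn(1, 3, 1024, 2048)  # full-resolution Cityscapes input
macs, params = profile(model, inputs=(dummy,))  # thop reports multiply-accumulates
print(f'Params: {params / 1e6:.2f}M  FLOPs: {macs / 1e9:.2f}G')
```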
The left is the input image, the middle is the ground truth segmentation, and the right is the model's predicted segmentation.