diff --git a/README.md b/README.md index baa7c6bc4..18382906f 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,7 @@ creating detection and segmentation models using PyTorch 1.0. - **Very fast**: up to **2x** faster than [Detectron](https://github.com/facebookresearch/Detectron) and **30%** faster than [mmdetection](https://github.com/open-mmlab/mmdetection) during training. See [MODEL_ZOO.md](MODEL_ZOO.md) for more details. - **Memory efficient:** uses roughly 500MB less GPU memory than mmdetection during training - **Multi-GPU training and inference** +- **Mixed precision training:** trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores). - **Batched inference:** can perform inference using multiple images per batch per GPU - **CPU support for inference:** runs on CPU in inference time. See our [webcam demo](demo) for an example - Provides pre-trained models for almost all reference Mask R-CNN and Faster R-CNN configurations with 1x schedule. @@ -152,6 +153,15 @@ python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_ben ``` Note we should set `MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN` follow the rule in Single-GPU training. +### Mixed precision training +We currently use [APEX](https://github.com/NVIDIA/apex) to add [Automatic Mixed Precision](https://developer.nvidia.com/automatic-mixed-precision) support. To enable, just do Single-GPU or Multi-GPU training and set `DTYPE "float16"`. + +```bash +export NGPUS=8 +python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "path/to/config/file.yaml" MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN images_per_gpu x 1000 DTYPE "float16" +``` +If you want more verbose logging, set `AMP_VERBOSE True`. See [Mixed Precision Training guide](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) for more details. + ## Evaluation You can test your model directly on single or multiple gpus. Here is an example for Mask R-CNN R-50 FPN with the 1x schedule on 8 GPUS: ```bash