Performance optimization and GPU acceleration

Model training is typically a time-consuming step in deep learning development, especially in medical imaging applications. Volumetric medical images are usually large multi-dimensional arrays, and the training process can be complex. Even with powerful hardware (e.g. CPUs and GPUs with large memory), it is not easy to use that hardware fully and achieve high performance. NVIDIA GPUs are widely used for deep learning training and evaluation, and CUDA parallel computation delivers a clear speedup over traditional computation methods. To make full use of GPU features, several popular mechanisms have emerged, such as automatic mixed precision (AMP) and distributed data parallel. MONAI supports these features, and this folder provides a fast training guide for achieving the best performance, along with a rich set of examples.

List of notebooks and examples

fast_model_training_guide

The document describes in detail how to profile the training pipeline, how to analyze the dataset and select suitable algorithms, and how to optimize GPU utilization on a single GPU, on multiple GPUs, or even across multiple nodes.
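As a rough illustration of what profiling a training pipeline involves, the sketch below accumulates wall-clock time per stage and reports each stage's share of the total. It is not the method used in the guide: `StageTimer`, `track`, and `report` are hypothetical names introduced here, and the sleep and sum calls merely stand in for real data loading and compute. Note that GPU kernels launch asynchronously, so timing real CUDA work this way would also need a synchronization call (for example `torch.cuda.synchronize()`) before reading the clock.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # seconds spent in each stage
        self.counts = defaultdict(int)    # number of times each stage ran

    @contextmanager
    def track(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[stage] += time.perf_counter() - start
            self.counts[stage] += 1

    def report(self):
        """Return each stage's fraction of the total tracked time."""
        total = sum(self.totals.values()) or 1.0
        return {stage: spent / total for stage, spent in self.totals.items()}


timer = StageTimer()
for _ in range(3):
    with timer.track("data_loading"):
        time.sleep(0.01)           # stand-in for reading and transforming a batch
    with timer.track("forward_backward"):
        sum(range(10_000))         # stand-in for the model step

shares = timer.report()
for stage, share in shares.items():
    print(f"{stage}: {share:.1%} of tracked time")
```

If one stage dominates the report, that stage is the first candidate for optimization.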

distributed_training

The examples show how to run distributed training and evaluation with three different frameworks:

PyTorch native DistributedDataParallel module with torchrun.
Horovod APIs with horovodrun.
PyTorch Ignite and MONAI workflows.

They can run across several distributed nodes, with multiple GPU devices on each node.
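To make the data-parallel idea concrete, the sketch below shows how a process launched by torchrun might discover its place in the job and which samples it would own. torchrun sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables for each worker; the two helper functions are hypothetical names introduced here, and the strided partition with wrap-around padding is a simplified, non-shuffled stand-in for what a distributed sampler (such as PyTorch's `DistributedSampler`) does in practice. It is not code from the examples in this folder.

```python
import math
import os


def read_torchrun_env(environ=os.environ):
    """Hypothetical helper: return (rank, local_rank, world_size).

    Falls back to a single-process setup when the variables are absent.
    """
    return (
        int(environ.get("RANK", 0)),
        int(environ.get("LOCAL_RANK", 0)),
        int(environ.get("WORLD_SIZE", 1)),
    )


def shard_indices(num_items, rank, world_size):
    """Hypothetical helper: sample indices owned by `rank`.

    Indices are padded by wrapping around so every rank gets the same
    number of samples, then assigned in a strided pattern.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    per_rank = math.ceil(num_items / world_size)
    total = per_rank * world_size
    indices = list(range(num_items))
    # Repeat samples from the start so the list divides evenly across ranks.
    indices += [indices[i % num_items] for i in range(total - num_items)]
    return indices[rank:total:world_size]


rank, local_rank, world_size = read_torchrun_env()
my_indices = shard_indices(num_items=10, rank=rank, world_size=world_size)
print(f"rank {rank}/{world_size} (local {local_rank}) owns {len(my_indices)} samples")
```

Every rank processes the same number of samples per epoch, which keeps the gradient synchronization steps aligned across processes; the cost is that a few samples are seen twice when the dataset size is not divisible by the world size.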
