diff --git a/docs/images/nsight_comparison.png b/docs/images/nsight_comparison.png
new file mode 100644
index 0000000000..9b91826513
Binary files /dev/null and b/docs/images/nsight_comparison.png differ
diff --git a/docs/source/whatsnew.rst b/docs/source/whatsnew.rst
index daed871e14..e1f118cdf6 100644
--- a/docs/source/whatsnew.rst
+++ b/docs/source/whatsnew.rst
@@ -6,5 +6,6 @@ What's New
 .. toctree::
    :maxdepth: 1
 
+   whatsnew_0_7.md
    whatsnew_0_6.md
    whatsnew_0_5.md
diff --git a/docs/source/whatsnew_0_6.md b/docs/source/whatsnew_0_6.md
index bdc419df37..8df0503142 100644
--- a/docs/source/whatsnew_0_6.md
+++ b/docs/source/whatsnew_0_6.md
@@ -1,4 +1,4 @@
-# What's new in 0.6 🎉🎉
+# What's new in 0.6
 
 - Decollating mini-batches as an essential post-processing step
 - Pythonic APIs to load the pretrained models from Clara Train MMARs
diff --git a/docs/source/whatsnew_0_7.md b/docs/source/whatsnew_0_7.md
new file mode 100644
index 0000000000..8d0f3947f7
--- /dev/null
+++ b/docs/source/whatsnew_0_7.md
@@ -0,0 +1,116 @@
+# What's new in 0.7 🎉🎉
+
+- Performance enhancements with profiling and tuning guides
+- Major usability improvements in `monai.transforms`
+- Reimplementing state-of-the-art Kaggle solutions
+- Vision-language multimodal transformers
+
+## Performance enhancements with profiling and tuning guides
+
+Model training is often a time-consuming step during deep learning
+development, especially for medical imaging applications. Even with
+powerful hardware (e.g. CPUs/GPUs with large RAM), workflows often
+require careful profiling and tuning to achieve high performance.
+MONAI has been focusing on performance enhancements, and this version
+provides [a fast model training
+guide](https://github.com/Project-MONAI/tutorials/blob/master/acceleration/fast_model_training_guide.md)
+to help build highly performant workflows, with a comprehensive
+overview of the profiling tools and practical strategies. The
+following figure shows the use of [NVIDIA Nsight™
+Systems](https://developer.nvidia.com/nsight-systems) for system-wide
+performance analysis during a performance enhancement study.
+![nsight_vis](../images/nsight_comparison.png)
+
+With this profiling and tuning in place, several typical use cases
+were studied to improve training efficiency. The following figure
+shows that fast training with MONAI can be 20 times faster than a
+regular baseline ([learn
+more](https://github.com/Project-MONAI/tutorials/blob/master/acceleration/fast_training_tutorial.ipynb)).
+![fast_training](../images/fast_training.png)
+
+## Major usability improvements in `monai.transforms` for NumPy/PyTorch inputs and backends
+
+MONAI is starting to roll out major usability enhancements for the
+`monai.transforms` module. Many transforms now support both NumPy and
+PyTorch as input types and computational backends.
+
+One benefit of these enhancements is that users can better leverage
+GPUs for preprocessing: once the input data is converted to a tensor
+and moved onto the GPU with `ToTensor` or `EnsureType`, subsequent
+transforms can run directly on the device, as in the sketch below.
+[The spleen segmentation
+tutorial](https://github.com/Project-MONAI/tutorials/blob/master/acceleration/fast_training_tutorial.ipynb)
+shows the great potential of these flexible modules for fast and
+efficient training.
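+
+The following is a minimal sketch of this pattern. The transform
+classes are real `monai.transforms` names, but the particular chain
+and input shape are illustrative assumptions rather than the
+tutorial's actual pipeline:
+
+```python
+import torch
+from monai.transforms import Compose, EnsureType, RandGaussianNoise, ScaleIntensity
+
+# Chain transforms that accept torch.Tensor inputs: EnsureType
+# converts NumPy arrays (or other types) to tensors, so the remaining
+# transforms can run with the PyTorch backend.
+preprocess = Compose([
+    EnsureType(data_type="tensor"),
+    ScaleIntensity(),
+    RandGaussianNoise(prob=1.0),
+])
+
+# Move a (channel, spatial...) image onto the GPU when one is
+# available; the transforms then execute on the same device.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+image = torch.rand(1, 96, 96, 96, device=device)
+output = preprocess(image)
+print(type(output), output.device)
+```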
+
+## Reimplementing state-of-the-art Kaggle solutions
+
+With this release, we have been actively evaluating and enhancing the
+quality and flexibility of the MONAI core modules, using public
+Kaggle challenges as a testbed. [A
+reimplementation](https://github.com/Project-MONAI/tutorials/tree/master/kaggle/RANZCR/4th_place_solution)
+of a state-of-the-art solution from the [Kaggle RANZCR CLiP -
+Catheter and Line Position
+Challenge](https://www.kaggle.com/c/ranzcr-clip-catheter-line-classification)
+is available in this version.
+
+## Vision-language multimodal transformers
+
+In this release, MONAI adds support for training multimodal (vision +
+language) transformers that can handle both image and textual data.
+MONAI introduces the `TransCheX` model, which consists of vision,
+language, and mixed-modality transformer layers for processing chest
+X-rays and their corresponding radiological reports within a unified
+framework. Users also have the flexibility to alter the architecture
+by varying the number of vision, language, and mixed-modality layers
+and by customizing the classification head. In addition, the model
+can be initialized from pre-trained BERT language models for
+fine-tuning.
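+
+Below is a hypothetical configuration sketch. The import path and
+argument names are assumptions based on the description above, not a
+verbatim copy of the released API; consult the `monai.networks.nets`
+reference for the exact signature:
+
+```python
+from monai.networks.nets import Transchex  # assumed import path/class name
+
+# Hypothetical settings: the three layer-count arguments mirror the
+# vision, language, and mixed-modality layers described above, and
+# num_classes customizes the classification head.
+model = Transchex(
+    in_channels=3,          # chest X-rays loaded as 3-channel images
+    img_size=(224, 224),
+    patch_size=(32, 32),
+    num_classes=2,
+    num_vision_layers=2,
+    num_language_layers=2,
+    num_mixed_layers=2,
+)
+```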