
Commit

improve docs
a-r-r-o-w committed Feb 4, 2025
1 parent 8f10d05 commit 06b411f
Showing 2 changed files with 7 additions and 7 deletions.
4 changes: 2 additions & 2 deletions docs/source/en/optimization/memory.md
@@ -160,9 +160,9 @@ In order to properly offload models after they're called, it is required to run

## Group offloading

-Group offloading is a middle ground between the two above methods. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method is more memory-efficient than model-level offloading. It is also faster than sequential-level offloading, as the number of device synchronizations is reduced.
+Group offloading is a middle ground between the two above methods. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method uses lower memory than model-level offloading. It is also faster than sequential-level offloading, as the number of device synchronizations is reduced.

-Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to overlap data transfer and computation to reduce the overall execution time. This is enabled using layer prefetching with CUDA streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the current layer is being executed - this increases the memory requirements slightly. Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.
+Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to overlap data transfer and computation to reduce the overall execution time compared to sequential offloading. This is enabled using layer prefetching with CUDA streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the current layer is being executed - this increases the memory requirements slightly. Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.

To enable group offloading, either call the [`~ModelMixin.enable_group_offloading`] method on the model or use [`~hooks.group_offloading.apply_group_offloading`]:
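A minimal sketch of the second option is shown below. The argument names (`onload_device`, `offload_device`, `offload_type`, `num_blocks_per_group`, `use_stream`) and the example model are assumptions based on the docstring in this commit; check them against the `apply_group_offloading` signature in your installed diffusers version:

```python
# Hedged sketch: block-level group offloading for a diffusers transformer.
# Argument names below are assumed; verify against your diffusers version.
import torch
from diffusers import CogVideoXTransformer3DModel
from diffusers.hooks import apply_group_offloading

transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

apply_group_offloading(
    transformer,
    onload_device=torch.device("cuda"),  # device the groups are moved to for computation
    offload_device=torch.device("cpu"),  # device the groups rest on between calls
    offload_type="block_level",          # offload groups of ModuleList/Sequential blocks
    num_blocks_per_group=2,              # number of blocks that share one onload/offload group
    use_stream=True,                     # prefetch the next group on a separate CUDA stream
)
```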

10 changes: 5 additions & 5 deletions src/diffusers/hooks/group_offloading.py
@@ -285,14 +285,14 @@ def apply_group_offloading(
memory, but can be slower due to the excessive number of device synchronizations.
Group offloading is a middle ground between the two methods. It works by offloading groups of internal layers,
-    (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method is more memory-efficient than module-level
+    (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method uses lower memory than module-level
offloading. It is also faster than leaf-level offloading, as the number of device synchronizations is reduced.
Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to
-    overlap data transfer and computation to reduce the overall execution time. This is enabled using layer prefetching
-    with streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the
-    current layer is being executed - this increases the memory requirements slightly. Note that this implementation
-    also supports leaf-level offloading but can be made much faster when using streams.
+    overlap data transfer and computation to reduce the overall execution time compared to sequential offloading. This
+    is enabled using layer prefetching with streams, i.e., the layer that is to be executed next starts onloading to
+    the accelerator device while the current layer is being executed - this increases the memory requirements slightly.
+    Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.
Args:
module (`torch.nn.Module`):
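To illustrate the leaf-level variant mentioned in the docstring above, here is a hedged sketch; `offload_type="leaf_level"` and `use_stream` are assumed parameter names, and the VAE model is only an illustrative choice. With leaf-level offloading, each leaf module is offloaded individually for the lowest memory footprint, and streams hide most of the transfer latency:

```python
# Hedged sketch: leaf-level group offloading with stream-based prefetching.
# Parameter names are assumptions; check the apply_group_offloading signature in your version.
import torch
from diffusers import AutoencoderKLCogVideoX
from diffusers.hooks import apply_group_offloading

vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16
)

apply_group_offloading(
    vae,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",  # offload every leaf module individually (lowest memory)
    use_stream=True,            # overlap parameter transfer with computation via CUDA streams
)
```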

