From 0642006437bc35d5f444d755b9a884034a7f0d8c Mon Sep 17 00:00:00 2001
From: "[[ -z $EMAIL ]] && read -e -p \"Enter your email (for git configuration): \" EMAIL"
Date: Wed, 11 Sep 2024 15:13:55 -0400
Subject: [PATCH 1/2] Update image

---
 docs/source/usage_guides/distributed_inference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/usage_guides/distributed_inference.md b/docs/source/usage_guides/distributed_inference.md
index 82fdc21031d..3d9088421cf 100644
--- a/docs/source/usage_guides/distributed_inference.md
+++ b/docs/source/usage_guides/distributed_inference.md
@@ -148,7 +148,7 @@ This next part will discuss using *pipeline parallelism*. This is an **experimen
 
 The general idea with pipeline parallelism is: say you have 4 GPUs and a model big enough it can be *split* on four GPUs using `device_map="auto"`. With this method you can send in 4 inputs at a time (for example here, any amount works) and each model chunk will work on an input, then receive the next input once the prior chunk finished, making it *much* more efficient **and faster** than the method described earlier. Here's a visual taken from the PyTorch repository:
 
-![PiPPy example](https://camo.githubusercontent.com/681d7f415d6142face9dd1b837bdb2e340e5e01a58c3a4b119dea6c0d99e2ce0/68747470733a2f2f692e696d6775722e636f6d2f657955633934372e706e67)
+![Pipeline parallelism example](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/accelerate/pipeline_parallel.png)
 
 To illustrate how you can use this with Accelerate, we have created an [example zoo](https://github.com/huggingface/accelerate/tree/main/examples/inference) showcasing a number of different models and situations. In this tutorial, we'll show this method for GPT2 across two GPUs.

From 4c8dd872ce45e043f4a983bbfb5ff89c405d1e0d Mon Sep 17 00:00:00 2001
From: "[[ -z $EMAIL ]] && read -e -p \"Enter your email (for git configuration): \" EMAIL"
Date: Wed, 11 Sep 2024 15:14:29 -0400
Subject: [PATCH 2/2] Fin

---
 docs/source/usage_guides/distributed_inference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/usage_guides/distributed_inference.md b/docs/source/usage_guides/distributed_inference.md
index 3d9088421cf..4e9c9c6a947 100644
--- a/docs/source/usage_guides/distributed_inference.md
+++ b/docs/source/usage_guides/distributed_inference.md
@@ -168,7 +168,7 @@ model = GPT2ForSequenceClassification(config)
 model.eval()
 ```
 
-Next you'll need to create some example inputs to use. These help PiPPy trace the model.
+Next you'll need to create some example inputs to use. These help `torch.distributed.pipelining` trace the model.
 
 However you make this example will determine the relative batch size that will be used/passed
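
For reference, the tracing step mentioned in the second hunk looks roughly like the sketch below. This is a minimal illustration, not part of the patch: it assumes Accelerate's `prepare_pippy` utility for pipeline-parallel inference, and the tensor shapes (batch of 2, sequence length 1024) are illustrative placeholders.

```python
import torch
from transformers import GPT2Config, GPT2ForSequenceClassification

from accelerate.inference import prepare_pippy  # assumed helper for pipeline parallelism

# Build the model on CPU; preparation splits it across the available GPUs.
config = GPT2Config()
model = GPT2ForSequenceClassification(config)
model.eval()

# Example inputs are only used to trace the model. Their leading dimension
# sets the relative batch size each pipeline chunk works on at a time.
example_input = torch.randint(
    low=0,
    high=config.vocab_size,
    size=(2, 1024),  # batch_size x sequence_length (illustrative values)
    dtype=torch.int64,
)

# Trace and split the model for pipeline-parallel inference.
model = prepare_pippy(model, example_args=(example_input,))

with torch.no_grad():
    # On a multi-GPU launch, only the process hosting the last pipeline
    # stage receives the real output; the others get None.
    output = model(example_input)
```

A sketch like this would be launched with `accelerate launch` across the GPUs (two in the tutorial's setup) so that each process hosts one pipeline stage; the maintained, complete version lives in the example zoo linked in the first hunk.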