From 0642006437bc35d5f444d755b9a884034a7f0d8c Mon Sep 17 00:00:00 2001
From: "[[ -z $EMAIL ]] && read -e -p \"Enter your email (for git configuration): \" EMAIL"
Date: Wed, 11 Sep 2024 15:13:55 -0400
Subject: [PATCH 1/2] Update image

---
 docs/source/usage_guides/distributed_inference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/usage_guides/distributed_inference.md b/docs/source/usage_guides/distributed_inference.md
index 82fdc21031d..3d9088421cf 100644
--- a/docs/source/usage_guides/distributed_inference.md
+++ b/docs/source/usage_guides/distributed_inference.md
@@ -148,7 +148,7 @@ This next part will discuss using *pipeline parallelism*. This is an **experimen
 
 The general idea with pipeline parallelism is: say you have 4 GPUs and a model big enough it can be *split* on four GPUs using `device_map="auto"`. With this method you can send in 4 inputs at a time (for example here, any amount works) and each model chunk will work on an input, then receive the next input once the prior chunk finished, making it *much* more efficient **and faster** than the method described earlier. Here's a visual taken from the PyTorch repository:
 
-![PiPPy example](https://camo.githubusercontent.com/681d7f415d6142face9dd1b837bdb2e340e5e01a58c3a4b119dea6c0d99e2ce0/68747470733a2f2f692e696d6775722e636f6d2f657955633934372e706e67)
+![Pipeline parallelism example](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/accelerate/pipeline_parallel.png)
 
 To illustrate how you can use this with Accelerate, we have created an [example zoo](https://github.com/huggingface/accelerate/tree/main/examples/inference) showcasing a number of different models and situations. In this tutorial, we'll show this method for GPT2 across two GPUs.

From 4c8dd872ce45e043f4a983bbfb5ff89c405d1e0d Mon Sep 17 00:00:00 2001
From: "[[ -z $EMAIL ]] && read -e -p \"Enter your email (for git configuration): \" EMAIL"
Date: Wed, 11 Sep 2024 15:14:29 -0400
Subject: [PATCH 2/2] Fin

---
 docs/source/usage_guides/distributed_inference.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/usage_guides/distributed_inference.md b/docs/source/usage_guides/distributed_inference.md
index 3d9088421cf..4e9c9c6a947 100644
--- a/docs/source/usage_guides/distributed_inference.md
+++ b/docs/source/usage_guides/distributed_inference.md
@@ -168,7 +168,7 @@ model = GPT2ForSequenceClassification(config)
 model.eval()
 ```
 
-Next you'll need to create some example inputs to use. These help PiPPy trace the model.
+Next you'll need to create some example inputs to use. These help `torch.distributed.pipelining` trace the model.
 
 However you make this example will determine the relative batch size that will be used/passed
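
For reference, the tracing step mentioned in the second hunk looks roughly like the sketch below. This is a minimal illustration, not part of the patch: it assumes Accelerate's `prepare_pippy` utility for pipeline-parallel inference, and the tensor shapes (batch of 2, sequence length 1024) are illustrative placeholders.

```python
import torch
from transformers import GPT2Config, GPT2ForSequenceClassification

from accelerate.inference import prepare_pippy  # assumed helper for pipeline parallelism

# Build the model on CPU; preparation splits it across the available GPUs.
config = GPT2Config()
model = GPT2ForSequenceClassification(config)
model.eval()

# Example inputs are only used to trace the model. Their leading dimension
# sets the relative batch size each pipeline chunk works on at a time.
example_input = torch.randint(
    low=0,
    high=config.vocab_size,
    size=(2, 1024),  # batch_size x sequence_length (illustrative values)
    dtype=torch.int64,
)

# Trace and split the model for pipeline-parallel inference.
model = prepare_pippy(model, example_args=(example_input,))

with torch.no_grad():
    # On a multi-GPU launch, only the process hosting the last pipeline
    # stage receives the real output; the others get None.
    output = model(example_input)
```

A sketch like this would be launched with `accelerate launch` across the GPUs (two in the tutorial's setup) so that each process hosts one pipeline stage; the maintained, complete version lives in the example zoo linked in the first hunk.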