Update image ref for docs #3105

Merged
2 commits merged on Sep 11, 2024
Changes from all commits
docs/source/usage_guides/distributed_inference.md (4 changes: 2 additions & 2 deletions)
@@ -148,7 +148,7 @@ This next part will discuss using *pipeline parallelism*. This is an **experimen

The general idea with pipeline parallelism is: say you have 4 GPUs and a model big enough that it can be *split* across four GPUs using `device_map="auto"`. With this method you can send in 4 inputs at a time (4 here just as an example; any amount works) and each model chunk will work on an input, then receive the next input once the prior chunk has finished, making it *much* more efficient **and faster** than the method described earlier. Here's a visual taken from the PyTorch repository:

![PiPPy example](https://camo.githubusercontent.com/681d7f415d6142face9dd1b837bdb2e340e5e01a58c3a4b119dea6c0d99e2ce0/68747470733a2f2f692e696d6775722e636f6d2f657955633934372e706e67)
![Pipeline parallelism example](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/accelerate/pipeline_parallel.png)

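To make the figure concrete, here is a minimal, hypothetical sketch in plain PyTorch (not Accelerate's API; the toy model, the `stage_0`/`stage_1` split, and the assumption of two CUDA devices are all illustrative):

```python
# Toy illustration of the pipeline idea only; Accelerate and
# torch.distributed.pipelining handle the splitting and scheduling for you.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Manually split the toy model into two "pipeline stages" on different GPUs.
stage_0 = model[:2].to("cuda:0")
stage_1 = model[2:].to("cuda:1")

batch = torch.randn(8, 16)
outputs = []
with torch.no_grad():
    for micro_batch in batch.chunk(4):  # 4 micro-batches, like the 4 inputs above
        hidden = stage_0(micro_batch.to("cuda:0"))
        outputs.append(stage_1(hidden.to("cuda:1")))
result = torch.cat(outputs).cpu()

# A real pipeline schedule overlaps these steps: while stage_1 works on
# micro-batch j, stage_0 already starts on micro-batch j + 1, so both GPUs
# stay busy instead of idling.
```

In the tutorial below, Accelerate takes care of the splitting and scheduling; you only provide the model and some example inputs.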
To illustrate how you can use this with Accelerate, we have created an [example zoo](https://github.com/huggingface/accelerate/tree/main/examples/inference) showcasing a number of different models and situations. In this tutorial, we'll show this method for GPT2 across two GPUs.

@@ -168,7 +168,7 @@ model = GPT2ForSequenceClassification(config)
model.eval()
```
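The hunk above shows only the tail of the model-creation block; the lines hidden behind the fold are not reproduced in this diff. For a self-contained picture, the setup those visible lines imply might look roughly like this (assuming a randomly initialized GPT-2 configuration from `transformers`):

```python
from transformers import GPT2Config, GPT2ForSequenceClassification

# Build a randomly initialized GPT-2 classifier on the CPU and put it in
# evaluation mode before any pipeline splitting happens.
config = GPT2Config()
model = GPT2ForSequenceClassification(config)
model.eval()
```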

Next you'll need to create some example inputs to use. These help PiPPy trace the model.
Next you'll need to create some example inputs to use. These help `torch.distributed.pipelining` trace the model.

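One hypothetical way to build such an example input (the batch size of 2 and sequence length of 1024 are placeholder values, and `config` refers to the GPT-2 config created earlier):

```python
import torch

# Random token ids shaped like the real requests will be: (batch_size, seq_len).
# Tracing mostly cares about shapes and dtypes, so random values are fine here.
example_input = torch.randint(
    low=0,
    high=config.vocab_size,  # vocabulary size from the GPT2Config above
    size=(2, 1024),
    dtype=torch.int64,
)
```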
<Tip warning={true}>
However you make this example will determine the relative batch size that will be used/passed