Commit 4722acc · Fixes for `run_with_accelerate` (#2935)

* fixes for `run_with_accelerate`
* Auto-update of Starter template
* warn on bad inputs
* process num_processes as usual
* review suggestions
* fix yamlfix in CI
* Add accelerate docs (#2936)
  * Docs on accelerate
  * Apply suggestions from code review (Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>)
* Update docs/book/how-to/training-with-gpus/accelerate-distributed-training.md

Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: Hamza Tahir <hamza@zenml.io>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>

1 parent 65776bc · commit 4722acc
Showing 5 changed files with 238 additions and 80 deletions.
docs/book/how-to/training-with-gpus/accelerate-distributed-training.md (115 changes: 115 additions & 0 deletions)
---
description: Run distributed training with Hugging Face's Accelerate library in ZenML pipelines.
---

# Distributed training with 🤗 Accelerate

There are several reasons why you might want to scale your machine learning pipelines to utilize distributed training, such as leveraging multiple GPUs or training across multiple nodes. ZenML now integrates with [Hugging Face's Accelerate library](https://github.com/huggingface/accelerate) to make this process seamless and efficient.

## Use 🤗 Accelerate in your steps

Some steps in your machine learning pipeline, particularly training steps, can benefit from distributed execution. You can now use the `run_with_accelerate` function to enable this:

```python
from zenml import step, pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

@step
def training_step():
    # your training code here
    ...

@pipeline
def training_pipeline(num_processes: int):
    run_with_accelerate(training_step, num_processes=num_processes)()
```

The `run_with_accelerate` function wraps your step, enabling it to run with Accelerate's distributed training capabilities. It accepts various arguments that correspond to Accelerate CLI options.

{% hint style="info" %}
For a complete list of available arguments and more details, refer to the [Accelerate CLI documentation](https://huggingface.co/docs/accelerate/en/package_reference/cli#accelerate-launch).
{% endhint %}

### Configuration

The `run_with_accelerate` function accepts various arguments to configure your distributed training environment. Some common arguments include:

- `num_processes`: The number of processes to use for distributed training.
- `cpu`: Whether to force training on CPU.
- `multi_gpu`: Whether to launch distributed GPU training.
- `mixed_precision`: Mixed precision training mode ('no', 'fp16', or 'bf16').
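
For example, these options are passed as keyword arguments when wrapping the step. Here is a minimal sketch, assuming a `training_step` imported from a hypothetical `my_project.steps` module and using illustrative argument values:

```python
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

from my_project.steps import training_step  # hypothetical module containing the step


@pipeline
def training_pipeline():
    # Illustrative configuration: four processes, multi-GPU launch, bf16 mixed precision.
    run_with_accelerate(
        training_step,
        num_processes=4,
        multi_gpu=True,
        mixed_precision="bf16",
    )()
```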

### Important usage notes

1. The `run_with_accelerate` function cannot be applied to a step with the `@` decorator syntax. Wrap the step inside your pipeline definition instead, as shown in the sketch below.
2. Steps defined inside the entrypoint script cannot be used with `run_with_accelerate`. Move your step code to another file and import it.
3. Accelerated steps do not support positional arguments. Use keyword arguments when calling your steps.
4. If `run_with_accelerate` is misused, it raises a `RuntimeError` with a message explaining the correct usage.
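
To make these notes concrete, here is a minimal sketch of the pattern they describe, assuming the step lives in a hypothetical `my_project/steps.py` module and takes an illustrative `learning_rate` parameter:

```python
# my_project/steps.py -- the step is defined outside the entrypoint script
from zenml import step

@step
def training_step(learning_rate: float = 1e-4) -> None:
    # your training code here
    ...
```

```python
# run.py -- entrypoint script
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

from my_project.steps import training_step  # imported, not defined in this file

@pipeline
def training_pipeline():
    # Wrap the step inside the pipeline definition (not with an '@' decorator on the step)
    # and pass its inputs as keyword arguments only.
    run_with_accelerate(training_step, num_processes=2)(learning_rate=3e-5)
```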

{% hint style="info" %}
To see a full example where Accelerate is used within a ZenML pipeline, check out our <a href="https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/README.md">llm-lora-finetuning</a> project which leverages the distributed training functionalities while finetuning an LLM.
{% endhint %}

## Ensure your container is Accelerate-ready

To run steps with Accelerate, it's crucial to have the necessary dependencies installed in the environment. This section will guide you on how to configure your environment to utilize Accelerate effectively.

{% hint style="warning" %}
Note that these configuration changes are **required** for Accelerate to function properly. If you don't update the settings, your steps might run, but they will not leverage distributed training capabilities.
{% endhint %}

All steps using Accelerate will be executed within a containerized environment. Therefore, you need to make two amendments to your Docker settings for the relevant steps:

### 1. Specify a CUDA-enabled parent image in your `DockerSettings`

For complete details, refer to the [containerization page](../customize-docker-builds/README.md). Here's an example using a CUDA-enabled PyTorch image:

```python
from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime")

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
```

### 2. Add Accelerate as explicit pip requirements

Ensure that Accelerate is installed in your container:

```python
from zenml.config import DockerSettings
from zenml import pipeline

docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime",
    requirements=["accelerate", "torchvision"]
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
```

## Train across multiple GPUs

ZenML's Accelerate integration supports training your models with multiple GPUs on a single node or across multiple nodes. This is particularly useful for large datasets or complex models that benefit from parallelization.

In practice, using Accelerate with multiple GPUs involves:

- Wrapping your training step with the `run_with_accelerate` function in your pipeline definition (see the sketch after this list)
- Configuring the appropriate Accelerate arguments (e.g., `num_processes`, `multi_gpu`)
- Ensuring your training code is compatible with distributed training (Accelerate handles most of this automatically)
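
As an end-to-end illustration, here is a hedged sketch: the `build_model_optimizer_dataloader` helper is hypothetical, the argument values are examples only, and the `Accelerator` calls follow the standard 🤗 Accelerate training-loop pattern:

```python
from accelerate import Accelerator
from zenml import step, pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate


@step
def training_step(epochs: int = 3) -> None:
    # In a real project, define this step in its own module and import it (see the usage notes above).
    accelerator = Accelerator()  # picks up the distributed environment created by `accelerate launch`
    model, optimizer, dataloader = build_model_optimizer_dataloader()  # hypothetical helper
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for _ in range(epochs):
        for batch in dataloader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # assumes a Hugging Face-style model that returns a loss
            accelerator.backward(loss)  # replaces loss.backward() in distributed settings
            optimizer.step()


@pipeline
def multi_gpu_training_pipeline():
    # Example values: one process per GPU on a single four-GPU node.
    run_with_accelerate(training_step, num_processes=4, multi_gpu=True)(epochs=3)
```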

{% hint style="info" %}
If you're new to distributed training or encountering issues, please [connect with us on Slack](https://zenml.io/slack) and we'll be happy to assist you.
{% endhint %}

By leveraging the Accelerate integration in ZenML, you can easily scale your training processes and make the most of your available hardware resources, all while maintaining the structure and benefits of your ZenML pipelines.

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>