Fixes for run_with_accelerate (#2935)
* fixes for `run_with_accelerate`

* Auto-update of Starter template

* warn on bad inputs

* process num_processes as usual

* review suggestions

* fix yamlfix in CI

* Add accelerate docs (#2936)

* Docs on accelerate

* Apply suggestions from code review

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>

---------

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>

* Update docs/book/how-to/training-with-gpus/accelerate-distributed-training.md

---------

Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: Hamza Tahir <hamza@zenml.io>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
4 people authored Aug 19, 2024
1 parent 65776bc commit 4722acc
Showing 5 changed files with 238 additions and 80 deletions.
115 changes: 115 additions & 0 deletions docs/book/how-to/training-with-gpus/accelerate-distributed-training.md
@@ -0,0 +1,115 @@
---
description: Run distributed training with Hugging Face's Accelerate library in ZenML pipelines.
---

# Distributed training with 🤗 Accelerate

There are several reasons why you might want to scale your machine learning pipelines to utilize distributed training, such as leveraging multiple GPUs or training across multiple nodes. ZenML now integrates with [Hugging Face's Accelerate library](https://github.com/huggingface/accelerate) to make this process seamless and efficient.

## Use 🤗 Accelerate in your steps

Some steps in your machine learning pipeline, particularly training steps, can benefit from distributed execution. You can now use the `run_with_accelerate` function to enable this:

```python
from zenml import step, pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

@step
def training_step():
    # your training code here
    ...

@pipeline
def training_pipeline(num_processes: int):
    run_with_accelerate(training_step, num_processes=num_processes)()
```

The `run_with_accelerate` function wraps your step, enabling it to run with Accelerate's distributed training capabilities. It accepts various arguments that correspond to Accelerate CLI options.
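
For example, a run of the pipeline defined above could then be triggered as follows (a minimal sketch; the process count of `2` is only illustrative):

```python
# Trigger a pipeline run; the wrapped step will be launched
# with `accelerate` across two processes.
training_pipeline(num_processes=2)
```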

{% hint style="info" %}
For a complete list of available arguments and more details, refer to the [Accelerate CLI documentation](https://huggingface.co/docs/accelerate/en/package_reference/cli#accelerate-launch).
{% endhint %}

### Configuration

The `run_with_accelerate` function accepts various arguments to configure your distributed training environment. Some common arguments, illustrated in the sketch after this list, include:

- `num_processes`: The number of processes to use for distributed training.
- `cpu`: Whether to force training on CPU.
- `multi_gpu`: Whether to launch distributed GPU training.
- `mixed_precision`: Mixed precision training mode ('no', 'fp16', or 'bf16').
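
As a rough sketch of how these options fit together (the argument values and the `my_steps` module are illustrative assumptions, not recommendations):

```python
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

# Hypothetical module holding the step definition; steps must live outside
# the entrypoint script (see the notes below).
from my_steps import training_step


@pipeline
def training_pipeline():
    # Launch the step on two GPU processes with bf16 mixed precision.
    run_with_accelerate(
        training_step,
        num_processes=2,
        multi_gpu=True,
        mixed_precision="bf16",
    )()
```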

### Important Usage Notes

1. The `run_with_accelerate` function cannot be applied directly to a step definition using the `@` decorator syntax. Use it within your pipeline definition instead.

2. Steps defined inside the entrypoint script cannot be used with `run_with_accelerate`. Move your step code to another file and import it.

3. Accelerated steps do not support positional arguments. Use keyword arguments when calling your steps, as shown in the sketch after these notes.

4. If `run_with_accelerate` is misused, it will raise a `RuntimeError` with a helpful message explaining the correct usage.
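
To illustrate the keyword-argument rule, here is a minimal sketch (the step parameters and the `my_steps` module are hypothetical):

```python
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate

from my_steps import training_step  # hypothetical step taking dataset_name and learning_rate


@pipeline
def training_pipeline():
    accelerated_step = run_with_accelerate(training_step, num_processes=2)

    # Correct: accelerated steps must be called with keyword arguments.
    accelerated_step(dataset_name="imdb", learning_rate=1e-4)

    # Incorrect: positional arguments will raise an error.
    # accelerated_step("imdb", 1e-4)
```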

{% hint style="info" %}
To see a full example where Accelerate is used within a ZenML pipeline, check out our <a href="https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/README.md">llm-lora-finetuning</a> project which leverages the distributed training functionalities while finetuning an LLM.
{% endhint %}

## Ensure your container is Accelerate-ready

To run steps with Accelerate, the necessary dependencies must be installed in the execution environment. This section explains how to configure your environment so that Accelerate can be used effectively.

{% hint style="warning" %}
Note that these configuration changes are **required** for Accelerate to function properly. If you don't update the settings, your steps might run, but they will not leverage distributed training capabilities.
{% endhint %}

All steps using Accelerate will be executed within a containerized environment. Therefore, you need to make two amendments to your Docker settings for the relevant steps:

### 1. Specify a CUDA-enabled parent image in your `DockerSettings`

For complete details, refer to the [containerization page](../customize-docker-builds/README.md). Here's an example using a CUDA-enabled PyTorch image:

```python
from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime")

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
```

### 2. Add Accelerate as explicit pip requirements

Ensure that Accelerate is installed in your container:

```python
from zenml.config import DockerSettings
from zenml import pipeline

docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime",
    requirements=["accelerate", "torchvision"]
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
```

## Train across multiple GPUs

ZenML's Accelerate integration supports training your models with multiple GPUs on a single node or across multiple nodes. This is particularly useful for large datasets or complex models that benefit from parallelization.

In practice, using Accelerate with multiple GPUs involves the following (combined in the sketch after this list):

- Wrapping your training step with the `run_with_accelerate` function in your pipeline definition
- Configuring the appropriate Accelerate arguments (e.g., `num_processes`, `multi_gpu`)
- Ensuring your training code is compatible with distributed training (Accelerate handles most of this automatically)
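
Putting these pieces together, a single-node, multi-GPU pipeline could look like the following sketch (the parent image, pip requirements, process count, and the `my_steps` module are assumptions to adapt to your setup):

```python
from zenml import pipeline
from zenml.config import DockerSettings
from zenml.integrations.huggingface.steps import run_with_accelerate

from my_steps import training_step  # hypothetical module, defined outside the entrypoint script

# CUDA-enabled parent image plus explicit Accelerate requirement, as described above.
docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime",
    requirements=["accelerate", "torchvision"],
)


@pipeline(settings={"docker": docker_settings})
def multi_gpu_training_pipeline():
    # Distribute the training step across four GPU processes on one node.
    run_with_accelerate(
        training_step,
        num_processes=4,
        multi_gpu=True,
    )()
```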

{% hint style="info" %}
If you're new to distributed training or encountering issues, please [connect with us on Slack](https://zenml.io/slack) and we'll be happy to assist you.
{% endhint %}

By leveraging the Accelerate integration in ZenML, you can easily scale your training processes and make the most of your available hardware resources, all while maintaining the structure and benefits of your ZenML pipelines.

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
1 change: 1 addition & 0 deletions docs/book/toc.md
@@ -119,6 +119,7 @@
* [Implement a custom stack component](how-to/stack-deployment/implement-a-custom-stack-component.md)
* [Implement a custom integration](how-to/stack-deployment/implement-a-custom-integration.md)
* [🚜 Train with GPUs](how-to/training-with-gpus/training-with-gpus.md)
* [Distributed Training with 🤗 Accelerate](how-to/training-with-gpus/accelerate-distributed-training.md)
* [🌲 Control logging](how-to/control-logging/README.md)
* [View logs on the dashboard](how-to/control-logging/view-logs-on-the-dasbhoard.md)
* [Enable or disable logs storage](how-to/control-logging/enable-or-disable-logs-storing.md)