Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardware Auto-Setup Example/Tutorial for Distributed Launch #1227

Merged
merged 4 commits into from
Mar 24, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ doc-builder preview {package_name} {path_to_docs}
For example:

```bash
doc-builder preview transformers docs/source/en/
doc-builder preview accelerate docs/source/
```

The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
Expand Down
17 changes: 16 additions & 1 deletion examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,22 @@ To run it in each of these various modes, use the following commands:

### Using AWS SageMaker integration
- [Examples showcasing AWS SageMaker integration of 🤗 Accelerate.](https://github.com/pacman100/accelerate-aws-sagemaker)



## Simple Multi-GPU Hardware Launcher

[multigpu_remote_launcher.py](./multigpu_remote_launcher.py) is a minimal script that demonstrates launching accelerate
on multiple remote GPUs, and with automatic hardware environment and dependency setup for reproducibility. You can
easily customize the training function used, training arguments, hyperparameters, and type of compute hardware, and then
run the script to automatically launch multi GPU training on remote hardware.

This script uses [Runhouse](https://github.com/run-house/runhouse) to launch on self-hosted hardware (e.g. in your own
cloud account or on-premise cluster) but there are other options for running remotely as well. Runhouse can be installed
with `pip install runhouse`, and you can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup)
for hardware setup instructions, or this
[Colab tutorial](https://colab.research.google.com/drive/1qVwYyLTCPYPSdz9ZX7BZl9Qm0A3j7RJe) for a more in-depth walkthrough.

## Finer Examples

While the first two scripts are extremely barebones when it comes to what you can do with accelerate, more advanced features are documented in two other locations.
Expand Down
55 changes: 55 additions & 0 deletions examples/multigpu_remote_launcher.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
import argparse

import runhouse as rh
import torch
from nlp_example import training_function

from accelerate.utils import PrepareForLaunch, patch_environment


def launch_train(*args):
num_processes = torch.cuda.device_count()
print(f"Device count: {num_processes}")
with patch_environment(
world_size=num_processes, master_addr="127.0.01", master_port="29500", mixed_precision=args[1].mixed_precision
):
launcher = PrepareForLaunch(training_function, distributed_type="MULTI_GPU")
torch.multiprocessing.start_processes(launcher, args=args, nprocs=num_processes, start_method="spawn")


if __name__ == "__main__":
# Refer to https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup
# for cloud access setup instructions (if using on-demand hardware), and for API specifications.

# on-demand GPU
# gpu = rh.cluster(name='rh-cluster', instance_type='V100:1', provider='cheapest', use_spot=False) # single GPU
gpu = rh.cluster(name="rh-cluster", instance_type="V100:4", provider="cheapest", use_spot=False) # multi GPU
gpu.up_if_not()

# on-prem GPU
# gpu = rh.cluster(
# ips=["ip_addr"], ssh_creds={ssh_user:"<username>", ssh_private_key:"<key_path>"}, name="rh-cluster"
# )

# Set up remote function
reqs = [
"pip:./",
"transformers",
"datasets",
"evaluate",
"tqdm",
"scipy",
"scikit-learn",
"tensorboard",
"torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117",
]
launch_train_gpu = rh.function(fn=launch_train, system=gpu, reqs=reqs, name="train_bert_glue")

# Define train args/config, run train function
train_args = argparse.Namespace(cpu=False, mixed_precision="fp16")
config = {"lr": 2e-5, "num_epochs": 3, "seed": 42, "batch_size": 16}
launch_train_gpu(config, train_args, stream_logs=True)

# Alternatively, we can just run as instructed in the README (but only because there's already a wrapper CLI):
# gpu.install_packages(reqs)
# gpu.run(['accelerate launch --multi_gpu accelerate/examples/nlp_example.py'])