huggingface · sgugger · Mar 24, 2023 · Mar 16, 2023 · Mar 22, 2023 · Mar 22, 2023
diff --git a/docs/README.md b/docs/README.md
@@ -67,7 +67,7 @@ doc-builder preview {package_name} {path_to_docs}
 For example:
 
 ```bash
-doc-builder preview transformers docs/source/en/
+doc-builder preview accelerate docs/source/
 ```
 
 The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.

diff --git a/examples/README.md b/examples/README.md
@@ -190,7 +190,22 @@ To run it in each of these various modes, use the following commands:
 
 ### Using AWS SageMaker integration
 - [Examples showcasing AWS SageMaker integration of 🤗 Accelerate.](https://github.com/pacman100/accelerate-aws-sagemaker)
-
+
+
+## Simple Multi-GPU Hardware Launcher
+
+[multigpu_remote_launcher.py](./multigpu_remote_launcher.py) is a minimal script that demonstrates launching accelerate
+on multiple remote GPUs, and with automatic hardware environment and dependency setup for reproducibility. You can
+easily customize the training function used, training arguments, hyperparameters, and type of compute hardware, and then
+run the script to automatically launch multi GPU training on remote hardware.
+
+This script uses [Runhouse](https://github.com/run-house/runhouse) to launch on self-hosted hardware (e.g. in your own
+cloud account or on-premise cluster) but there are other options for running remotely as well. Runhouse can be installed
+with `pip install runhouse`, and you can refer to
+[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup)
+for hardware setup instructions, or this
+[Colab tutorial](https://colab.research.google.com/drive/1qVwYyLTCPYPSdz9ZX7BZl9Qm0A3j7RJe) for a more in-depth walkthrough.
+
 ## Finer Examples
 
 While the first two scripts are extremely barebones when it comes to what you can do with accelerate, more advanced features are documented in two other locations.

diff --git a/examples/multigpu_remote_launcher.py b/examples/multigpu_remote_launcher.py
@@ -0,0 +1,55 @@
+import argparse
+
+import runhouse as rh
+import torch
+from nlp_example import training_function
+
+from accelerate.utils import PrepareForLaunch, patch_environment
+
+
+def launch_train(*args):
+    num_processes = torch.cuda.device_count()
+    print(f"Device count: {num_processes}")
+    with patch_environment(
+        world_size=num_processes, master_addr="127.0.01", master_port="29500", mixed_precision=args[1].mixed_precision
+    ):
+        launcher = PrepareForLaunch(training_function, distributed_type="MULTI_GPU")
+        torch.multiprocessing.start_processes(launcher, args=args, nprocs=num_processes, start_method="spawn")
+
+
+if __name__ == "__main__":
+    # Refer to https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup
+    # for cloud access setup instructions (if using on-demand hardware), and for API specifications.
+
+    # on-demand GPU
+    # gpu = rh.cluster(name='rh-cluster', instance_type='V100:1', provider='cheapest', use_spot=False)  # single GPU
+    gpu = rh.cluster(name="rh-cluster", instance_type="V100:4", provider="cheapest", use_spot=False)  # multi GPU
+    gpu.up_if_not()
+
+    # on-prem GPU
+    # gpu = rh.cluster(
+    #           ips=["ip_addr"], ssh_creds={ssh_user:"<username>", ssh_private_key:"<key_path>"}, name="rh-cluster"
+    #       )
+
+    # Set up remote function
+    reqs = [
+        "pip:./",
+        "transformers",
+        "datasets",
+        "evaluate",
+        "tqdm",
+        "scipy",
+        "scikit-learn",
+        "tensorboard",
+        "torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117",
+    ]
+    launch_train_gpu = rh.function(fn=launch_train, system=gpu, reqs=reqs, name="train_bert_glue")
+
+    # Define train args/config, run train function
+    train_args = argparse.Namespace(cpu=False, mixed_precision="fp16")
+    config = {"lr": 2e-5, "num_epochs": 3, "seed": 42, "batch_size": 16}
+    launch_train_gpu(config, train_args, stream_logs=True)
+
+    # Alternatively, we can just run as instructed in the README (but only because there's already a wrapper CLI):
+    # gpu.install_packages(reqs)
+    # gpu.run(['accelerate launch --multi_gpu accelerate/examples/nlp_example.py'])