Hardware Auto-Setup Example/Tutorial for Distributed Launch #1227

carolineechen · 2023-03-22T17:25:05Z

As previously discussed with @sgugger and @LysandreJik (cc @dongreenberg), this PR adds an example of hardware auto-setup and distributed launching on remote hardware for accelerate.

These examples let users run accelerate code, tutorials, and scripts on self-hosted hardware, including dependency installation and environment setup, and optionally on-demand allocation of cloud instances. This makes runs reproducible across whatever hardware is used.

[see also huggingface/transformers/pull/22319 for more context; parallel PR for transformers]

This PR adds:

an example multi-GPU launch script for nlp_example (remote hardware, with automatic environment and dependency setup), plus a corresponding section in the examples README
a tutorial in the HF docs on how to auto-setup and launch distributed code on remote hardware, with a link out to the full Colab
a small fix to the docs build instructions
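For readers unfamiliar with what "auto env dependency setup" plus a multi-GPU launch involves, here is a minimal sketch of the two steps as a single remote command. It is not the script added in this PR (which, per the review below, uses RunHouse). It only builds an ssh command line that installs dependencies and then calls `accelerate launch` with its `--multi_gpu` and `--num_processes` flags. The host name `user@gpu-box`, the `requirements.txt` path, and the helper name `build_remote_launch` are assumptions made for illustration.

```python
import shlex


def build_remote_launch(host, script, num_gpus, requirements="requirements.txt"):
    """Return an ssh argv list that installs dependencies, then launches `script` on `host`.

    `host`, `requirements`, and this helper itself are hypothetical; only the
    `accelerate launch --multi_gpu --num_processes N` invocation is the real CLI.
    """
    # Step 1: environment setup on the remote machine.
    setup = ["pip", "install", "-r", requirements]
    # Step 2: distributed launch, one process per GPU.
    launch = ["accelerate", "launch", "--multi_gpu", "--num_processes", str(num_gpus), script]
    # Chain the steps so the launch only runs if the install succeeds.
    remote = f"{shlex.join(setup)} && {shlex.join(launch)}"
    return ["ssh", host, remote]


cmd = build_remote_launch("user@gpu-box", "nlp_example.py", num_gpus=2)
print(shlex.join(cmd))
```

The sketch only prints the command. Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming ssh access to the host and a Python environment already present there.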

Let me know of any feedback, happy to iterate on this!

add multi gpu launch script add auto setup hardware docs remove an example tiny fixes

HuggingFaceDocBuilderDev · 2023-03-22T18:02:05Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks a lot for your PR. I'll echo the comment I made on the Transformers PR: while we're super happy to showcase RunHouse examples, I feel having one page in the doc is maybe a step too far at this stage. Could we focus first on:

having one example, as you did
saying a bit more clearly in the README that RunHouse is one way to run the examples remotely (and there might be others)
making launch_auto_hardware.mdx a notebook you host on your side and linking to it from examples/README.md

carolineechen · 2023-03-23T19:23:26Z

Hi @sgugger, thanks for the feedback! I've updated this PR to reflect your comments. Let me know if there's anything else.

sgugger

Thanks a lot!

carolineechen added 2 commits March 22, 2023 13:17

add self hosted hardware example

ee4a93d


add colab link

7436e14

style

4005c94

sgugger reviewed Mar 23, 2023


update readme, remove docs page

4e374e2

carolineechen force-pushed the hardware-autosetup branch from 13efe64 to 4e374e2 on March 23, 2023 19:21

sgugger approved these changes Mar 24, 2023


sgugger merged commit 1fe27e7 into huggingface:main Mar 24, 2023
