Hardware Auto-Setup Example/Tutorial for Distributed Launch #1227

Merged (4 commits) on Mar 24, 2023

Conversation

@carolineechen (Contributor) commented Mar 22, 2023

As previously discussed with @sgugger and @LysandreJik (cc @dongreenberg), this PR adds an example of hardware auto-setup and distributed launching on remote hardware for accelerate.

These examples let users run accelerate code, tutorials, and scripts on self-hosted hardware, including dependency installation and environment setup, and optionally on-demand allocation of cloud instances. This makes runs reproducible regardless of the hardware used.

[see also huggingface/transformers/pull/22319 for more context; parallel PR for transformers]

This PR adds:

  • an example multi-GPU launch script for nlp_example (remote hardware, with automatic environment and dependency setup), plus a corresponding section in the examples README
  • a tutorial in the HF docs on how to auto-setup and launch distributed code on remote hardware, with a link out to the full Colab
  • a small fix to the docs build instructions

Let me know of any feedback, happy to iterate on this!

add multi gpu launch script

add auto setup hardware docs

remove an example

tiny fixes

@HuggingFaceDocBuilderDev commented Mar 22, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) left a comment

Thanks a lot for your PR. I'll echo the comment I made on the Transformers PR: while we're super happy to showcase RunHouse examples, I feel having one page in the doc is maybe a step too far at this stage. Could we focus first on:

  • having one example, as you did
  • saying a bit more clearly in the README that RunHouse is one way to run the examples remotely (and there might be others)
  • making launch_auto_hardware.mdx a notebook you host on your side and linking to it from the examples/README.md

@carolineechen (Contributor, Author) commented

Hi @sgugger, thanks for the feedback! I've updated this PR to reflect the comments. Let me know if there's anything else.

@sgugger (Collaborator) left a comment

Thanks a lot!

@sgugger sgugger merged commit 1fe27e7 into huggingface:main Mar 24, 2023

3 participants