Adding support for local SGD. #1378
Conversation
Thanks for the PR! I've made a few suggestions to simplify the code, and one request re: attribution. The suggestions can be done here, or I can do it in a follow-up PR, whichever is easier. Thanks again!
@@ -34,6 +34,7 @@
EXCLUDE_EXAMPLES = [
    "cross_validation.py",
    "gradient_accumulation.py",
    "local_sgd.py",
While this is accurate, to test your script we should also include a test similar to the one in tests/test_examples.py (lines 215 to 217 at daf76a7):

def test_gradient_accumulation(self):
    testargs = ["examples/by_feature/gradient_accumulation.py"]
    run_command(self._launch_args + testargs)
Regarding this one, I am a bit unsure how this is invoked. I added a file examples/by_feature/local_sgd.py and can add a similar function test_local_sgd(self). A couple of questions, though:
- will it be called automatically?
- which testargs will be used?
It gets called when the example tests are run through our CI. testargs are whatever arguments are needed to make your script run with the feature enabled. For instance, the test above reads as:
accelerate launch examples/by_feature/gradient_accumulation.py
(accelerate launch comes from self._launch_args.)
Yes, add a similar test_local_sgd() function which does this, as we make sure all of our example tests get run. To see when they are fully invoked, see the Makefile under test_examples.
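A sketch of what that test could look like, mirroring the gradient-accumulation test quoted above (run_command and self._launch_args come from the existing harness in tests/test_examples.py):

def test_local_sgd(self):
    testargs = ["examples/by_feature/local_sgd.py"]
    run_command(self._launch_args + testargs)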
Hi @muellerzr, when running the tests (make test) I see the following (note the --ignore part):
python -m pytest -s -v ./tests/ --ignore=./tests/test_examples.py
So the example tests need to be run explicitly? Is this what you do yourself? Asking b/c make test ignores the examples.
You need test_examples, which is further down from where you were looking: https://github.com/huggingface/accelerate/blob/main/Makefile#L45-L46
Yes, the core tests ignore the examples, as they often require GPUs or take a long time.
Co-authored-by: Zachary Mueller <muellerzr@gmail.com>
@searchivarius any particular reason for closing/reopening later rather than modifying on the PR branch and having a trail to follow back through on the PR comments?
@muellerzr I just wanted to squash commits. If this is not an issue, I can reopen the PR & make additional commits, just let me know.
Not an issue at all, we squash all commits upon merge anyways.
PS: @muellerzr not quite ready yet, I am still making changes and testing.
@muellerzr it should be ready; I am not sure why the quality check fails.
@searchivarius you should try doing ...
@muellerzr it doesn't make a difference.
@searchivarius perhaps uninstall then reinstall everything:
pip uninstall black flake8 hf-doc-builder -y
pip install -e .[quality]
Note that you'll also need ...
Hi @muellerzr, this didn't help. Moreover, BTW, regarding urllib3: ...
Perfect, that's the right version of urllib3. For the other deps, try installing these versions:
pip install black==23.1 ruff==0.0.241 hf-doc-builder==0.4.0
@muellerzr thank you, now the file is reformatted properly. There is a chance that the versions were not the issue and I was rather working on the wrong branch. There's no way to know for sure though (b/c I deleted and re-cloned the local repo). That said, it seems to be fine now.
LGTM! Thanks for all the back and forth, and hard work on this! 🤗
cc @sgugger for final review
Just have a couple of nits on the copyright years, but LGTM otherwise! Thanks!
@muellerzr @sgugger NP, and thank you for the guidance. Do not forget to remove the division by the number of steps in examples like this one: ...
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Hi,
As discussed with @muellerzr, I am adding non-breaking functionality to support local SGD, which enables efficient multi-GPU training when no fast interconnect is available. A usage sketch follows below.
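Here is a minimal, hypothetical usage sketch of the context-manager API discussed in this PR (the class name, module path, and arguments are assumptions; check the merged code for the exact interface):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD  # assumed module path

# Toy model and data so the sketch runs end to end.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True) as local_sgd:
    for inputs, targets in dataloader:
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        # Assumed to count steps and average parameters across processes
        # every `local_sgd_steps` optimizer steps.
        local_sgd.step()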
This replaces a previous pull request.
I incorporated the recommendation to change the per-device batch size (in the separate benchmark), and I also removed the erroneous code (borrowed from the examples) that reduced the size of the dataset, as well as addressing the code-related recommendations, and retested everything. There is still some unexplained behavior, but the results look quite reasonable now:
Here is a link to the summary of results and benchmarking code.
Importantly, @sgugger @muellerzr, it is possible that every no-trainer example has this line that reduces the dataset size, but I think it's a bug. Please double-check!
It is not clear what kind of unit test can test this kind of functionality. The key ingredient (disabling gradient synchronization) is being tested already.
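For reference, a simplified sketch of that key mechanic, i.e., skipping the DDP gradient all-reduce and periodically averaging parameters instead (an illustration under assumed semantics of Accelerator.no_sync and Accelerator.reduce, not the PR's exact code):

import torch

def local_sgd_training_step(accelerator, model, optimizer, batch, step, local_sgd_steps=8):
    # Keep gradients local to each process: no all-reduce during backward.
    with accelerator.no_sync(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
    # Every `local_sgd_steps` steps, average the model parameters across processes.
    if (step + 1) % local_sgd_steps == 0:
        with torch.no_grad():
            for param in model.parameters():
                param.data = accelerator.reduce(param.data, reduction="mean")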