
Add Auto Device Map option for BERT Models #26176

Closed
wants to merge 1 commit

Conversation

tanaymeh
Contributor

What does this PR do?

This PR adds device_map="auto" support for BERT models, to make multi-GPU training easier.

Fixes #25296
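
Concretely, the change amounts to declaring on the shared base class which submodules accelerate must keep together on a single device - roughly along these lines (a sketch; the exact module list is what the discussion below iterates on):

# In src/transformers/models/bert/modeling_bert.py (sketch of the intended change;
# PreTrainedModel and BertConfig are already imported/defined in that file)
class BertPreTrainedModel(PreTrainedModel):
    config_class = BertConfig
    base_model_prefix = "bert"
    supports_gradient_checkpointing = True
    # New: module classes that device_map="auto" must never split across devices
    _no_split_modules = ["BertEmbeddings", "BertSelfAttention"]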

Who can review?

@younesbelkada

Contributor

@younesbelkada left a comment


Thanks a lot for your great contribution!
Can you run

make fix-copies

to fix the failing checks?
Also, can you confirm the accelerate tests pass?

pytest -m accelerate_tests tests/models/bert

Thanks!
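
For reference, roughly what those two commands do (annotations added for clarity):

make fix-copies                               # regenerates code marked "# Copied from ..." so the repo-consistency check passes
pytest -m accelerate_tests tests/models/bert  # runs the device_map / offload tests; they need at least one GPU (two for test_model_parallelism)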

@tanaymeh
Contributor Author

@younesbelkada, sure! I am travelling to London this weekend and early next week, so I will be able to push the remaining changes and fixes after that.

Thanks for taking the time to review this; I will mark the PR as "Ready" once I am done making the changes. Cheers!

@tanaymeh
Contributor Author

tanaymeh commented Oct 2, 2023

Hi @younesbelkada, I am facing a rather peculiar issue. While testing my _no_split_modules changes to the BERT code (with the bert-large-uncased model), I encountered an error that only seems to arise when the _no_split_modules code is present (and not when it is commented out).

Below is the error:

  File "/root/transformers/src/test.py", line 5, in <module>
    model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
  File "/root/transformers/src/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/root/transformers/src/transformers/modeling_utils.py", line 2813, in from_pretrained
    resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)
  File "/root/transformers/src/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1431, in hf_hub_download
    http_get(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 557, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 1344951957 but has size 36775558 (model.safetensors).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

What's peculiar is that, as soon as I comment out the _no_split_modules = ["BertEmbeddings", "BertSelfAttention"] line, the error goes away and the model downloads just fine.

What I don't understand is the nature of this error, since it occurs while downloading the model (not while loading it, which would have been a more plausible place for it to occur).

Also, if I download the model with _no_split_modules commented out and then load the model with it uncommented, the code runs perfectly fine.
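
For what it's worth, the retry suggested by the error message itself would just be (force_download and resume_download come straight from the message above):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'bert-large-uncased',
    device_map='auto',
    force_download=True,     # re-download the weights instead of trusting the cached (truncated) file
    resume_download=False,   # do not try to resume the partial download
)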

Below is the test script I am running:

import torch
import random
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
print(model(torch.tensor([[random.randint(0, 300) for x in range(512)]])))

@younesbelkada
Contributor

Hi @tanaymeh
Thanks for getting back to me. I ran

import torch
import random
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
print(model(torch.tensor([[random.randint(0, 300) for x in range(512)]])))

with the changes proposed in the PR, and the script worked fine on my end - I am not sure what is happening.

I have also tried to run the accelerate tests, and they do seem to fail :/ Let me know if you need any help!

@tanaymeh
Contributor Author

tanaymeh commented Oct 9, 2023

Hi @younesbelkada, I checked line by line, and BERT and RoBERTa have almost exactly the same implementation.
Yet, when I use the same _no_split_modules = ["BertEmbeddings", "BertSelfAttention"] (mirroring what RoBERTa uses), the accelerate tests throw multiple errors.

I tried debugging, but to no avail. Do you suspect any potential causes?

@younesbelkada
Contributor

Hmmm, I see - what errors are you getting? Can you share the full traceback?

@tanaymeh
Contributor Author

@younesbelkada Here's the entire error log:

============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.0.0
rootdir: /root/new/transformers
configfile: setup.cfg
plugins: hypothesis-6.87.2, anyio-4.0.0
collected 364 items / 361 deselected / 3 selected

tests/models/bert/test_modeling_bert.py FFF                              [100%]

=================================== FAILURES ===================================
________________________ BertModelTest.test_cpu_offload ________________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_cpu_offload>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_gpu
    def test_cpu_offload(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
    
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            # We test several splits of sizes to make sure it works.
            max_gpu_sizes = [int(p * model_size) for p in self.model_split_percents[1:]]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                for max_size in max_gpu_sizes:
                    max_memory = {0: max_size, "cpu": model_size * 2}
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
                    # Making sure part of the model will actually end up offloaded
                    self.assertSetEqual(set(new_model.hf_device_map.values()), {0, "cpu"})
    
>                   self.check_device_map_is_respected(new_model, new_model.hf_device_map)

tests/test_modeling_common.py:2600: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_modeling_common.py:2529: in check_device_map_is_respected
    self.assertEqual(param.device, torch.device("meta"))
E   AssertionError: device(type='cpu') != device(type='meta')
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
_______________________ BertModelTest.test_disk_offload ________________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_disk_offload>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_gpu
    def test_disk_offload(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                with self.assertRaises(ValueError):
                    max_size = int(self.model_split_percents[0] * model_size)
                    max_memory = {0: max_size, "cpu": max_size}
                    # This errors out cause it's missing an offload folder
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
    
                max_size = int(self.model_split_percents[1] * model_size)
                max_memory = {0: max_size, "cpu": max_size}
                new_model = model_class.from_pretrained(
                    tmp_dir, device_map="auto", max_memory=max_memory, offload_folder=tmp_dir
                )
    
>               self.check_device_map_is_respected(new_model, new_model.hf_device_map)

tests/test_modeling_common.py:2565: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_modeling_common.py:2529: in check_device_map_is_respected
    self.assertEqual(param.device, torch.device("meta"))
E   AssertionError: device(type='cpu') != device(type='meta')
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
_____________________ BertModelTest.test_model_parallelism _____________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_model_parallelism>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_multi_gpu
    def test_model_parallelism(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
    
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            # We test several splits of sizes to make sure it works.
            max_gpu_sizes = [int(p * model_size) for p in self.model_split_percents[1:]]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                for max_size in max_gpu_sizes:
                    max_memory = {0: max_size, 1: model_size * 2, "cpu": model_size * 2}
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
                    # Making sure part of the model will actually end up offloaded
>                   self.assertSetEqual(set(new_model.hf_device_map.values()), {0, 1})
E                   AssertionError: Items in the second set but not the first:
E                   0

tests/test_modeling_common.py:2634: AssertionError
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
=============================== warnings summary ===============================
../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1373
  /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: doctest_glob
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

tests/test_modeling_common.py:2746
  /root/new/transformers/tests/test_modeling_common.py:2746: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2773
  /root/new/transformers/tests/test_modeling_common.py:2773: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2815
  /root/new/transformers/tests/test_modeling_common.py:2815: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2857
  /root/new/transformers/tests/test_modeling_common.py:2857: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2894
  /root/new/transformers/tests/test_modeling_common.py:2894: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2931
  /root/new/transformers/tests/test_modeling_common.py:2931: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

../../../opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import packaging  # type: ignore[attr-defined]

../../../opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871
../../../opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871
  /opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_cpu_offload
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_disk_offload
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_model_parallelism
================ 3 failed, 361 deselected, 10 warnings in 5.16s ================

@tanaymeh
Contributor Author

Hi @younesbelkada, do you have any updates on this issue?


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@tanaymeh
Contributor Author

Hi @amyeroberts!
Perhaps you have an update, or a possible direction for how to proceed with this?

Thanks!

@amyeroberts
Collaborator

@tanaymeh The failures of these tests indicate that the model weights aren't being distributed across devices as expected, e.g. for tests/models/bert/test_modeling_bert.py::BertModelTest::test_model_parallelism the model is expected to be split across two devices. To resolve this, it's a case of dropping into the test to inspect where the differences are and modifying _no_split_modules to see how to get the tests to pass, e.g. for test_cpu_offload - which param is raising the assertion?
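
One rough way to do that inspection from inside the failing test (a sketch that mirrors check_device_map_is_respected; new_model and its hf_device_map come from the test body above):

import torch

def find_misplaced_params(model):
    # Compare where each parameter actually lives with where hf_device_map says it
    # should live. Offloaded ("cpu"/"disk") parameters are expected to sit on the
    # meta device inside the model, which is what check_device_map_is_respected asserts.
    device_map = model.hf_device_map
    for param_name, param in model.named_parameters():
        name = param_name
        # hf_device_map keys are module prefixes, so trim the name until one matches
        while len(name) > 0 and name not in device_map:
            name = ".".join(name.split(".")[:-1])
        if name not in device_map:
            print(f"{param_name}: not covered by hf_device_map")
            continue
        mapped = device_map[name]
        expected = torch.device("meta") if mapped in ("cpu", "disk") else torch.device(mapped)
        if param.device != expected:
            print(f"{param_name}: mapped to {mapped!r}, actually on {param.device}")

# e.g. call find_misplaced_params(new_model) right before the failing assertion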


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot closed this Dec 18, 2023
@bp020108

I am seeing the following error. Please help:

ValueError: BertLMHeadModel does not support device_map='auto'. To implement support, the model class needs to implement the _no_split_modules attribute.

@amyeroberts
Collaborator

Hi @bp020108, you're seeing this error because device_map="auto" isn't supported for BERT yet: this PR was closed without being merged. If you'd like this support added for BERT, you or anyone else in the community is welcome to open a PR for it.
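
In the meantime, one possible workaround is to build the device map yourself with accelerate, passing the no-split module classes explicitly (a rough sketch using accelerate's infer_auto_device_map and dispatch_model rather than anything this PR adds; the module list is the one discussed above, and memory limits are left at their defaults):

from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModelForSequenceClassification

# Load on CPU first, then let accelerate place the submodules across the available
# devices, telling it explicitly which module classes must not be split.
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased")
device_map = infer_auto_device_map(
    model,
    no_split_module_classes=["BertEmbeddings", "BertSelfAttention"],
)
model = dispatch_model(model, device_map=device_map)
print(device_map)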
