
Add Auto Device Map option for BERT Models #26176

Closed
wants to merge 1 commit

Conversation

tanaymeh
Contributor

What does this PR do?

This PR adds device_map="auto" support for BERT models, to make multi-GPU training easier.

Fixes #25296
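
Concretely, the change amounts to declaring on the shared base class which submodules accelerate must keep together on a single device - roughly along these lines (a sketch; the exact module list is what the discussion below iterates on):

# In src/transformers/models/bert/modeling_bert.py (sketch of the intended change;
# PreTrainedModel and BertConfig are already imported/defined in that file)
class BertPreTrainedModel(PreTrainedModel):
    config_class = BertConfig
    base_model_prefix = "bert"
    supports_gradient_checkpointing = True
    # New: module classes that device_map="auto" must never split across devices
    _no_split_modules = ["BertEmbeddings", "BertSelfAttention"]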

Who can review?

@younesbelkada

Contributor

@younesbelkada left a comment


Thanks a lot for your great contribution!
Can you run

make fix-copies

to fix the failing checks?
Also, can you confirm the accelerate tests pass?

pytest -m accelerate_tests tests/models/bert

Thanks!
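
For reference, roughly what those two commands do (annotations added for clarity):

make fix-copies                               # regenerates code marked "# Copied from ..." so the repo-consistency check passes
pytest -m accelerate_tests tests/models/bert  # runs the device_map / offload tests; they need at least one GPU (two for test_model_parallelism)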

@tanaymeh
Contributor Author

@younesbelkada, sure! I am travelling to London this weekend and early next week, so I will be able to push the remaining changes and fixes after that.

Thanks for taking the time to review this; I will mark the PR as "Ready" once I am done making the changes. Cheers!

@tanaymeh
Contributor Author

tanaymeh commented Oct 2, 2023

Hi @younesbelkada, I am facing a rather peculiar issue. While testing my _no_split_modules changes to the BERT code (with the bert-large-uncased model), I encountered an error that only seems to arise when the _no_split_modules code is present (and not when it is commented out).

Below is the error:

  File "/root/transformers/src/test.py", line 5, in <module>
    model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
  File "/root/transformers/src/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/root/transformers/src/transformers/modeling_utils.py", line 2813, in from_pretrained
    resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)
  File "/root/transformers/src/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1431, in hf_hub_download
    http_get(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 557, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 1344951957 but has size 36775558 (model.safetensors).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

What's peculiar is that, as soon as I comment out the _no_split_modules = ["BertEmbeddings", "BertSelfAttention"] line, the error goes away and the model downloads just fine.

What I don't understand is the nature of this error, since it occurs while downloading the model (not while loading it, which would have been a more plausible place for it to occur).

Also, if I download the model with _no_split_modules commented out and then load the model with it uncommented, the code runs perfectly fine.
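
For what it's worth, the retry suggested by the error message itself would just be (force_download and resume_download come straight from the message above):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'bert-large-uncased',
    device_map='auto',
    force_download=True,     # re-download the weights instead of trusting the cached (truncated) file
    resume_download=False,   # do not try to resume the partial download
)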

Below is the test script I am running:

import torch
import random
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
print(model(torch.tensor([[random.randint(0, 300) for x in range(512)]])))

@younesbelkada
Contributor

Hi @tanaymeh
Thanks for getting back to me. I ran

import torch
import random
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('bert-large-uncased', device_map='auto')
print(model(torch.tensor([[random.randint(0, 300) for x in range(512)]])))

with the changes proposed in the PR, and the script worked fine on my end - I am not sure what is happening.

I have also tried to run the accelerate tests, and they do seem to fail :/ Let me know if you need any help!

@tanaymeh
Contributor Author

tanaymeh commented Oct 9, 2023

Hi @younesbelkada, I checked line by line, and BERT and RoBERTa have almost exactly the same implementation.
Yet, when I use the same _no_split_modules = ["BertEmbeddings", "BertSelfAttention"] (mirroring what RoBERTa uses), the accelerate tests throw multiple errors.

I tried debugging, but to no avail. Do you suspect any potential causes?

@younesbelkada
Contributor

Hmmm, I see - what errors are you getting? Can you share the full traceback?

@tanaymeh
Contributor Author

@younesbelkada Here's the entire error log:

============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.0.0
rootdir: /root/new/transformers
configfile: setup.cfg
plugins: hypothesis-6.87.2, anyio-4.0.0
collected 364 items / 361 deselected / 3 selected

tests/models/bert/test_modeling_bert.py FFF                              [100%]

=================================== FAILURES ===================================
________________________ BertModelTest.test_cpu_offload ________________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_cpu_offload>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_gpu
    def test_cpu_offload(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
    
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            # We test several splits of sizes to make sure it works.
            max_gpu_sizes = [int(p * model_size) for p in self.model_split_percents[1:]]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                for max_size in max_gpu_sizes:
                    max_memory = {0: max_size, "cpu": model_size * 2}
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
                    # Making sure part of the model will actually end up offloaded
                    self.assertSetEqual(set(new_model.hf_device_map.values()), {0, "cpu"})
    
>                   self.check_device_map_is_respected(new_model, new_model.hf_device_map)

tests/test_modeling_common.py:2600: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_modeling_common.py:2529: in check_device_map_is_respected
    self.assertEqual(param.device, torch.device("meta"))
E   AssertionError: device(type='cpu') != device(type='meta')
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
_______________________ BertModelTest.test_disk_offload ________________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_disk_offload>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_gpu
    def test_disk_offload(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                with self.assertRaises(ValueError):
                    max_size = int(self.model_split_percents[0] * model_size)
                    max_memory = {0: max_size, "cpu": max_size}
                    # This errors out cause it's missing an offload folder
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
    
                max_size = int(self.model_split_percents[1] * model_size)
                max_memory = {0: max_size, "cpu": max_size}
                new_model = model_class.from_pretrained(
                    tmp_dir, device_map="auto", max_memory=max_memory, offload_folder=tmp_dir
                )
    
>               self.check_device_map_is_respected(new_model, new_model.hf_device_map)

tests/test_modeling_common.py:2565: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_modeling_common.py:2529: in check_device_map_is_respected
    self.assertEqual(param.device, torch.device("meta"))
E   AssertionError: device(type='cpu') != device(type='meta')
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
_____________________ BertModelTest.test_model_parallelism _____________________

self = <tests.models.bert.test_modeling_bert.BertModelTest testMethod=test_model_parallelism>

    @require_accelerate
    @mark.accelerate_tests
    @require_torch_multi_gpu
    def test_model_parallelism(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    
        for model_class in self.all_model_classes:
            if model_class._no_split_modules is None:
                continue
    
            inputs_dict_class = self._prepare_for_class(inputs_dict, model_class)
            model = model_class(config).eval()
            model = model.to(torch_device)
    
            torch.manual_seed(0)
            base_output = model(**inputs_dict_class)
    
            model_size = compute_module_sizes(model)[""]
            # We test several splits of sizes to make sure it works.
            max_gpu_sizes = [int(p * model_size) for p in self.model_split_percents[1:]]
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
    
                for max_size in max_gpu_sizes:
                    max_memory = {0: max_size, 1: model_size * 2, "cpu": model_size * 2}
                    new_model = model_class.from_pretrained(tmp_dir, device_map="auto", max_memory=max_memory)
                    # Making sure part of the model will actually end up offloaded
>                   self.assertSetEqual(set(new_model.hf_device_map.values()), {0, 1})
E                   AssertionError: Items in the second set but not the first:
E                   0

tests/test_modeling_common.py:2634: AssertionError
----------------------------- Captured stderr call -----------------------------
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
=============================== warnings summary ===============================
../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1373
  /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: doctest_glob
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

tests/test_modeling_common.py:2746
  /root/new/transformers/tests/test_modeling_common.py:2746: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2773
  /root/new/transformers/tests/test_modeling_common.py:2773: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2815
  /root/new/transformers/tests/test_modeling_common.py:2815: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2857
  /root/new/transformers/tests/test_modeling_common.py:2857: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2894
  /root/new/transformers/tests/test_modeling_common.py:2894: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

tests/test_modeling_common.py:2931
  /root/new/transformers/tests/test_modeling_common.py:2931: PytestUnknownMarkWarning: Unknown pytest.mark.flash_attn_test - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @mark.flash_attn_test

../../../opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import packaging  # type: ignore[attr-defined]

../../../opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871
../../../opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871
  /opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_cpu_offload
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_disk_offload
FAILED tests/models/bert/test_modeling_bert.py::BertModelTest::test_model_parallelism
================ 3 failed, 361 deselected, 10 warnings in 5.16s ================

@tanaymeh
Contributor Author

Hi @younesbelkada, do you have any updates on this issue?


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@tanaymeh
Contributor Author

Hi @amyeroberts!
Perhaps you have an update, or a possible direction for how to proceed with this?

Thanks!

@amyeroberts
Collaborator

@tanaymeh The failures of these tests indicate that the model weights aren't being distributed across devices as expected, e.g. for tests/models/bert/test_modeling_bert.py::BertModelTest::test_model_parallelism the model is expected to be split across two devices. To resolve this, it's a case of dropping into the test to inspect where the differences are and modifying _no_split_modules to see how to get the tests to pass, e.g. for test_cpu_offload - which param is raising the assertion?
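
One rough way to do that inspection from inside the failing test (a sketch that mirrors check_device_map_is_respected; new_model and its hf_device_map come from the test body above):

import torch

def find_misplaced_params(model):
    # Compare where each parameter actually lives with where hf_device_map says it
    # should live. Offloaded ("cpu"/"disk") parameters are expected to sit on the
    # meta device inside the model, which is what check_device_map_is_respected asserts.
    device_map = model.hf_device_map
    for param_name, param in model.named_parameters():
        name = param_name
        # hf_device_map keys are module prefixes, so trim the name until one matches
        while len(name) > 0 and name not in device_map:
            name = ".".join(name.split(".")[:-1])
        if name not in device_map:
            print(f"{param_name}: not covered by hf_device_map")
            continue
        mapped = device_map[name]
        expected = torch.device("meta") if mapped in ("cpu", "disk") else torch.device(mapped)
        if param.device != expected:
            print(f"{param_name}: mapped to {mapped!r}, actually on {param.device}")

# e.g. call find_misplaced_params(new_model) right before the failing assertion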


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot closed this Dec 18, 2023
@bp020108

I am seeing the following error. Please help:

ValueError: BertLMHeadModel does not support device_map='auto'. To implement support, the model class needs to implement the _no_split_modules attribute.

@amyeroberts
Collaborator

Hi @bp020108, you're seeing this error because device_map="auto" isn't supported for BERT yet: this PR was closed without being merged. If you'd like this support added for BERT, you or anyone else in the community is welcome to open a PR for it.
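
In the meantime, one possible workaround is to build the device map yourself with accelerate, passing the no-split module classes explicitly (a rough sketch using accelerate's infer_auto_device_map and dispatch_model rather than anything this PR adds; the module list is the one discussed above, and memory limits are left at their defaults):

from accelerate import dispatch_model, infer_auto_device_map
from transformers import AutoModelForSequenceClassification

# Load on CPU first, then let accelerate place the submodules across the available
# devices, telling it explicitly which module classes must not be split.
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased")
device_map = infer_auto_device_map(
    model,
    no_split_module_classes=["BertEmbeddings", "BertSelfAttention"],
)
model = dispatch_model(model, device_map=device_map)
print(device_map)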
