
[modeling_utils] postpone bnb loading until and if it's needed #18859

Merged
merged 1 commit into main from postpone-bnb on Sep 2, 2022

Conversation

stas00 (Contributor) commented Sep 2, 2022

BNB shouldn't be loaded unless it's actually used, and definitely not by the used-everywhere modeling_utils.py.

The following shouldn't (1) generate all this noise and (2) use up memory and resources without an actual need:

$ python -c "from transformers import BloomModel"

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /home/stas/anaconda3/envs/py38-pt112/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /home/stas/anaconda3/envs/py38-pt112/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so...

Specifically, at the moment only from_pretrained(..., load_in_8bit=True) should load it.
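For reference, the only call that should legitimately trigger the bnb import is something along these lines (illustrative usage; the model name is arbitrary, and to my understanding device_map="auto" is typically required alongside load_in_8bit=True):

```python
from transformers import AutoModelForCausalLM

# Illustrative: the one code path that actually needs bitsandbytes.
# A plain `import transformers` or a regular from_pretrained() call
# should never touch it.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    load_in_8bit=True,
    device_map="auto",
)
```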

My proposal is probably not the best, but it solves the problem.

A cleaner solution would probably be to rewrite src/transformers/utils/bitsandbytes.py to delay loading its libraries until and if it is used - not sure. Totally open to other suggestions.
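For illustration, a minimal sketch of what "delay loading until used" could look like - the function name replace_8bit_linear is taken from the traceback further down, but the body is a placeholder, not the code merged in this PR:

```python
import torch

def replace_8bit_linear(model, threshold=6.0):
    # Importing bitsandbytes inside the function body means its CUDA
    # setup side effects (the banner, the GPU probing) only happen when
    # the 8-bit path is actually taken; a bare
    # `from transformers import BloomModel` never reaches this line.
    import bitsandbytes as bnb

    for name, module in model.named_children():
        if isinstance(module, torch.nn.Linear):
            ...  # swap in a bnb.nn.Linear8bitLt replacement here
    return model
```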

@sgugger, @younesbelkada

@stas00 stas00 changed the title postpone bnb load until it's needed [modeling_utils] postpone bnb loading until and if it's needed Sep 2, 2022
HuggingFaceDocBuilderDev commented Sep 2, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

Nice find! Thanks a lot for fixing this!

@stas00 stas00 merged commit c5be7ca into main Sep 2, 2022
@stas00 stas00 deleted the postpone-bnb branch September 2, 2022 15:22
stas00 (Contributor, Author) commented Sep 2, 2022

Actually, the problem was much more severe - before this PR, on a machine with no GPU it led to this huge crash:

python -c "from transformers import AutoModel, AutoTokenizer, AutoConfig; AutoModel.from_pretrained('gpt2'), AutoTokenizer.from_pretrained('gpt2'), AutoConfig.from_pretrained('gpt2');"

Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 550kB/s]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /gpfswork/rech/six/commun/conda/inference/lib/libcudart.so
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
Traceback (most recent call last):
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1031, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/gpt2/modeling_gpt2.py", line 49, in <module>
    from ...modeling_utils import PreTrainedModel, SequenceSummary
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/modeling_utils.py", line 88, in <module>
    from .utils.bitsandbytes import get_key_to_not_convert, replace_8bit_linear, set_module_8bit_tensor_to_device
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/bitsandbytes.py", line 10, in <module>
    import bitsandbytes as bnb
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 4, in <module>
    import bitsandbytes.functional as F
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/functional.py", line 14, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 15, in initialize
    binary_name = evaluate_cuda_setup()
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 136, in evaluate_cuda_setup
    cc = get_compute_capability(cuda)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 109, in get_compute_capability
    ccs = get_compute_capabilities(cuda)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 87, in get_compute_capabilities
    check_cuda_result(cuda, cuda.cuDeviceGetCount(ctypes.byref(nGpus)))
AttributeError: 'NoneType' object has no attribute 'cuDeviceGetCount'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 359, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 583, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 597, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 553, in getattribute_from_module
    if hasattr(module, attr):
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1021, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1033, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.gpt2.modeling_gpt2 because of the following error (look up to see its traceback):
'NoneType' object has no attribute 'cuDeviceGetCount'

This basically rendered transformers completely broken whenever bnb was installed and the machine had no visible GPU: without libcuda.so, bitsandbytes' CUDA probe ends up with a None driver handle and the import crashes.

After updating the clone post-merge, all is back to normal.

stas00 (Contributor, Author) commented Sep 2, 2022

@younesbelkada, I think the load_in_8bit=True functionality requires checking that there is at least one GPU and asserting cleanly if there isn't any, i.e. this feature can only be used with gpu_count > 0.
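Something along these lines, perhaps - a hypothetical helper, not the check that was eventually added:

```python
import torch

def _validate_8bit_args(load_in_8bit: bool) -> None:
    # Hypothetical: fail fast with a clear message instead of letting
    # bitsandbytes crash deep inside its CUDA probe when no GPU or
    # driver is present.
    if load_in_8bit and not torch.cuda.is_available():
        raise ValueError(
            "load_in_8bit=True requires at least one visible CUDA GPU; "
            "bitsandbytes' 8-bit kernels only run on CUDA devices."
        )
```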

younesbelkada (Contributor) commented

Hi @stas00,

Thanks a lot for adding this! I agree with all the points stated in the PR.
I also agree with your final suggestion; I will open a small PR to cleanly check whether a GPU has been correctly detected by PyTorch.

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022