
[modeling_utils] postpone bnb loading until and if it's needed #18859

Merged
merged 1 commit into main from postpone-bnb on Sep 2, 2022

Conversation

stas00 (Contributor) commented Sep 2, 2022

BNB shouldn't be loaded unless it's actually used, and definitely not by the used-everywhere modeling_utils.py.

The following shouldn't (1) generate all this noise and (2) use up memory and resources without an actual need:

$ python -c "from transformers import BloomModel"

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /home/stas/anaconda3/envs/py38-pt112/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /home/stas/anaconda3/envs/py38-pt112/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so...

Specifically, at the moment only from_pretrained(..., load_in_8bit=True) should load it.
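For reference, the only call that should legitimately trigger the bnb import is something along these lines (illustrative usage; the model name is arbitrary, and to my understanding device_map="auto" is typically required alongside load_in_8bit=True):

```python
from transformers import AutoModelForCausalLM

# Illustrative: the one code path that actually needs bitsandbytes.
# A plain `import transformers` or a regular from_pretrained() call
# should never touch it.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    load_in_8bit=True,
    device_map="auto",
)
```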

My proposal is probably not the best, but it solves the problem.

A cleaner solution would probably be to rewrite src/transformers/utils/bitsandbytes.py to delay loading its libraries until and if it is used - not sure. Totally open to other suggestions.
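For illustration, a minimal sketch of what "delay loading until used" could look like - the function name replace_8bit_linear is taken from the traceback further down, but the body is a placeholder, not the code merged in this PR:

```python
import torch

def replace_8bit_linear(model, threshold=6.0):
    # Importing bitsandbytes inside the function body means its CUDA
    # setup side effects (the banner, the GPU probing) only happen when
    # the 8-bit path is actually taken; a bare
    # `from transformers import BloomModel` never reaches this line.
    import bitsandbytes as bnb

    for name, module in model.named_children():
        if isinstance(module, torch.nn.Linear):
            ...  # swap in a bnb.nn.Linear8bitLt replacement here
    return model
```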

@sgugger, @younesbelkada

@stas00 stas00 changed the title postpone bnb load until it's needed [modeling_utils] postpone bnb loading until and if it's needed Sep 2, 2022
HuggingFaceDocBuilderDev commented Sep 2, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

Nice find! Thanks a lot for fixing this!

@stas00 stas00 merged commit c5be7ca into main Sep 2, 2022
@stas00 stas00 deleted the postpone-bnb branch September 2, 2022 15:22
stas00 (Contributor, Author) commented Sep 2, 2022

Actually, the problem was much more severe - before this PR, on a machine with no GPU it led to this huge crash:

python -c "from transformers import AutoModel, AutoTokenizer, AutoConfig; AutoModel.from_pretrained('gpt2'), AutoTokenizer.from_pretrained('gpt2'), AutoConfig.from_pretrained('gpt2');"

Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 550kB/s]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /gpfswork/rech/six/commun/conda/inference/lib/libcudart.so
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
Traceback (most recent call last):
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1031, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/gpt2/modeling_gpt2.py", line 49, in <module>
    from ...modeling_utils import PreTrainedModel, SequenceSummary
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/modeling_utils.py", line 88, in <module>
    from .utils.bitsandbytes import get_key_to_not_convert, replace_8bit_linear, set_module_8bit_tensor_to_device
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/bitsandbytes.py", line 10, in <module>
    import bitsandbytes as bnb
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 4, in <module>
    import bitsandbytes.functional as F
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/functional.py", line 14, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 15, in initialize
    binary_name = evaluate_cuda_setup()
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 136, in evaluate_cuda_setup
    cc = get_compute_capability(cuda)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 109, in get_compute_capability
    ccs = get_compute_capabilities(cuda)
  File "/gpfswork/rech/six/commun/conda/inference/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 87, in get_compute_capabilities
    check_cuda_result(cuda, cuda.cuDeviceGetCount(ctypes.byref(nGpus)))
AttributeError: 'NoneType' object has no attribute 'cuDeviceGetCount'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 359, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 583, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 597, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/models/auto/auto_factory.py", line 553, in getattribute_from_module
    if hasattr(module, attr):
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1021, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/gpfsssd/worksf/projects/rech/six/commun/code/inference/transformers/src/transformers/utils/import_utils.py", line 1033, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.gpt2.modeling_gpt2 because of the following error (look up to see its traceback):
'NoneType' object has no attribute 'cuDeviceGetCount'

This basically rendered transformers completely broken whenever bnb was installed and the machine had no visible GPU: without libcuda.so, bitsandbytes' CUDA probe ends up with a None driver handle and the import crashes.

After updating the clone post-merge, all is back to normal.

stas00 (Contributor, Author) commented Sep 2, 2022

@younesbelkada, I think the load_in_8bit=True functionality requires checking that there is at least one GPU and asserting cleanly if there isn't any, i.e. this feature can only be used with gpu_count > 0.
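Something along these lines, perhaps - a hypothetical helper, not the check that was eventually added:

```python
import torch

def _validate_8bit_args(load_in_8bit: bool) -> None:
    # Hypothetical: fail fast with a clear message instead of letting
    # bitsandbytes crash deep inside its CUDA probe when no GPU or
    # driver is present.
    if load_in_8bit and not torch.cuda.is_available():
        raise ValueError(
            "load_in_8bit=True requires at least one visible CUDA GPU; "
            "bitsandbytes' 8-bit kernels only run on CUDA devices."
        )
```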

younesbelkada (Contributor) commented

Hi @stas00,

Thanks a lot for adding this! I agree with all the points stated in the PR.
I also agree with your final suggestion; I will open a small PR to cleanly check whether a GPU has been correctly detected by PyTorch.

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022