8-bit Lion, Load/Store 8-bit Models directly from/to HF Hub

This release brings 8-bit Lion to bitsandbytes. Compared to standard 32-bit Adam, it is 8x more memory efficient: Lion tracks a single momentum state per parameter, stored in 8 bits, instead of Adam's two 32-bit states.
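
Below is a minimal sketch of switching a training loop to the 8-bit Lion optimizer, assuming it is exposed as bitsandbytes.optim.Lion8bit; the toy module and hyperparameters are placeholders, not recommendations:

```python
import torch
import bitsandbytes as bnb

# A toy module; in practice this would be your full model.
model = torch.nn.Linear(1024, 1024).cuda()

# 8-bit Lion keeps a single quantized momentum state per parameter,
# versus Adam's two 32-bit states, hence the memory savings.
optimizer = bnb.optim.Lion8bit(model.parameters(), lr=1e-4, weight_decay=1e-2)

# Standard training step.
loss = model(torch.randn(16, 1024, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```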

Furthermore, models can now be serialized in 8-bit and pushed to the Hugging Face Hub. This also means you can load them from the Hub in 8-bit, making big models much easier to download and load into CPU memory.

To use this feature, you need the newest transformers release (this integration will likely land in the official HF transformers release tomorrow).
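
As a rough sketch of the workflow, the snippet below loads a model in 8-bit directly from the Hub and pushes the serialized 8-bit weights back. It assumes a transformers version with the load_in_8bit / LLM.int8() integration; the model id and target repo name are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder; any supported checkpoint works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize linear layers with bitsandbytes (LLM.int8())
    device_map="auto",   # requires accelerate
)

# Serialize the 8-bit weights and push them to the Hub
# ("my-username/opt-1.3b-8bit" is a hypothetical repo name).
model.push_to_hub("my-username/opt-1.3b-8bit")
```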

In this release, CUDA 10.2 and GTX 700/K10 GPUs are deprecated in order to allow for broad support of bfloat16 in release 0.39.0.

Features:

  • Support for 32-bit and 8-bit Lion has been added. Thank you @lucidrains
  • Support for serialization of Linear8bitLt layers (LLM.int8()). This allows storing and loading 8-bit weights directly from the Hugging Face Hub. Thank you @mryab
  • New bug report feature: python -m bitsandbytes now prints extensive debugging details to help diagnose CUDA setup failures.

Bug fixes:

  • Fixed a bug where some bitsandbytes methods failed in a model-parallel setup on multiple GPUs. Thank you @tonylins
  • Fixed a bug where cudart.so libraries could not be found in newer PyTorch releases.

Improvements:

  • Improved the CUDA setup procedure with a more extensive search for CUDA libraries.

Deprecated:

  • Devices with compute capability 3.0 (GTX 700s, K10) and 3.2 (Tegra K1, Jetson TK1) are now deprecated, and support will be removed in 0.39.0.
  • Support for CUDA 10.0 and 10.2 will be removed in bitsandbytes 0.39.0.