Running on AMD GPU
Thanks to the great work of Odonata (Discord, GitHub @edt-xx), the hardware of oceanmasterza (Discord), and the help of epicx (Discord, GitHub @bennmann), we have the AMD instructions below.
On the host machine (according to the author of the bitsandbytes port, GitHub @arlo-phoenix, rocm/pytorch should also work):
docker pull rocm/pytorch-nightly
sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch-nightly
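The docker run command above passes /dev/kfd and /dev/dri into the container, so both need to exist on the host first. A minimal sketch of a pre-flight check, assuming a POSIX shell; the function name `missing_paths` is hypothetical and is not part of Docker or ROCm:

```shell
# Print every path from the argument list that does not exist.
# Returns 0 if all paths exist, 1 if at least one is missing.
missing_paths() {
    status=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_paths /dev/kfd /dev/dri` prints nothing when both device nodes are present. If either is reported missing, the GPU driver is probably not loaded on the host and the container is unlikely to see the GPU.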
In the running image:
cd /home
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Install bitsandbytes with ROCM support
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
make hip ROCM_TARGET=gfx1030
pip install pip --upgrade
pip install .
# Install Petals
cd ..
pip install --upgrade git+https://github.com/bigscience-workshop/petals
# Run server
python -m petals.cli.run_server stabilityai/StableBeluga2 --port <an open port>
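Before starting the server, it can help to confirm that the PyTorch build in the image is a ROCm build and can see the GPU. A minimal sketch, assuming a POSIX shell and that `python3` or `python` is on the PATH; the function name `check_torch_gpu` is hypothetical. It relies on `torch.version.hip` and `torch.cuda.is_available()`, which ROCm builds of PyTorch reuse for AMD GPUs:

```shell
# Report whether the installed PyTorch is a ROCm (HIP) build and whether
# it can see a GPU. ROCm builds expose AMD GPUs through the torch.cuda API.
check_torch_gpu() {
    # Pick whichever Python interpreter is available (assumption: one of these).
    py=$(command -v python3 || command -v python) || { echo "python-missing"; return 1; }
    "$py" - <<'EOF'
try:
    import torch
except Exception:
    # torch is not installed or could not be loaded
    print("torch-unavailable")
else:
    # torch.version.hip is None on non-ROCm builds
    print("hip=%s gpu=%s" % (torch.version.hip, torch.cuda.is_available()))
EOF
}
```

Running `check_torch_gpu` inside the container should report a HIP version and `gpu=True` on a working setup. If it reports `gpu=False`, the HSA_OVERRIDE_GFX_VERSION export and the --device flags of the docker run command are the first things to recheck.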
Multi-GPU operation (--tensor_parallel_devices) is still untested. The docker --gpus flag may not function at this time, and other virtualization tools may be necessary.
Contributed by: @edt-xx, @bennmann
Tested on:
- AMD 6600 XT tested July 24th, 2023 on Arch Linux with ROCm 5.6.0, mesa 22.1.4
- AMD 6900 XT tested April 18th, 2023 on bare metal Ubuntu 22.04 (no docker/anaconda/container), with ROCm 5.4.2
- Untested on the 7000 series. However, 7000-series cards may perform much better, as AMD added a machine learning tensor library and better hardware support (versus ray tracing only on the 6000 series)
Guide:
- Use the mesa-clover and mesa-rusticl OpenCL variants.
- Add export HSA_OVERRIDE_GFX_VERSION=10.3.0 to your environment (on Ubuntu, put it in /home/user/.bashrc). This tricks ROCm into working on more consumer-grade cards such as the 6000 series.
- Install ROCm. Use this tutorial for Arch Linux: https://wiki.archlinux.org/title/GPGPU
- Create and activate a venv for Petals using Python 3.11:
  - python -m venv <yourvenvpath>
  - cd <yourvenvpath>
  - source bin/activate
- In the venv, install the nightly version of PyTorch with the command generated by the website: https://pytorch.org/get-started/locally/
- Install the Petals version with AMD GPU support:
  pip install git+https://github.com/bigscience-workshop/petals@amd-gpus
  This branch uses an older version of bitsandbytes patched to have AMD GPU support (developed by @brontoc and Titaniumtown). This means that you won't be able to use 4-bit quantization (--quant_type nf4) or LoRA adapters (the --adapters argument). The server will use 8-bit quantization (int8) for all models by default.
  Tip: You can set your fans to full speed or close to it before starting Petals (the default Linux fan profile for AMD GPUs is not good on some cards):
  rocm-smi --setfan 99%
- Run Petals using:
  python -m petals.cli.run_server stabilityai/StableBeluga2
  Tip: You can monitor temperature and voltage by running this:
  rocm-smi && rocm-smi -t
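The guide's second step adds the HSA_OVERRIDE_GFX_VERSION export to a shell startup file. A minimal sketch of doing that without creating duplicate entries when the step is repeated, assuming a POSIX shell; the helper name `append_once` is hypothetical:

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For the bare-metal setup described above, this could be called as `append_once 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' "$HOME/.bashrc"`. Open a new shell afterwards so the variable takes effect.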