Add gpu doc for how to build PyTorch/XLA from source with GPU support. (#5384)

* Add gpu doc for how to build PyTorch/XLA from source with GPU support.

* fix typo

* fix comments

* fix comments
vanbasten23 authored and will-cromar committed Sep 14, 2023
1 parent d2f8221 commit 3ff13bf
Showing 2 changed files with 42 additions and 4 deletions.
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -43,6 +43,10 @@ We recommend you to use our prebuilt Docker image to start your development work
python setup.py install
```

### Build PyTorch/XLA from source with GPU support

Please refer to this [guide](https://github.com/pytorch/xla/blob/master/docs/gpu.md#develop-pytorchxla-on-a-gpu-instance-build-pytorchxla-from-source-with-gpu-support).

## Before Submitting A Pull Request:

In `pytorch/xla` repo we enforce coding style for both C++ and Python files. Please try to format
42 changes: 38 additions & 4 deletions docs/gpu.md
@@ -1,15 +1,15 @@
# How to run with PyTorch/XLA:GPU

PyTorch/XLA enables PyTorch users to utilize the XLA compiler which supports accelerators including TPU, GPU, and CPU This doc will go over the basic steps to run PyTorch/XLA on a nvidia gpu instance
PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an NVIDIA GPU instance.

## Create a GPU instance
Pytorch/XLA currently publish prebuilt docker images and wheels with cuda11.7/8 and python 3.8. We recommend users to create a GPU instance with corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla/tree/jackcao/gpu_doc#-available-images-and-wheels).

## Environment Setup
You can either use a local machine with GPU attached or a GPU VM on the cloud. For example in Google Cloud you can follow this [doc](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus) to create the GPU VM.

To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
## Environment Setup

### Docker
PyTorch/XLA currently publishes prebuilt Docker images and wheels with CUDA 11.7/11.8 and Python 3.8. We recommend creating a Docker container with the corresponding configuration. For a full list of Docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla#available-docker-images-and-wheels).
```
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
@@ -74,3 +74,37 @@ Epoch 1 train begin 06:12:38
```
## AMP (AUTOMATIC MIXED PRECISION)
AMP is very useful for GPU training, and PyTorch/XLA reuses CUDA's AMP rules. You can check out our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also use a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.
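As a rough sketch of how these pieces fit together (assuming the `torch_xla.amp` API referenced above; the model, loader, and `train_step` names here are illustrative, not from the examples linked), an AMP training step with the sync-free optimizer might look like:

```python
# Illustrative sketch only: requires a PyTorch/XLA build with GPU support.
# `autocast`, `GradScaler`, and `syncfree` are the torch_xla.amp pieces the
# text above refers to; everything else is a placeholder.
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
from torch_xla.amp import autocast, GradScaler, syncfree

def train_one_epoch(model, loader, lr=0.01):
    device = xm.xla_device()
    model = model.to(device)
    # Sync-free SGD avoids the extra device/host sync of the stock optimizer.
    optimizer = syncfree.SGD(model.parameters(), lr=lr)
    scaler = GradScaler()
    for data, target in loader:
        optimizer.zero_grad()
        with autocast(device):  # run the forward pass in mixed precision
            output = model(data.to(device))
            loss = F.nll_loss(output, target.to(device))
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        xm.mark_step()  # materialize the lazily-recorded XLA graph
```

The linked mnist and imagenet tests show the full, authoritative versions of this loop.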

## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)

1. Inside a GPU VM, create a Docker container from a development Docker image. For example:

```
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
sudo docker run --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
```

2. Build PyTorch and PyTorch/XLA from source.

```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
USE_CUDA=0 python setup.py install
git clone https://github.com/pytorch/xla.git
cd xla
XLA_CUDA=1 python setup.py install
```

3. Verify that PyTorch and PyTorch/XLA have been installed successfully.

If you can successfully run the test in the
[Run a simple model](#run-a-simple-model) section, then PyTorch and
PyTorch/XLA have been installed successfully.
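For a quicker standalone check (a minimal sketch; assumes the container from step 1 and the builds from step 2, so it only runs on a machine with an XLA GPU device), you can also try a single tensor operation:

```python
# Minimal smoke test: creates a tensor on the XLA device and runs one op.
# Requires the PyTorch and PyTorch/XLA builds from the steps above.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # resolves to the XLA-backed GPU here
t = torch.randn(2, 2, device=device)
print(t @ t)                      # a 2x2 result computed on the XLA device
```

If this prints a tensor without errors, the GPU-enabled build is working.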
