Add gpu doc for how to build PyTorch/XLA from source with GPU support. #5384

Merged (4 commits, Aug 1, 2023)
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ We recommend you to use our prebuilt Docker image to start your development work
python setup.py install
```

### Build PyTorch/XLA from source with GPU support

Please refer to this [guide](https://github.com/pytorch/xla/blob/master/docs/gpu.md#develop-pytorchxla-on-a-gpu-instance-build-pytorchxla-from-source-with-gpu-support).

## Before Submitting A Pull Request:

In `pytorch/xla` repo we enforce coding style for both C++ and Python files. Please try to format
Expand Down
42 changes: 38 additions & 4 deletions docs/gpu.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
# How to run with PyTorch/XLA:GPU

PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an Nvidia GPU instance.

## Create a GPU instance
PyTorch/XLA currently publishes prebuilt docker images and wheels with CUDA 11.7/11.8 and Python 3.8. We recommend creating a GPU instance with a corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla/tree/jackcao/gpu_doc#-available-images-and-wheels).

## Environment Setup
You can either use a local machine with a GPU attached or a GPU VM on the cloud. For example, in Google Cloud you can follow this [doc](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus) to create the GPU VM.

To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
## Environment Setup

### Docker
PyTorch/XLA currently publishes prebuilt docker images and wheels with CUDA 11.7/11.8 and Python 3.8. We recommend creating a docker container with a corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla#available-docker-images-and-wheels).
```shell
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
Expand Down Expand Up @@ -74,3 +74,37 @@ Epoch 1 train begin 06:12:38
```
## AMP (AUTOMATIC MIXED PRECISION)
AMP is very useful for GPU training, and PyTorch/XLA reuses CUDA's AMP rules. You can check out our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also use a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.

## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)

1. Inside a GPU VM, create a docker container from a development docker image. For example:

```shell
sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
sudo docker run --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
```

2. Build PyTorch and PyTorch/XLA from source.

```shell
git clone https://github.com/pytorch/pytorch.git
cd pytorch
USE_CUDA=0 python setup.py install

git clone https://github.com/pytorch/xla.git
cd xla
XLA_CUDA=1 python setup.py install
```

> **Review comment** (@carmocca, Aug 9, 2023): Is this env variable also relevant when installing a nightly wheel? This might be worth adding too, e.g. `pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.7/torch-nightly-cp38-cp38-linux_x86_64.whl`
>
> **Reply** (Collaborator): `XLA_CUDA=1` is a build flag; it shouldn't matter when you already have the whl.

3. Verify that PyTorch and PyTorch/XLA have been installed successfully.

If you can run the test in the section
[Run a simple model](#run-a-simple-model) successfully, then PyTorch and
PyTorch/XLA have been installed successfully.