From 75a6a9d5e480f0b7790e2bcee9614361f9fa55ec Mon Sep 17 00:00:00 2001
From: Xiongfei Wei
Date: Mon, 31 Jul 2023 22:25:59 +0000
Subject: [PATCH 1/4] Add gpu doc for how to build PyTorch/XLA from source with
 GPU support.

---
 docs/gpu.md | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/docs/gpu.md b/docs/gpu.md
index 7bf8f665dc28..51203d9e3208 100644
--- a/docs/gpu.md
+++ b/docs/gpu.md
@@ -1,15 +1,15 @@
 # How to run with PyTorch/XLA:GPU
 
-PyTorch/XLA enables PyTorch users to utilize the XLA compiler which supports accelerators including TPU, GPU, and CPU This doc will go over the basic steps to run PyTorch/XLA on a nvidia gpu instance
+PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an Nvidia GPU instance.
 
 ## Create a GPU instance
-Pytorch/XLA currently publish prebuilt docker images and wheels with cuda11.7/8 and python 3.8. We recommend users to create a GPU instance with corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla/tree/jackcao/gpu_doc#-available-images-and-wheels).
+To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
 
-## Environment Setup
-To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
+## Environment Setup
 
 ### Docker
+PyTorch/XLA currently publishes prebuilt Docker images and wheels with CUDA 11.7/11.8 and Python 3.8. We recommend users create a Docker container with the corresponding config. For a full list of Docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla#available-docker-images-and-wheels).
 ```
 sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7
 sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
@@ -74,3 +74,34 @@ Epoch 1 train begin 06:12:38
 ```
 
 ## AMP (AUTOMATIC MIXED PRECISION)
 AMP is very useful on GPU training and PyTorch/XLA reuse Cuda's AMP rule. You can checkout our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also used a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.
+
+## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)
+
+1. Inside a GPU VM, create a docker container from the development docker image. For example:
+```
+sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
+sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
+sudo systemctl restart docker
+sudo docker run --gpus all -it -d us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
+sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
+```
+
+2. Build PyTorch and PyTorch/XLA from source.
+```
+git clone https://github.com/pytorch/pytorch.git
+cd pytorch
+USE_CUDA=0 python setup.py install
+
+git clone https://github.com/pytorch/xla.git
+cd xla
+BAZEL_REMOTE_CACHE=0 XLA_CUDA=1 python setup.py install
+```
+
+3. Verify that PyTorch and PyTorch/XLA have been installed successfully.
+If you can run the test in the section
+[Run a simple model](#run-a-simple-model) successfully, then PyTorch and
+PyTorch/XLA should have been installed successfully.

From 7f2a9769d6eb20b65fc5a68a916cdfabb89bba19 Mon Sep 17 00:00:00 2001
From: Xiongfei Wei
Date: Mon, 31 Jul 2023 22:28:06 +0000
Subject: [PATCH 2/4] fix typo

---
 docs/gpu.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/gpu.md b/docs/gpu.md
index 51203d9e3208..7e6a4431e947 100644
--- a/docs/gpu.md
+++ b/docs/gpu.md
@@ -77,7 +77,7 @@ AMP is very useful on GPU training and PyTorch/XLA reuse Cuda's AMP rule. You ca
 
 ## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)
 
-1. Inside a GPU VM, create a docker container from the development docker image. For example:
+1. Inside a GPU VM, create a docker container from a development docker image. For example:
 ```
 sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
 sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common

From 05d5aff8e58a6068a2e8e0549ccd5493fe68f74b Mon Sep 17 00:00:00 2001
From: Xiongfei Wei
Date: Mon, 31 Jul 2023 22:39:29 +0000
Subject: [PATCH 3/4] fix comments

---
 CONTRIBUTING.md | 4 ++++
 docs/gpu.md     | 5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9220a2bfde48..1f1659ae4150 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -43,6 +43,10 @@ We recommend you to use our prebuilt Docker image to start your development work
 python setup.py install
 ```
 
+### Build PyTorch/XLA from source with GPU support
+
+Please refer to this [guide](https://github.com/pytorch/xla/blob/master/docs/gpu.md#develop-pytorchxla-on-a-gpu-instance-build-pytorchxla-from-source-with-gpu-support).
+
 ## Before Submitting A Pull Request:
 
 In `pytorch/xla` repo we enforce coding style for both C++ and Python files. Please try to format
diff --git a/docs/gpu.md b/docs/gpu.md
index 7e6a4431e947..af9a96b1b730 100644
--- a/docs/gpu.md
+++ b/docs/gpu.md
@@ -78,6 +78,7 @@ AMP is very useful on GPU training and PyTorch/XLA reuse Cuda's AMP rule. You ca
 ## Develop PyTorch/XLA on a GPU instance (build PyTorch/XLA from source with GPU support)
 
 1. Inside a GPU VM, create a docker container from a development docker image. For example:
+
 ```
 sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/development:3.8_cuda_11.8
 sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
@@ -91,6 +92,7 @@ sudo docker exec -it $(sudo docker ps | awk 'NR==2 { print $1 }') /bin/bash
 ```
 
 2. Build PyTorch and PyTorch/XLA from source.
+
 ```
 git clone https://github.com/pytorch/pytorch.git
 cd pytorch
 USE_CUDA=0 python setup.py install
@@ -98,10 +100,11 @@ USE_CUDA=0 python setup.py install
 
 git clone https://github.com/pytorch/xla.git
 cd xla
-BAZEL_REMOTE_CACHE=0 XLA_CUDA=1 python setup.py install
+XLA_CUDA=1 python setup.py install
 ```
 
 3. Verify that PyTorch and PyTorch/XLA have been installed successfully.
+
 If you can run the test in the section
 [Run a simple model](#run-a-simple-model) successfully, then PyTorch and
 PyTorch/XLA should have been installed successfully.

From cab50bb1d1ae0afdaca5316af07cb4efc53a805f Mon Sep 17 00:00:00 2001
From: Xiongfei Wei
Date: Mon, 31 Jul 2023 23:20:00 +0000
Subject: [PATCH 4/4] fix comments

---
 docs/gpu.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/gpu.md b/docs/gpu.md
index af9a96b1b730..856643596864 100644
--- a/docs/gpu.md
+++ b/docs/gpu.md
@@ -3,8 +3,8 @@
 PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an Nvidia GPU instance.
 
 ## Create a GPU instance
-To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
+You can either use a local machine with a GPU attached or a GPU VM in the cloud. For example, in Google Cloud you can follow this [doc](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus) to create a GPU VM.
 
 ## Environment Setup