Allow specification for GPU device index #96

Merged on Mar 28, 2024 (43 commits).

Commits
- 83ff00a Have get_device use torch::Device (jwallwork23, Mar 19, 2024)
- a392900 Add device_number arg for get_device (jwallwork23, Mar 19, 2024)
- 2552c91 Throw error if device_number used in CPU-only case (jwallwork23, Mar 19, 2024)
- 9b0b7dd Disallow negative device number (jwallwork23, Mar 19, 2024)
- e44e3e6 Actually use the device number (jwallwork23, Mar 19, 2024)
- cf39472 Use device number for torch_zeros (jwallwork23, Mar 19, 2024)
- 01b8063 Use device number for torch_ones (jwallwork23, Mar 19, 2024)
- 530fa19 Use device number for torch_empty (jwallwork23, Mar 19, 2024)
- af7a8af Use device number for torch_from_blob (jwallwork23, Mar 19, 2024)
- e2fe070 Device and device number args for torch_module_load (jwallwork23, Mar 19, 2024)
- fd729a3 Pass device and device number to torch_jit_load by value (jwallwork23, Mar 19, 2024)
- 3b3e62c Make device number argument to torch_module_load optional (jwallwork23, Mar 19, 2024)
- 5fe34b0 Make device number argument to torch_tensor_from_array optional (jwallwork23, Mar 19, 2024)
- 3fe5258 Make device number argument to other subroutines optional (jwallwork23, Mar 19, 2024)
- 9ed2452 Make device argument to torch_module_load optional (jwallwork23, Mar 19, 2024)
- fbc6a12 Add function for determining device_index (jwallwork23, Mar 20, 2024)
- 58d28ed Rename device number as index (jwallwork23, Mar 20, 2024)
- 682d887 Rename device as device type (jwallwork23, Mar 20, 2024)
- 2d9698c Device index defaults to -1 on CPU and 0 on GPU (jwallwork23, Mar 20, 2024)
- ca40777 Make device type and index optional on C++ side (jwallwork23, Mar 20, 2024)
- e37f743 Fix typo in torch_model_load (jwallwork23, Mar 20, 2024)
- 8b63dfe Fix typos in example 1 (jwallwork23, Mar 22, 2024)
- 8982129 Initial draft of example 3_MultiGPU (jwallwork23, Mar 22, 2024)
- 1eec646 Differentiate between errors and warnings in C++ code (jwallwork23, Mar 25, 2024)
- 2739c16 Formatting (jwallwork23, Mar 25, 2024)
- fc18b52 Add mpi4py to requirements for example 3 (jwallwork23, Mar 25, 2024)
- 2b0086a Use mpi4py to differ inputs in simplenet_infer_python (jwallwork23, Mar 25, 2024)
- fced4c1 Raise ValueError for Python inference with invalid device (jwallwork23, Mar 25, 2024)
- 188b305 Print rank in Python case; updates to README (jwallwork23, Mar 25, 2024)
- dcfb153 Setup MPI for simplenet_infer_fortran, too (jwallwork23, Mar 25, 2024)
- 392afb9 Write formatting for example 3 (jwallwork23, Mar 25, 2024)
- 9fd3040 Add note on building with Make (jwallwork23, Mar 25, 2024)
- 24d5b6a Print before and after; mpi_finalise; output on CPU; comments (jwallwork23, Mar 27, 2024)
- a44e262 Merge branch 'main' into 85_gpu_device_number (jwallwork23, Mar 27, 2024)
- 5ebe845 Docs: device->device_type for consistency (jwallwork23, Mar 27, 2024)
- 18fca7b Add docs on MultiGPU (jwallwork23, Mar 27, 2024)
- 475a859 Update warning text for defaulting to 0 (jwallwork23, Mar 28, 2024)
- 3f26457 Mention MPI in requirements (jwallwork23, Mar 28, 2024)
- 3dba29a Update outputs for example 3 (jwallwork23, Mar 28, 2024)
- 0e3272e Use NP rather than 4 GPUs (jwallwork23, Mar 28, 2024)
- 99d3b5b Implement SimpleNet in example 3 but with a twist (jwallwork23, Mar 28, 2024)
- 99002d5 Add code snippets for multi-GPU doc section (jwallwork23, Mar 28, 2024)
- e2b68bd Add note about multiple GPU support to README.md. (jatkinson1000, Mar 28, 2024)
3 changes: 2 additions & 1 deletion README.md
@@ -187,7 +187,8 @@ adaptations to the code:
2. When using FTorch in Fortran, set the device for the input
tensor(s) to `torch_kCUDA`, rather than `torch_kCPU`.

- For detailed guidance about running on GPU please see the
+ For detailed guidance about running on GPU, including instructions for using multiple
+ devices, please see the
[online GPU documentation](https://cambridge-iccs.github.io/FTorch/page/gpu.html).

## Examples
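As a concrete illustration of the GPU guidance above, here is a minimal Fortran sketch of placing an input tensor on a specific GPU using the interface this PR introduces. The function name and `device_index` keyword follow the commit history above; the exact signature may differ, so treat this as an assumption rather than the definitive API:
```
program gpu_tensor_sketch
   ! Minimal sketch assuming the FTorch interface as extended by this PR;
   ! the exact signature of torch_tensor_from_array may differ.
   use ftorch
   implicit none

   real, dimension(5), target :: input_data = [0.0, 1.0, 2.0, 3.0, 4.0]
   integer :: tensor_layout(1) = [1]
   type(torch_tensor) :: input_tensor

   ! Request the CUDA device type and, optionally, a specific device index.
   ! Per this PR, device_index defaults to 0 on GPU (and -1 on CPU).
   input_tensor = torch_tensor_from_array(input_data, tensor_layout, &
                                          torch_kCUDA, device_index=0)

   call torch_tensor_delete(input_tensor)
end program gpu_tensor_sketch
```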
6 changes: 3 additions & 3 deletions examples/1_SimpleNet/README.md
@@ -9,7 +9,7 @@ covered in later examples.

## Description

- A python file `simplenet.py` is provided that defines a very simple pytorch 'net' that takes an input
+ A python file `simplenet.py` is provided that defines a very simple PyTorch 'net' that takes an input
vector of length 5 and applies a single `Linear` layer to multiply it by 2.

A modified version of the `pt2ts.py` tool saves this simple net to TorchScript.
@@ -29,7 +29,7 @@ To run this example requires:
## Running

To run this example install FTorch as described in the main documentation.
- Then from this directory create a virtual environment an install the necessary python
+ Then from this directory create a virtual environment and install the necessary python
modules:
```
python3 -m venv venv
@@ -47,7 +47,7 @@ tensor([[0, 2, 4, 6, 8]])
```

To save the SimpleNet model to TorchScript run the modified version of the
- `pt2ts.py` tool :
+ `pt2ts.py` tool:
```
python3 pt2ts.py
```
4 changes: 3 additions & 1 deletion examples/1_SimpleNet/simplenet_infer_python.py
@@ -38,14 +38,16 @@ def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:
        output_gpu = model.forward(input_tensor_gpu)
        output = output_gpu.to(torch.device("cpu"))

+    else:
+        raise ValueError(f"Device '{device}' not recognised.")

    return output


if __name__ == "__main__":
    saved_model_file = "saved_simplenet_model_cpu.pt"

    device_to_run = "cpu"
    # device = "cuda"

    batch_size_to_run = 1

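For context, the device-dispatch pattern that this hunk completes looks roughly like the following condensed sketch. The `cpu` branch and the model-loading calls are inferred from the surrounding example rather than shown verbatim in the diff:
```
import torch


def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:
    """Load a TorchScript model and run it on the requested device (sketch)."""
    input_tensor = torch.ones(batch_size, 5)

    if device == "cpu":
        model = torch.jit.load(saved_model)
        output = model.forward(input_tensor)
    elif device == "cuda":
        model = torch.jit.load(saved_model)
        input_tensor_gpu = input_tensor.to(torch.device("cuda"))
        output_gpu = model.forward(input_tensor_gpu)
        output = output_gpu.to(torch.device("cpu"))
    else:
        # New in this PR: fail loudly on unrecognised devices.
        raise ValueError(f"Device '{device}' not recognised.")

    return output
```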
3 changes: 3 additions & 0 deletions examples/2_ResNet18/resnet_infer_python.py
@@ -47,6 +47,9 @@ def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:
        output_gpu = model.forward(input_tensor_gpu)
        output = output_gpu.to(torch.device("cpu"))

+    else:
+        raise ValueError(f"Device '{device}' not recognised.")

    return output


21 changes: 21 additions & 0 deletions examples/3_MultiGPU/CMakeLists.txt
@@ -0,0 +1,21 @@
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
#policy CMP0076 - target_sources source files are relative to file where target_sources is run
cmake_policy (SET CMP0076 NEW)

set(PROJECT_NAME MultiGPUExample)

project(${PROJECT_NAME} LANGUAGES Fortran)

# Build in Debug mode if not specified
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Debug CACHE STRING "" FORCE)
endif()

find_package(FTorch)
find_package(MPI REQUIRED)
message(STATUS "Building with Fortran PyTorch coupling")

# Fortran example
add_executable(simplenet_infer_fortran simplenet_infer_fortran.f90)
target_link_libraries(simplenet_infer_fortran PRIVATE FTorch::ftorch)
target_link_libraries(simplenet_infer_fortran PRIVATE MPI::MPI_Fortran)
113 changes: 113 additions & 0 deletions examples/3_MultiGPU/README.md
@@ -0,0 +1,113 @@
# Example 3 - MultiGPU

This example revisits the SimpleNet example and demonstrates how to run it using
multiple GPU devices.


## Description

The same Python file `simplenet.py` from the earlier example is used. Recall that it
defines a very simple PyTorch network that takes an input of length 5 and applies a
single `Linear` layer to multiply it by 2.

The same `pt2ts.py` tool is used to save the simple network to TorchScript.

A series of files `simplenet_infer_<LANG>` then bind to the TorchScript model from
other languages to run it in inference mode.

## Dependencies

To run this example requires:

- cmake
- An MPI installation.
- mpif90
- FTorch (installed as described in main package)
- python3

## Running

To run this example install FTorch as described in the main documentation. Then from
this directory create a virtual environment and install the necessary python modules:
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

You can check that everything is working by running `simplenet.py`:
```
python3 simplenet.py
```
As before, this defines the network and runs it with an input tensor
[0.0, 1.0, 2.0, 3.0, 4.0] to produce the result:
```
tensor([[0, 2, 4, 6, 8]])
```

To save the SimpleNet model to TorchScript run the modified version of the `pt2ts.py`
tool:
```
python3 pt2ts.py
```
which will generate `saved_simplenet_model_cuda.pt` - the TorchScript instance of the
network. The only difference from the earlier example is that the model is built to
be run using CUDA rather than on CPU.

You can check that everything is working by running the `simplenet_infer_python.py`
script. It's set up with MPI such that a different GPU device is associated with each
MPI rank. You should substitute `<NP>` with the number of GPUs you wish to run with:
```
mpiexec -np <NP> python3 simplenet_infer_python.py
```
This reads the model in from the TorchScript file and runs it with a different input
tensor on each GPU device: [0.0, 1.0, 2.0, 3.0, 4.0], plus the device index in each
entry. The result should be (some permutation of):
```
0: tensor([[0., 2., 4., 6., 8.]])
1: tensor([[ 2., 4., 6., 8., 10.]])
2: tensor([[ 4., 6., 8., 10., 12.]])
3: tensor([[ 6., 8., 10., 12., 14.]])
```
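
The mapping from MPI rank to GPU device used above looks roughly like the following
sketch (mpi4py and `map_location` usage assumed; `simplenet_infer_python.py` is the
authoritative version):
```
import torch
from mpi4py import MPI

# One GPU per MPI rank (assumes the rank count does not exceed visible devices).
rank = MPI.COMM_WORLD.rank
device = torch.device(f"cuda:{rank}")

# Offset the input tensor by the rank, as in the example.
input_tensor = torch.tensor([[0.0, 1.0, 2.0, 3.0, 4.0]]) + rank

model = torch.jit.load("saved_simplenet_model_cuda.pt", map_location=device)
output = model(input_tensor.to(device)).to("cpu")
print(f"{rank}: {output}")
```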

At this point we no longer require Python, so we can deactivate the virtual environment:
```
deactivate
```

To call the saved SimpleNet model from Fortran we need to compile the
`simplenet_infer` files. This can be done using the included `CMakeLists.txt` as
follows, noting that we need to use an MPI-enabled Fortran compiler:
```
mkdir build
cd build
cmake .. -DCMAKE_PREFIX_PATH=<path/to/your/installation/of/library/> -DCMAKE_BUILD_TYPE=Release
cmake --build .
```

To run the compiled code calling the saved SimpleNet TorchScript from Fortran, run the
executable with an argument of the saved model file. Again, specify the number of MPI
processes according to the desired number of GPUs:
```
mpiexec -np <NP> ./simplenet_infer_fortran ../saved_simplenet_model_cuda.pt
```

This runs the model with the same inputs as described above and should produce (some
permutation of) the output:
```
input on rank0: [ 0.0, 1.0, 2.0, 3.0, 4.0]
input on rank1: [ 1.0, 2.0, 3.0, 4.0, 5.0]
input on rank2: [ 2.0, 3.0, 4.0, 5.0, 6.0]
input on rank3: [ 3.0, 4.0, 5.0, 6.0, 7.0]
output on rank0: [ 0.0, 2.0, 4.0, 6.0, 8.0]
output on rank1: [ 2.0, 4.0, 6.0, 8.0, 10.0]
output on rank2: [ 4.0, 6.0, 8.0, 10.0, 12.0]
output on rank3: [ 6.0, 8.0, 10.0, 12.0, 14.0]
```
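
The per-rank device selection on the Fortran side looks roughly like this sketch
(FTorch routine names and the `device_index` argument are assumed from the commit
history; `simplenet_infer_fortran.f90` is the authoritative version):
```
program multigpu_sketch
   ! Hedged sketch of per-rank device selection; FTorch names and
   ! signatures assumed, see simplenet_infer_fortran.f90 for the real code.
   use mpi
   use ftorch
   implicit none

   integer :: rank, ierr
   type(torch_module) :: model

   call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world, rank, ierr)

   ! Load the TorchScript model onto the GPU matching this MPI rank.
   model = torch_module_load("saved_simplenet_model_cuda.pt", &
                             device_type=torch_kCUDA, device_index=rank)

   ! ... build input tensors with device_index=rank and run inference here ...

   call torch_module_delete(model)
   call mpi_finalize(ierr)
end program multigpu_sketch
```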

Alternatively, we can use `make` instead of CMake, copying the Makefile over from the
first example:
```
cp ../1_SimpleNet/Makefile .
```
See the instructions in that example directory for further details.
158 changes: 158 additions & 0 deletions examples/3_MultiGPU/pt2ts.py
@@ -0,0 +1,158 @@
"""Load a PyTorch model and convert it to TorchScript."""

from typing import Optional
import torch

# FPTLIB-TODO
# Add a module import with your model here:
# This example assumes the model architecture is in an adjacent module `my_ml_model.py`
import simplenet


def script_to_torchscript(
model: torch.nn.Module, filename: Optional[str] = "scripted_model.pt"
) -> None:
"""
Save PyTorch model to TorchScript using scripting.

Parameters
----------
model : torch.NN.Module
a PyTorch model
filename : str
name of file to save to
"""
print("Saving model using scripting...", end="")
scripted_model = torch.jit.script(model)
# print(scripted_model.code)
scripted_model.save(filename)
print("done.")


def trace_to_torchscript(
model: torch.nn.Module,
dummy_input: torch.Tensor,
filename: Optional[str] = "traced_model.pt",
) -> None:
"""
Save PyTorch model to TorchScript using tracing.

Parameters
----------
model : torch.NN.Module
a PyTorch model
dummy_input : torch.Tensor
appropriate size Tensor to act as input to model
filename : str
name of file to save to
"""
print("Saving model using tracing...", end="")
traced_model = torch.jit.trace(model, dummy_input)
frozen_model = torch.jit.freeze(traced_model)
## print(frozen_model.graph)
## print(frozen_model.code)
frozen_model.save(filename)
print("done.")


def load_torchscript(filename: Optional[str] = "saved_model.pt") -> torch.nn.Module:
"""
Load a TorchScript from file.

Parameters
----------
filename : str
name of file containing TorchScript model
"""
model = torch.jit.load(filename)

return model


if __name__ == "__main__":
# =====================================================
# Load model and prepare for saving
# =====================================================

# FPTLIB-TODO
# Load a pre-trained PyTorch model
# Insert code here to load your model as `trained_model`.
# This example assumes my_ml_model has a method `initialize` to load
# architecture, weights, and place in inference mode
trained_model = simplenet.SimpleNet()

# Switch off specific layers/parts of the model that behave
# differently during training and inference.
# This may have been done by the user already, so just make sure here.
trained_model.eval()

# =====================================================
# Prepare dummy input and check model runs
# =====================================================

# FPTLIB-TODO
# Generate a dummy input Tensor `dummy_input` to the model of appropriate size.
# This example assumes one input of size (5)
trained_model_dummy_input = torch.ones(5)

# FPTLIB-TODO
# Uncomment the following lines to save for inference on GPU (rather than CPU):
device = torch.device("cuda")
trained_model = trained_model.to(device)
trained_model.eval()
trained_model_dummy_input = trained_model_dummy_input.to(device)

# FPTLIB-TODO
# Run model for dummy inputs
# If something isn't working This will generate an error
trained_model_dummy_output = trained_model(
trained_model_dummy_input,
)

# =====================================================
# Save model
# =====================================================

# FPTLIB-TODO
# Set the name of the file you want to save the torchscript model to:
saved_ts_filename = "saved_simplenet_model_cuda.pt"

# FPTLIB-TODO
# Save the PyTorch model using either scripting (recommended where possible) or tracing
# -----------
# Scripting
# -----------
script_to_torchscript(trained_model, filename=saved_ts_filename)

# -----------
# Tracing
# -----------
# trace_to_torchscript(trained_model, trained_model_dummy_input, filename=saved_ts_filename)

print(f"Saved model to TorchScript in '{saved_ts_filename}'.")

# =====================================================
# Check model saved OK
# =====================================================

# Load torchscript and run model as a test
# FPTLIB-TODO
# Scale inputs as above and, if required, move inputs and mode to GPU
trained_model_dummy_input = 2.0 * trained_model_dummy_input
trained_model_dummy_input = trained_model_dummy_input.to("cuda")
trained_model_testing_output = trained_model(
trained_model_dummy_input,
)
ts_model = load_torchscript(filename=saved_ts_filename)
ts_model_output = ts_model(
trained_model_dummy_input,
)

if torch.all(ts_model_output.eq(trained_model_testing_output)):
print("Saved TorchScript model working as expected in a basic test.")
print("Users should perform further validation as appropriate.")
else:
raise RuntimeError(
"Saved Torchscript model is not performing as expected.\n"
"Consider using scripting if you used tracing, or investigate further."
)
2 changes: 2 additions & 0 deletions examples/3_MultiGPU/requirements.txt
@@ -0,0 +1,2 @@
mpi4py
torch