feature[cartesian]: Add support for HIP/ROCm #1278
Conversation
Not a review, just a couple of suggestions
```
@@ -243,6 +245,18 @@ def allocate_gpu(
    if device_field.ndim > 0:
        device_field = device_field[tuple(slice(0, s, None) for s in shape)]

    if gt_config.GT4PY_USE_HIP:
        # HIP/ROCm lack support for __cuda_array_interface__
        device_field.__hip_array_interface__ = {
```
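The dict contents are truncated in the excerpt above. As an illustration only (not the PR's exact code), a `__cuda_array_interface__`-style descriptor for a CuPy device buffer generally carries the following keys, per the CUDA Array Interface v3 specification; the helper name here is hypothetical:

```python
import cupy as cp

def hip_array_interface(arr: cp.ndarray) -> dict:
    """Illustrative __cuda_array_interface__-style descriptor for a device buffer."""
    return {
        "shape": arr.shape,
        "typestr": arr.dtype.str,       # e.g. "<f8"
        "data": (arr.data.ptr, False),  # (device pointer, read-only flag)
        "version": 3,
        "strides": arr.strides,         # None would mean C-contiguous
    }
```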
Shouldn't this be called `__cuda_array_interface__` to avoid any changes in the GridTools bindings?
I agree. Unless things have changed, the community always uses the `__cuda_array_interface__` name even for AMD device buffers (most importantly, CuPy does).
I fully agree that `__cuda_array_interface__` should be the way to go, but that's not feasible at the moment. CuPy does not allow reading or overloading `__cuda_array_interface__`; see this issue and the links therein: cupy/cupy#7431. That said, my solution is clearly a hack. DLPack seems to be the solution the community is converging to, but I must say I'm not very familiar with it, so I need to dig into it first.
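For context, a minimal sketch of the DLPack exchange the thread is pointing toward, assuming a recent CuPy that implements the standard `__dlpack__` protocol (the array contents and shapes are illustrative):

```python
import cupy as cp

# Producer side: any array exposing __dlpack__ / __dlpack_device__.
src = cp.arange(8, dtype=cp.float32)

# Consumer side: zero-copy import of the device buffer via DLPack.
dst = cp.from_dlpack(src)

# The device tuple distinguishes backends: 2 is kDLCUDA, 10 is kDLROCM.
device_type, device_id = src.__dlpack_device__()
```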
My understanding of cupy/cupy#7431 is that it concerns a particular case where you want to wrap data allocated on the C++ side in a CuPy buffer. Have you tried it and run into the error? Last time we ran on AMD, a while back, we happily used `__cuda_array_interface__` to pass CuPy arrays to stencils.
Yes, I tried, and the GridTools bindings threw the exception `AttributeError: HIP/ROCm does not support cuda array interface`. The same happens if you try to read `__cuda_array_interface__` from within Python. But I cannot exclude that I'm missing something important.
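A hypothetical probe for the behavior described above (assuming a ROCm build of CuPy; the exact error message may differ from the one quoted):

```python
import cupy as cp

arr = cp.zeros(1)
try:
    # On ROCm builds of CuPy this attribute access raises, as described above.
    iface = arr.__cuda_array_interface__
    print("CUDA array interface available:", iface["shape"])
except AttributeError as exc:
    print("Not supported on this build:", exc)
```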
+1 for DLPack. I've seen accelerated adoption lately. Is it on the radar, @egparedes, to add it? It feels not urgent, but it's something we could contribute later down the road.
If this error is there, is it safe to just work around it this way? There must be a reason why it is not supported.
Again, I'm not an expert here, but it's very likely that this workaround only works in simple scenarios like ours (e.g. using a single stream).
@FlorianDeconinck Yes, we're looking into supporting DLPack. There is no timeline yet, but hopefully not far in the future.
src/gt4py/cartesian/config.py (outdated)
```
@@ -32,6 +32,8 @@

CUDA_HOST_CXX: Optional[str] = os.environ.get("CUDA_HOST_CXX", None)

GT4PY_USE_HIP: int = int(os.environ.get("GT4PY_USE_HIP", 0))
```
There should be a way to autodetect if CuPy has been compiled with HIP. This is a naive suggestion to do it using DLPack, but probably there is a simpler way using some esoteric CuPy API endpoint (maybe `cp.show_config()`?).
```diff
-GT4PY_USE_HIP: int = int(os.environ.get("GT4PY_USE_HIP", 0))
+if "GT4PY_USE_HIP" in os.environ:
+    GT4PY_USE_HIP: bool = bool(int(os.environ["GT4PY_USE_HIP"]))
+else:
+    # Autodetect cupy with ROCm/HIP using DLPack device type
+    # Reference: https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__
+    try:
+        import cupy as _cp
+        GT4PY_USE_HIP = _cp.empty((1,)).__dlpack_device__()[0] == 10
+        del _cp
+    except Exception:
+        GT4PY_USE_HIP = False
```
I partially followed your suggestion, using `cupy.cuda.get_hipcc_path()`.
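A minimal sketch of what that detection could look like, assuming `cupy.cuda.get_hipcc_path()` returns a path on ROCm builds and `None` (or raises) otherwise; the exact behavior isn't shown in the thread:

```python
import os

if "GT4PY_USE_HIP" in os.environ:
    # Explicit user override always wins.
    GT4PY_USE_HIP = bool(int(os.environ["GT4PY_USE_HIP"]))
else:
    try:
        import cupy as cp

        # Assumption: only ROCm builds of CuPy can locate hipcc.
        GT4PY_USE_HIP = cp.cuda.get_hipcc_path() is not None
    except Exception:
        # No CuPy (or a CUDA build): fall back to the CUDA code path.
        GT4PY_USE_HIP = False
```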
Only a small comment: I wonder if there is a way at CSCS to run something on AMD as part of CI. If this needs an extra set of tests, NASA/NOAA can wiggle some AMD systems; hit me up. (Great to see this!)
cscs-ci run
nvcc needs `-isystem=`.
cscs-ci run
"-isystem{}".format(gt_include_path), | ||
"-isystem{}".format(gt_config.build_settings["boost_include_path"]), |
`hipcc` (clang) requires `-isystem [dir]`.
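A small sketch of how the two spellings could be reconciled, based on the two comments above; the helper name and the `use_hip` switch are illustrative, not the PR's code:

```python
def isystem_flags(include_dirs: list[str], use_hip: bool) -> list[str]:
    """Build system-include flags for the active compiler."""
    if use_hip:
        # hipcc (clang) wants the directory as a separate argument.
        return [token for d in include_dirs for token in ("-isystem", d)]
    # nvcc accepts the fused '=' spelling.
    return [f"-isystem={d}" for d in include_dirs]

# Example: isystem_flags(["/opt/gridtools/include"], use_hip=True)
#   -> ["-isystem", "/opt/gridtools/include"]
```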
cscs-ci run
Light-weight PR to make the Python bindings work with AMD GPU buffers. Complementary to GridTools/gt4py#1278. Co-authored-by: Hannes Vogt <hannes@havogt.de>
Adaptations to generate GridTools/DaCe code targeting AMD GPUs. Complemented by GridTools/gridtools#1759.