Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use hipMemcpyDefault in 2D memcpys due to bugs in HIP #1106

Merged
merged 6 commits into from
Mar 25, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion include/dlaf/factorization/qr/t_factor_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ struct Helpers<Backend::GPU, Device::GPU, T> {

// This assumes that elements in taus are contiguous, i.e. that it is a column vector of size k
whip::memcpy_2d_async(tile_t.ptr(), to_sizet(tile_t.ld() + 1) * sizeof(T), taus.ptr(), sizeof(T),
sizeof(T), to_sizet(k), whip::memcpy_default, stream);
sizeof(T), to_sizet(k), whip::memcpy_host_to_device, stream);
}

for (SizeType j = 0; j < k; ++j) {
Expand Down
76 changes: 75 additions & 1 deletion src/lapack/gpu/lacpy.cu
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,80 @@ __global__ void lacpy(cublasFillMode_t uplo, const unsigned m, const unsigned n,
}
}

static whip::memcpy_kind get_lacpy_memcpy_kind(const void* src, const void* dst) {
// If HIP is version 5.6 or newer, avoid the use of hipMemcpyDefault as it is
// buggy with 2D memcpy. Instead try to infer the memory type using
// hipPointerGetAttributes. See:
// - https://github.com/ROCm/clr/commit/56daa6c4891b43ec233e9c63f755e3f7b45842b4
// - https://github.com/ROCm/clr/commit/d3bfb55d7a934355257a72fab538a0a634b43cad
//
// Note that hipMemoryTypeManaged is not available in older versions, but it is
// already available in 5.3 so we don't do a separate check for availability.
#if defined(DLAF_WITH_HIP) && HIP_VERSION >= 50600000
constexpr auto get_memory_type = [](const void* p) {
hipPointerAttribute_t attributes{};
hipError_t status = hipPointerGetAttributes(&attributes, p);
if (status == hipErrorInvalidValue) {
// If HIP returns hipErrorInvalidValue we assume it's due
// to HIP not recognizing a non-HIP allocated pointer as
// host memory, and we assume the type is host.
return hipMemoryTypeHost;
}
else if (status != hipSuccess) {
throw whip::exception(status);
}

switch (attributes.type) {
#if HIP_VERSION >= 60000000
case hipMemoryTypeUnregistered:
[[fallthrough]];
#endif
case hipMemoryTypeHost:
return hipMemoryTypeHost;
case hipMemoryTypeArray:
[[fallthrough]];
case hipMemoryTypeDevice:
return hipMemoryTypeDevice;
case hipMemoryTypeUnified:
return hipMemoryTypeUnified;
case hipMemoryTypeManaged:
return hipMemoryTypeManaged;
default:
DLAF_UNREACHABLE_PLAIN;
}
};

hipMemoryType src_type = get_memory_type(src);
hipMemoryType dst_type = get_memory_type(dst);

if (src_type == hipMemoryTypeDevice && dst_type == hipMemoryTypeHost) {
return whip::memcpy_device_to_host;
}
else if (src_type == hipMemoryTypeHost && dst_type == hipMemoryTypeDevice) {
return whip::memcpy_host_to_device;
}
else if (src_type == hipMemoryTypeDevice && dst_type == hipMemoryTypeDevice) {
return whip::memcpy_device_to_device;
}
else if (src_type == hipMemoryTypeManaged || src_type == hipMemoryTypeUnified ||
dst_type == hipMemoryTypeManaged || dst_type == hipMemoryTypeUnified) {
return whip::memcpy_default;
}
else if (src_type == hipMemoryTypeHost && dst_type == hipMemoryTypeHost) {
DLAF_ASSERT(
false,
"Attempting to do a HIP lacpy with host source and destination, use the CPU lacpy instead");
}
Comment on lines +168 to +172
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this assertion is only triggered with HIP 5.6 and newer, but theoretically it's useful also with other versions, including with CUDA. I don't know if it's worth it to always do this assertion though. I don't think it's a big deal to not check it (as we do right now on master).

else {
DLAF_ASSERT(false,
"Attempting to do a HIP lacpy with unsupported source and destination memory type",
src_type, dst_type);
}
Comment on lines +122 to +177
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've slightly refactored this. It's still a bit of a mess but perhaps passable. I now:

  • Map a hipErrorInvalidValue from hipPointerGetAttributes to hipMemoryTypeHost. From a small test it seems like a malloc'd pointer does not go through the same path as a hipHostMalloc'd pointer, but we still need to cover the case of hipMemoryTypeHost, so we can treat both the same.
  • Map hipMemoryTypeUnregistered (if available) to hipMemoryTypeHost.
  • Map hipMemoryTypeArray to hipMemoryTypeDevice because the HIP documentation states that it's always on a device.
  • If any pointer is hipMemoryTypeManaged or hipMemoryTypeUnified the memcpy kind is hipMemcpyDefault because I don't know what else to do with them. This may be too conservative, but at least I'm not making any assumptions that I'm not sure about. This case was previously also using hipMemcpyDefault so it should not be worse than before.

All of the above is only done if HIP is 5.6 or newer. I'm hoping that we can put an upper bound on the workaround maybe with 6.1 or whichever version properly fixes the behaviour.

#endif

return whip::memcpy_default;
}

template <class T>
void lacpy(const blas::Uplo uplo, const SizeType m, const SizeType n, const T* a, const SizeType lda,
T* b, const SizeType ldb, const whip::stream_t stream) {
Expand All @@ -118,7 +192,7 @@ void lacpy(const blas::Uplo uplo, const SizeType m, const SizeType n, const T* a
constexpr unsigned kernel_tile_size_cols = kernels::LacpyParams::kernel_tile_size_cols;

if (uplo == blas::Uplo::General) {
const whip::memcpy_kind kind = whip::memcpy_default;
whip::memcpy_kind kind = get_lacpy_memcpy_kind(a, b);
whip::memcpy_2d_async(b, to_sizet(ldb) * sizeof(T), a, to_sizet(lda) * sizeof(T),
to_sizet(m) * sizeof(T), to_sizet(n), kind, stream);
}
Expand Down
Loading