-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use hipMemcpyDefault
in 2D memcpys due to bugs in HIP
#1106
Conversation
cscs-ci run |
2D memcpys involving host buffers with non-zero offsets are broken on HIP 5.6 onwards. Some cases have been fixed in ROCm/clr@56daa6c (included in 5.7.0) and further fixes in are in ROCm/clr@d3bfb55 (not released as of this commit).
ccf2b9a
to
5b8f856
Compare
cscs-ci run |
I would prefer not to add the extra parameter to lacpy. Did you considered to use |
I did not, but that sounds like a good idea! I'll check if we can use that and update to use that instead if that's the case. |
cscs-ci run |
c27f1aa
to
6a908e1
Compare
cscs-ci run |
@rasolca I think using
|
The enum contains different values depending on the rocm version. For sure hipMemoryTypeUnregistered has to be considered as it represent system allocated memory. (For 5.6 I suppose that it works as for CUDA which returns an error). |
https://docs.amd.com/projects/HIP/en/latest/doxygen/html/group___memory.html#ga7c3e8663feebb7be9fd3a1e5139bcefc seems to indicate that maybe even the newer versions return an error for unregistered memory, but it's not 100% clear from the description. I'll try it out and see what it returns. |
@rasolca indeed 5.6.1 returns We could:
Perhaps worth discussing in today's meeting. |
cscs-ci run |
if (status == hipErrorInvalidValue) { | ||
// If HIP returns hipErrorInvalidValue we assume it's due | ||
// to HIP not recognizing a non-HIP allocated pointer as | ||
// host memory, and we assume the type is host. | ||
return hipMemoryTypeHost; | ||
} | ||
else if (status != hipSuccess) { | ||
throw whip::exception(status); | ||
} | ||
|
||
switch (attributes.type) { | ||
#if HIP_VERSION >= 60000000 | ||
case hipMemoryTypeUnregistered: | ||
[[fallthrough]]; | ||
#endif | ||
case hipMemoryTypeHost: | ||
return hipMemoryTypeHost; | ||
case hipMemoryTypeArray: | ||
[[fallthrough]]; | ||
case hipMemoryTypeDevice: | ||
return hipMemoryTypeDevice; | ||
case hipMemoryTypeUnified: | ||
return hipMemoryTypeUnified; | ||
case hipMemoryTypeManaged: | ||
return hipMemoryTypeManaged; | ||
default: | ||
DLAF_UNREACHABLE_PLAIN; | ||
} | ||
}; | ||
|
||
hipMemoryType src_type = get_memory_type(src); | ||
hipMemoryType dst_type = get_memory_type(dst); | ||
|
||
if (src_type == hipMemoryTypeDevice && dst_type == hipMemoryTypeHost) { | ||
return whip::memcpy_device_to_host; | ||
} | ||
else if (src_type == hipMemoryTypeHost && dst_type == hipMemoryTypeDevice) { | ||
return whip::memcpy_host_to_device; | ||
} | ||
else if (src_type == hipMemoryTypeDevice && dst_type == hipMemoryTypeDevice) { | ||
return whip::memcpy_device_to_device; | ||
} | ||
else if (src_type == hipMemoryTypeManaged || src_type == hipMemoryTypeUnified || | ||
dst_type == hipMemoryTypeManaged || dst_type == hipMemoryTypeUnified) { | ||
return whip::memcpy_default; | ||
} | ||
else if (src_type == hipMemoryTypeHost && dst_type == hipMemoryTypeHost) { | ||
DLAF_ASSERT( | ||
false, | ||
"Attempting to do a HIP lacpy with host source and destination, use the CPU lacpy instead"); | ||
} | ||
else { | ||
DLAF_ASSERT(false, | ||
"Attempting to do a HIP lacpy with unsupported source and destination memory type", | ||
src_type, dst_type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've slightly refactored this. It's still a bit of a mess but perhaps passable. I now:
- Map a
hipErrorInvalidValue
fromhipPointerGetAttributes
tohipMemoryTypeHost
. From a small test it seems like amalloc
'd pointer does not go through the same path as ahipHostMalloc
'd pointer, but we still need to cover the case ofhipMemoryTypeHost
, so we can treat both the same. - Map
hipMemoryTypeUnregistered
(if available) tohipMemoryTypeHost
. - Map
hipMemoryTypeArray
tohipMemoryTypeDevice
because the HIP documentation states that it's always on a device. - If any pointer is
hipMemoryTypeManaged
orhipMemoryTypeUnified
the memcpy kind ishipMemcpyDefault
because I don't know what else to do with them. This may be too conservative, but at least I'm not making any assumptions that I'm not sure about. This case was previously also usinghipMemcpyDefault
so it should not be worse than before.
All of the above is only done if HIP is 5.6 or newer. I'm hoping that we can put an upper bound on the workaround maybe with 6.1 or whichever version properly fixes the behaviour.
else if (src_type == hipMemoryTypeHost && dst_type == hipMemoryTypeHost) { | ||
DLAF_ASSERT( | ||
false, | ||
"Attempting to do a HIP lacpy with host source and destination, use the CPU lacpy instead"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this assertion is only triggered with HIP 5.6 and newer, but theoretically it's useful also with other versions, including with CUDA. I don't know if it's worth it to always do this assertion though. I don't think it's a big deal to not check it (as we do right now on master
).
cscs-ci run |
ba8bff3
to
47d6f36
Compare
cscs-ci run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work!
Just a minor comment: wondering if this workaround shouldn't be moved to whip
. Feel free to ignore this, we can always move it if we decide it's a good idea.
It's a very good point, but at least at this point I'm not confident enough in the workaround to put it in whip. I know it works for our use cases and I think it works for others, but I really haven't tested it extensively enough to put it into |
2D memcpys involving host buffers with non-zero offsets are broken on HIP 5.6 onwards. Some cases have been fixed in ROCm/clr@56daa6c (included in 5.7.0) and further fixes in are in ROCm/clr@d3bfb55 (not released as of this PR).
Where it's straightforward to do so I've simply replaced
whip::memcpy_default
with the explicit kind. Inlacpy
we don't know the direction anymore so I've added an additional parameter for the kind that has to be passed by the caller. Finally, I have not changed all instances ofwhip::memcpy_default
(e.g. in 1D memcpys) though I wouldn't oppose that at all as I find being explicit is rarely a bad thing.