Investigate need for JULIA_CUDA_MEMORY_POOL=none #843

Open
vchuravy opened this issue Jun 19, 2024 · 1 comment

Comments

@vchuravy (Member)

One data point: locally with OpenMPI 5, the test ran fine on one GPU.

There was a discussion elsewhere (maybe @maleadt remembers) about whether that flag is still needed, or which MPI versions can now handle the new memory interface.

@luraess (Contributor) commented Jun 20, 2024

  • From Slack HPC channel (21 days ago)

Tim Besard:
Could somebody who understands CUDA + OpenMPI re-evaluate #537? IIUC, the fact that UCX now supports the CUDA stream-ordered allocator (https://github.com/openucx/ucx/blob/04897a079ac88713842f7209c5e82430d095444e/NEWS#L63) means that this workaround shouldn't be suggested anymore.

The reason being that it is pretty costly, performance-wise, and I see it set all the time in HPC users' environments (presumably provided by the system config).

One could (and should) test this, but isn't UCX just one PML among others? There may thus be no guarantee that it will just work on clusters that rely not on UCX but on e.g. libfabric. A minimal reproducer sketch is below.
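For reference, one way to re-evaluate this is to run the same device-buffer collective twice, once with the default stream-ordered pool and once with `JULIA_CUDA_MEMORY_POOL=none`, on a CUDA-aware MPI build. This is only a sketch assuming MPI.jl and CUDA.jl; the script name and the round-robin rank-to-GPU mapping are placeholders, not part of this issue.

```julia
# Sketch of a reproducer, assuming MPI.jl + CUDA.jl on a CUDA-aware MPI build.
# Run twice and compare behaviour/performance:
#   mpiexec -n 2 julia --project mpi_pool_test.jl
#   JULIA_CUDA_MEMORY_POOL=none mpiexec -n 2 julia --project mpi_pool_test.jl
# (mpi_pool_test.jl is just a placeholder name for this script.)
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Map ranks to GPUs round-robin (assumes all ranks on a node see its GPUs).
CUDA.device!(rank % length(CUDA.devices()))

# Device buffer passed directly to MPI; with a UCX that cannot handle
# stream-ordered (cudaMallocAsync) allocations, this is where problems
# historically appeared unless the pool was disabled.
buf = CUDA.ones(Float32, 1024)
MPI.Allreduce!(buf, +, comm)

rank == 0 && println("Allreduce OK, buf[1] = ", Array(buf)[1])

MPI.Finalize()
```

Whether this covers non-UCX transports (e.g. libfabric, as raised above) would still need to be checked separately on such a system.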
