Memory alloction/deallocation is not safe with MVAPICH #1703

ye-luo · 2019-07-05T17:43:07Z

After investigating 'segmentation fault ' on Cooley at ALCF, I found that the Mallocator was causing the problem but I think this is not our fault but MVAPICH.

Mallocator uses aligned_alloc whenever available. If aligned_alloc is not available, posix_memalign is used. On Cooley, aligned_alloc is available from the OS.
In the libmpi.so of MVAPICH, it provides definition of both posix_memalign and free but not aligned_alloc. At linking, the aligned_alloc is picked up from OS and free is picked up from libmpi.so. The free was not able to delete the memory allocated by aligned_alloc safely and caused 'segmentation fault'.

Solution 1.
add if defined(MVAPICH2_VERSION). This doesn't work when mpi.h is not included.

Solution 2.
Users add -D QMC_EXTRA_LIBS=-lc when configuring cmake, -libc is added explicitly before libraries supplied by the MPI compiler wrapper.

Solution 3.
Just don't use aligned_alloc and stick to posix_memalign. posix_memalign is linux only, aligned_alloc is a language standard.

I think it is the fault of MVAPICH overriding system(libc/glibc) malloc/posix_memalign/free but aligned_alloc was missed.
I verified that MPICH(Intel/Cray) and OpenMPI doesn't override these routines.

I tend to use solution 2 and push MVAPICH to fix the problem. Any thoughts?

The text was updated successfully, but these errors were encountered:

PDoakORNL · 2019-07-05T17:55:57Z

I think that is best, they should not break libc.

ye-luo · 2019-07-05T18:04:06Z

Solution 2. We don't add -D QMC_EXTRA_LIBS=-lc. Users using MVAPICH needs to apply this workaround when they type cmake.

prckent · 2019-07-07T17:06:56Z

I vote for none of the above, although we inform the mvapich2 project regardless.

Solution: Detect mvapchi2 during our cmake configuration and put in the link time workaround.

harisubramoni · 2019-07-09T02:18:51Z

Dear, Ye.

Thank you for bringing this to our attention. We appreciate it. So far, we have not received any issues about users wanting to use “aligned_alloc”. We will see how to handle it in our code and get back to you.

Some history about this feature is given below.

As you may know, (barring a few exceptions) any buffer that an InifniBand HCA can act upon must be registered with it ahead of time. Since registration for InfiniBand is very expensive we attempt to cache these registrations so if the same buffer is re-used again for communication it will already be registered (speeding up the application). The reason why MVAPICH2 (and several other MPI libraries like OpenMPI – please refer to https://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned; https://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols/euro-pvmmpi-2006-hpc-protocols.pdf) intercept malloc and free routines is to allow correctness while caching these InfiniBand memory registrations (since the MPI library needs to know if the memory is being freed etc).

Whether disabling registration cache will have a negative effect on application performance depends entirely on the communication pattern of the application. If the application uses mostly small to medium sized messages (approximately less than 16 KB), then disabling registration cache will mostly have no impact on the performance of the application.

The following section of the userguide has more information about the impact of disabling memory registration cache on application performance.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.1-userguide.html#x1-1340009.1.3

This feature can be disabled at runtime by setting “MV2_USE_LAZY_MEM_UNREGISTER=0”. The following section of the userguide has more information about this parameter.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.1-userguide.html#x1-26100011.81

This feature can be disabled at configuration time, the “--disable-registration-cache” parameter can be used. The following section of the userguide has more information about this parameter.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.1-userguide.html#x1-140004.4

Best,
Hari.

ye-luo · 2019-07-09T13:28:51Z

@harisubramoni I tried MV2_USE_LAZY_MEM_UNREGISTER=0 and it doesn't workaround the current segmentation fault. As OpenMPI documentation says it doesn't turn on memory registration by default and that is why I didn't hit the problem.

harisubramoni · 2019-07-09T14:21:34Z

@ye-luo - can you please try the configuration option I mentioned?

ye-luo · 2019-07-09T14:39:28Z

@harisubramoni I will try when I have free cycles. Here is a minimal reproducer.

#include <stdlib.h>
int main()
{
  float *a = aligned_alloc(64,512);
  float *b = aligned_alloc(64,512000);
  free(a);
  free(b);
}

This is a standards compliant C code.

ye-luo · 2019-07-09T17:09:48Z

@harisubramoni I can confirm that after building MVAPICH with --disable-registration-cache, malloc posix_memalign and free are no more intercepted by MVAPICH and thus the test code passes. However, this solution is not useful in practice because we can't ask users to build their own MPI library and administrator may not want to disable this feature for performance reason. Do you know any macro defined only by the MVAPICH compiler wrapper even without including any MPI header files?

ye-luo · 2020-02-19T09:13:12Z

Our cmake now catches this case.

harisubramoni · 2020-05-08T20:21:08Z

Hi, Ye.

Can you please see if the following patch to MVAPICH2 and see if it fixes your issue? It seems to work for the minimal reproducer you had given. If you can confirm this, we will ensure that this is available with the next release of MVAPICH2.

diff --git a/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c b/src/mpid/ch3/channels/common/src/memory/ptm
index b6627b6..f7fdd73 100644
--- a/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c
+++ b/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c
@@ -1302,6 +1302,14 @@ int      public_sET_STATe();
   POSIX wrapper like memalign(), checking for validity of size.
 */
 int      __posix_memalign(void **, size_t, size_t);
+/*
+  void * aligned_alloc (size_t alignment, size_t size)
+
+  The aligned_alloc function allocates a block of size bytes
+  whose address is a multiple of alignment. The alignment must
+  be a power of two and size must be a multiple of alignment.
+*/
+void *  __aligned_alloc(size_t, size_t);
 #endif

 /* mallopt tuning options */
@@ -5572,10 +5580,50 @@ posix_memalign (void **memptr, size_t alignment, size_t size)
   return ENOMEM;
 }

+/* CHANNEL_MRAIL: We need to expose our own aligned_alloc or the wrong one
+ * will be used.
+#ifdef _LIBC
+*/
+
+void *
+/* <CHANNEL_MRAIL> */
+aligned_alloc (size_t alignment, size_t size)
+/* __aligned_alloc (size_t alignment, size_t size)
+ * </CHANNEL_MRAIL>
+ */
+{
+  void *mem;
+  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
+                                       __const __malloc_ptr_t)) =
+    __memalign_hook;
+
+  /* Test whether the size and alignment arguments are valid.
+     The alignment must be a power of two and
+     size must be a multiple of alignment. */
+  if (size % alignment != 0
+      || !powerof2 (alignment) != 0
+      || alignment == 0)
+    return EINVAL;
+
+  /* Call the hook here, so that caller is posix_memalign's caller
+     and not posix_memalign itself.  */
+  if (hook != NULL)
+    mem = (*hook)(alignment, size, RETURN_ADDRESS (0));
+  else
+    mem = public_mEMALIGn (alignment, size);
+
+  if (mem != NULL) {
+    return mem;
+  }
+
+  return ENOMEM;
+}
+
 /* <CHANNEL_MRAIL> */
 #if defined(_LIBC)
 /* </CHANNEL_MRAIL> */
 weak_alias (__posix_memalign, posix_memalign)
+weak_alias (__aligned_alloc, aligned_alloc)

 strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
 strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree)

Thx,
Hari.

prckent added the bug label Jul 9, 2019

ye-luo mentioned this issue Feb 7, 2020

Adopt more FindMPI and requires CMake 3.10 #2289

Merged

13 tasks

ye-luo added the wontfix label Feb 19, 2020

ye-luo closed this as completed Feb 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory alloction/deallocation is not safe with MVAPICH #1703

Memory alloction/deallocation is not safe with MVAPICH #1703

ye-luo commented Jul 5, 2019 •

edited

Loading

PDoakORNL commented Jul 5, 2019

ye-luo commented Jul 5, 2019 •

edited

Loading

prckent commented Jul 7, 2019 •

edited

Loading

harisubramoni commented Jul 9, 2019

ye-luo commented Jul 9, 2019

harisubramoni commented Jul 9, 2019

ye-luo commented Jul 9, 2019 •

edited

Loading

ye-luo commented Jul 9, 2019

ye-luo commented Feb 19, 2020

harisubramoni commented May 8, 2020

Memory alloction/deallocation is not safe with MVAPICH #1703

Memory alloction/deallocation is not safe with MVAPICH #1703

Comments

ye-luo commented Jul 5, 2019 • edited Loading

PDoakORNL commented Jul 5, 2019

ye-luo commented Jul 5, 2019 • edited Loading

prckent commented Jul 7, 2019 • edited Loading

harisubramoni commented Jul 9, 2019

ye-luo commented Jul 9, 2019

harisubramoni commented Jul 9, 2019

ye-luo commented Jul 9, 2019 • edited Loading

ye-luo commented Jul 9, 2019

ye-luo commented Feb 19, 2020

harisubramoni commented May 8, 2020

ye-luo commented Jul 5, 2019 •

edited

Loading

ye-luo commented Jul 5, 2019 •

edited

Loading

prckent commented Jul 7, 2019 •

edited

Loading

ye-luo commented Jul 9, 2019 •

edited

Loading