From 2a23c6803e5a535aee1e618d58dafb03abcd1156 Mon Sep 17 00:00:00 2001
From: Geunsik Lim
Date: Thu, 7 Jul 2016 18:18:38 +0900
Subject: [PATCH] Fix bus errors with memcpy to handle unaligned access on O2/O3

This patch resolves the bus errors seen with clang at -O3 (issue #5844).
When we enable the -O2/-O3 optimization levels of clang (from clang 3.5
to the latest version, released on Jun-13-2016), we get more than 3000
bus errors from the CoreCLR unit tests.

We can easily monitor SIGBUS signals (e.g., "misaligned memory access")
with the kernel's /proc/cpu/alignment facility. Using
"echo 2 > /proc/cpu/alignment" makes the Linux kernel fix up the faulting
accesses, but the performance of the application is degraded.

 * source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment

According to the ARM Information Center (infocenter.arm.com):

  By default, the ARM compiler expects normal C and C++ pointers to point
  to an aligned word in memory. A type qualifier __packed is provided to
  enable unaligned pointer access. If you want to define a pointer to a
  word that can be at any address (that is, that can be at a non-natural
  alignment), you must specify this using the __packed qualifier when
  defining the pointer:

    __packed int *pi; // pointer to unaligned int

However, clang/LLVM does not support the __packed qualifier; only
attributes such as __attribute__((packed)) or
__attribute__((packed, aligned(4))) are available.

In -O0 (debugging), the innermost block is emitted as the following
assembly, which works properly:

    ldr r1, [r0, #24]
    ldr r2, [r0, #20]

In -O2 (release), however, the compiler realizes that these fields are
adjacent and generates this assembly:

    ldrdeq r2, r3, [r0, #20]

Unfortunately, the ldrd instruction always generates an alignment fault
on an unaligned address (in practice). It seems that clang uses ldrb
instructions, whereas GCC uses ldr because ARMv7 supports unaligned ldr.
Basically, the RISC-based ARM architecture requires aligned access for
4-byte reads.
So, let's use memcpy(3) to copy the value into a properly aligned buffer
instead of relying on the packing attribute.

Note: If the architecture (e.g., Linux/ARM Emulator) does not support
unaligned ldr, this issue does not occur at the -O2/-O3 optimization
levels.

 * Case study: How does the ARM Compiler support unaligned accesses?
   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
 * Case study: Indicating unaligned access to Clang for ARM compatibility
   http://stackoverflow.com/questions/9185811/indicating-unaligned-access-to-clang-for-arm-compatibility
 * Case study: Chromium source for UnalignedLoad32() on ARM
   https://github.com/nwjs/chromium.src/blob/nw15/third_party/cld/base/basictypes.h#L302

Signed-off-by: Geunsik Lim
---
 src/jit/compiler.hpp | 47 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/src/jit/compiler.hpp b/src/jit/compiler.hpp
index 6a45f9cb0ed8..4637bdd51aae 100644
--- a/src/jit/compiler.hpp
+++ b/src/jit/compiler.hpp
@@ -736,34 +736,59 @@ inline unsigned genGetU4(const BYTE *addr)
 /*****************************************************************************/
 // Helpers to pull little-endian values out of a byte stream.
-
+// Get Unaligned values from a potentially unaligned object
 inline
 unsigned __int8 getU1LittleEndian(const BYTE * ptr)
-{ return *(UNALIGNED unsigned __int8 *)ptr; }
+{
+    unsigned __int8 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 
 inline
 unsigned __int16 getU2LittleEndian(const BYTE * ptr)
-{ return *(UNALIGNED unsigned __int16 *)ptr; }
+{
+    unsigned __int16 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 
 inline
 unsigned __int32 getU4LittleEndian(const BYTE * ptr)
-{ return *(UNALIGNED unsigned __int32*)ptr; }
-
+{
+    unsigned __int32 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 inline
 signed __int8 getI1LittleEndian(const BYTE * ptr)
-{ return * (UNALIGNED signed __int8 *)ptr; }
-
+{
+    signed __int8 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 inline
 signed __int16 getI2LittleEndian(const BYTE * ptr)
-{ return * (UNALIGNED signed __int16 *)ptr; }
-
+{
+    signed __int16 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 inline
 signed __int32 getI4LittleEndian(const BYTE * ptr)
-{ return *(UNALIGNED signed __int32*)ptr; }
+{
+    signed __int32 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 
 inline
 signed __int64 getI8LittleEndian(const BYTE * ptr)
-{ return *(UNALIGNED signed __int64*)ptr; }
+{
+    signed __int64 temp;
+    memcpy(&temp, ptr, sizeof(temp));
+    return temp;
+}
 
 inline
 float getR4LittleEndian(const BYTE * ptr)