Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-fsanitize=alignment: check memcpy/memmove arguments #67766

Closed
wants to merge 1 commit into from

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Sep 29, 2023

The -fsanitize=alignment implementation uses the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
RFC: Enforcing pointer type alignment in Clang

memcpy is a memory access and given int * we require it to be aligned. Similar
to https://reviews.llvm.org/D9673 , emit -fsanitize=alignment check for
arguments of builtin memcpy and memmove functions to catch misaligned load like:

// Check a
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
TCK_ReferenceBinding check instead.

// The check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Sep 29, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 29, 2023

@llvm/pr-subscribers-compiler-rt-sanitizer
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Changes

Similar to https://reviews.llvm.org/D9673, emit -fsanitize=alignment check for
arguments of builtin memcpy and memmove functions to catch misaligned load like:

// Check a
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
TCK_ReferenceBinding check instead.

// The check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.


Full diff: https://github.com/llvm/llvm-project/pull/67766.diff

3 Files Affected:

  • (modified) clang/include/clang/Basic/Sanitizers.h (+2)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+19-12)
  • (modified) clang/test/CodeGen/catch-undef-behavior.c (+66-2)
diff --git a/clang/include/clang/Basic/Sanitizers.h b/clang/include/clang/Basic/Sanitizers.h
index 4659e45c7883419..16d241cbf809bbe 100644
--- a/clang/include/clang/Basic/Sanitizers.h
+++ b/clang/include/clang/Basic/Sanitizers.h
@@ -170,6 +170,8 @@ struct SanitizerSet {
     Mask = Value ? (Mask | K) : (Mask & ~K);
   }
 
+  void set(SanitizerMask K) { Mask = K; }
+
   /// Disable the sanitizers specified in \p K.
   void clear(SanitizerMask K = SanitizerKind::All) { Mask &= ~K; }
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index b0fd38408806566..2102d0aaff9d620 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -2730,6 +2730,19 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     }
   }
 
+  auto EmitArgCheck = [&](TypeCheckKind Kind, Address A, const Expr *Arg,
+                          unsigned ParmNum) {
+    Value *Val = A.getPointer();
+    EmitNonNullArgCheck(RValue::get(Val), Arg->getType(), Arg->getExprLoc(), FD,
+                        ParmNum);
+
+    SanitizerSet SkippedChecks;
+    SkippedChecks.set(SanitizerKind::All);
+    SkippedChecks.clear(SanitizerKind::Alignment);
+    EmitTypeCheck(Kind, Arg->getExprLoc(), Val, Arg->getType(),
+                  A.getAlignment(), SkippedChecks);
+  };
+
   switch (BuiltinIDIfNoAsmLabel) {
   default: break;
   case Builtin::BI__builtin___CFStringMakeConstantString:
@@ -3720,10 +3733,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
-    EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),
-                        E->getArg(0)->getExprLoc(), FD, 0);
-    EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),
-                        E->getArg(1)->getExprLoc(), FD, 1);
+    EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
+    EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
     Builder.CreateMemCpy(Dest, Src, SizeVal, false);
     if (BuiltinID == Builtin::BImempcpy ||
         BuiltinID == Builtin::BI__builtin_mempcpy)
@@ -3738,10 +3749,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     uint64_t Size =
         E->getArg(2)->EvaluateKnownConstInt(getContext()).getZExtValue();
-    EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),
-                        E->getArg(0)->getExprLoc(), FD, 0);
-    EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),
-                        E->getArg(1)->getExprLoc(), FD, 1);
+    EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
+    EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
     Builder.CreateMemCpyInline(Dest, Src, Size);
     return RValue::get(nullptr);
   }
@@ -3798,10 +3807,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
-    EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),
-                        E->getArg(0)->getExprLoc(), FD, 0);
-    EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),
-                        E->getArg(1)->getExprLoc(), FD, 1);
+    EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
+    EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
     Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(Dest.getPointer());
   }
diff --git a/clang/test/CodeGen/catch-undef-behavior.c b/clang/test/CodeGen/catch-undef-behavior.c
index ca0df0f002f8912..3956a475f319ea9 100644
--- a/clang/test/CodeGen/catch-undef-behavior.c
+++ b/clang/test/CodeGen/catch-undef-behavior.c
@@ -354,21 +354,63 @@ void call_decl_nonnull(int *a) {
   decl_nonnull(a);
 }
 
-extern void *memcpy (void *, const void *, unsigned) __attribute__((nonnull(1, 2)));
+extern void *memcpy(void *, const void *, unsigned long) __attribute__((nonnull(1, 2)));
 
 // CHECK-COMMON-LABEL: @call_memcpy_nonnull
 void call_memcpy_nonnull(void *p, void *q, int sz) {
   // CHECK-COMMON: icmp ne ptr {{.*}}, null
   // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg
   // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON-NOT: call
 
   // CHECK-COMMON: icmp ne ptr {{.*}}, null
   // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg
   // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON-NOT: call
+
+  // CHECK-COMMON: call void @llvm.memcpy.p0.p0.i64(ptr align 1 %0, ptr align 1 %1, i64 %conv, i1 false)
   memcpy(p, q, sz);
 }
 
-extern void *memmove (void *, const void *, unsigned) __attribute__((nonnull(1, 2)));
+// CHECK-COMMON-LABEL: define{{.*}} void @call_memcpy(
+void call_memcpy(long *p, short *q, int sz) {
+  // CHECK-COMMON: icmp ne ptr {{.*}}, null
+  // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON: and i64 %[[#]], 7, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: icmp ne ptr {{.*}}, null
+  // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON: and i64 %[[#]], 1, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 2 %1, i64 %conv, i1 false)
+  memcpy(p, q, sz);
+}
+
+// CHECK-COMMON-LABEL: define{{.*}} void @call_memcpy_inline(
+void call_memcpy_inline(long *p, short *q) {
+  // CHECK-COMMON: and i64 %[[#]], 7, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: and i64 %[[#]], 1, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: call void @llvm.memcpy.inline.p0.p0.i64(ptr align 8 %0, ptr align 2 %1, i64 2, i1 false)
+  __builtin_memcpy_inline(p, q, 2);
+}
+
+extern void *memmove(void *, const void *, unsigned long) __attribute__((nonnull(1, 2)));
 
 // CHECK-COMMON-LABEL: @call_memmove_nonnull
 void call_memmove_nonnull(void *p, void *q, int sz) {
@@ -382,6 +424,28 @@ void call_memmove_nonnull(void *p, void *q, int sz) {
   memmove(p, q, sz);
 }
 
+// CHECK-COMMON-LABEL: define{{.*}} void @call_memmove(
+void call_memmove(long *p, short *q, int sz) {
+  // CHECK-COMMON: icmp ne ptr {{.*}}, null
+  // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON: and i64 %[[#]], 7, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: icmp ne ptr {{.*}}, null
+  // CHECK-UBSAN: call void @__ubsan_handle_nonnull_arg(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 16)
+  // CHECK-COMMON: and i64 %[[#]], 1, !nosanitize
+  // CHECK-COMMON: icmp eq i64 %[[#]], 0, !nosanitize
+  // CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1(
+  // CHECK-TRAP: call void @llvm.ubsantrap(i8 22)
+
+  // CHECK-COMMON: call void @llvm.memmove.p0.p0.i64(ptr align 8 %0, ptr align 2 %1, i64 %conv, i1 false)
+  memmove(p, q, sz);
+}
+
 // CHECK-COMMON-LABEL: @call_nonnull_variadic
 __attribute__((nonnull)) void nonnull_variadic(int a, ...);
 void call_nonnull_variadic(int a, int *b) {

@@ -47,6 +50,16 @@ int main(int, char **argv) {
return *p && 0;
// CHECK-STACK-LOAD: #0 {{.*}}main{{.*}}misaligned.cpp

case 'L': {
int x;
// CHECK-MEMCPY-LOAD: misaligned.cpp:[[#@LINE+4]]{{(:16)?}}: runtime error: load of misaligned address [[PTR:0x[0-9a-f]*]] for type 'const void *', which requires 4 byte alignment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'const void *' is wrong here :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not wrong, but unhelpful.
Issue is that 'int*' was misaligned, not that we pass it into void*.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the issue was mentioned in the description

The error message refers to void * instead of int *, which is not perfect
but should be acceptable.

runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'void *'

Anyhow, BI__builtin_HEXAGON_V6_vmaskedstoreq strips an implicit cast. I added a similar mechanism in 9b705127b17513ee17e76fb903e85500c832d8aa

@MaskRay MaskRay force-pushed the ubsan/alignment-memcpy branch 2 times, most recently from bba6a45 to 9b70512 Compare September 29, 2023 20:10
@nikic
Copy link
Contributor

nikic commented Sep 29, 2023

Uh, why are we allowed to assume that memcpy pointer arguments are aligned? This looks like a miscompile to me.

A plain int * pointer is not required to be aligned, and memcpy works on void * pointers, so I'm not sure where an alignment requirement would appear from.

@vitalybuka
Copy link
Collaborator

LGTM, but you probably want @rjmccall or @zygoloid review.

@zygoloid
Copy link
Collaborator

Uh, why are we allowed to assume that memcpy pointer arguments are aligned? This looks like a miscompile to me.

This is definitely a bit weird, but...

A plain int * pointer is not required to be aligned, and memcpy works on void * pointers, so I'm not sure where an alignment requirement would appear from.

... a plain int * pointer is sort of required to be aligned. Per https://eel.is/c++draft/expr.static.cast#14.sentence-2 (and similar wording exists in C), you get an unspecified pointer value if you cast an unaligned pointer to int *, which is allowed to be an invalid pointer value, which we are then permitted to give whatever semantics we like, such as disallowing it being passed to memcpy.

Clang generally chooses to define the semantics of misaligned pointer values as doing the "obvious" thing (@rjmccall circulated an RFC on this a few years ago). In short: we allow them to be formed, and we allow pointer arithmetic on them, but don't allow them to be used to actually access misaligned memory. But memcpy is a memory access, so if you directly pass an int * to it, I think we're following our rules if we require it to be aligned. And it seems like a valuable thing to assume: in the vast majority of cases, it is reasonable to assume T's alignment when a T* is passed to memcpy.

I think the choice we're making here is probably worth it, though we should probably document it better. I think you can remove the alignment assumption by explicitly casting the operands to char* before passing them to memcpy; if you can't, I'd be more worried that we're doing something problematic here. Also, it'd seem like a good idea to make the sanitizer message as clear as possible for this case, because Clang's behavior here is surprising.

@MaskRay
Copy link
Member Author

MaskRay commented Oct 1, 2023

Thanks for the comment.

I think the choice we're making here is probably worth it, though we should probably document it better. I think you can remove the alignment assumption by explicitly casting the operands to char* before passing them to memcpy; if you can't, I'd be more worried that we're doing something problematic here.

Yes. The correct implementation correctly drops the alignment check on the dst parameter for memcpy((char *)a, b, sz);

Also, it'd seem like a good idea to make the sanitizer message as clear as possible for this case, because Clang's behavior here is surprising.

@zygoloid
Is reusing the message for regular stores clear (current behavior) enough?

// CHECK-MEMCPY-STORE: misaligned.cpp:[[#@LINE+4]]{{(:12)?}}: runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *', which requires 4 byte alignment

@zygoloid
Copy link
Collaborator

zygoloid commented Oct 1, 2023

@zygoloid Is reusing the message for regular stores clear (current behavior) enough?

// CHECK-MEMCPY-STORE: misaligned.cpp:[[#@LINE+4]]{{(:12)?}}: runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *', which requires 4 byte alignment

I predict we'll get a bug report saying "I didn't do a store, I used memcpy, which is specified as doing a byte-by-byte copy". But given that improving this would require adding a new entry point into the runtime library, and we don't know how common or rare this situation will be, I think it's probably OK to wait until someone complains.

@MaskRay
Copy link
Member Author

MaskRay commented Oct 2, 2023

@zygoloid Is reusing the message for regular stores clear (current behavior) enough?

// CHECK-MEMCPY-STORE: misaligned.cpp:[[#@LINE+4]]{{(:12)?}}: runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *', which requires 4 byte alignment

I predict we'll get a bug report saying "I didn't do a store, I used memcpy, which is specified as doing a byte-by-byte copy". But given that improving this would require adding a new entry point into the runtime library, and we don't know how common or rare this situation will be, I think it's probably OK to wait until someone complains.

Thanks. Let's wait until someone complains :)

@nikic
Copy link
Contributor

nikic commented Oct 2, 2023

@zygoloid Thanks for the explanation! I wasn't aware this fell under unspecified behavior. It's weird that alignment information can survive a void * cast, but it does make some sense.

What seems worrying here is that apparently GCC and Clang do not agree on semantics in this case (https://godbolt.org/z/1r9df5a4a). GCC does not assume that the pointers are aligned. The perils of unspecified behavior....

@MaskRay
Copy link
Member Author

MaskRay commented Oct 8, 2023

@zygoloid Thanks for the explanation! I wasn't aware this fell under unspecified behavior. It's weird that alignment information can survive a void * cast, but it does make some sense.

Yes, (void *)x decreases the alignment requirement to 1 byte like (char *)x.

What seems worrying here is that apparently GCC and Clang do not agree on semantics in this case (godbolt.org/z/1r9df5a4a). GCC does not assume that the pointers are aligned. The perils of unspecified behavior....

Yes...

I am out of town and will merge this PR later.

@MaskRay MaskRay force-pushed the ubsan/alignment-memcpy branch from 9b70512 to c5d2254 Compare October 10, 2023 04:04
The -fsanitize=alignment implementation follows the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html)
for detail.

memcpy is a memory access and we require an `int *` argument to be aligned.
Similar to https://reviews.llvm.org/D9673 , emit -fsanitize=alignment check for
arguments of builtin memcpy and memmove functions to catch misaligned load like:
```
// Check the alignment of a but ignore the alignment of b
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }
```

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
`TCK_ReferenceBinding` check instead.
```
// The alignment check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
```

The diagnostic message looks like
```
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'
```

We could use a better message for memcpy, but we don't do it for now as it would
require a new check name like misaligned-pointer-use, which is probably not
necessary. *RFC: Enforcing pointer type alignment in Clang* is not well documented,
but this patch does not intend to change the that.

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.
@MaskRay MaskRay force-pushed the ubsan/alignment-memcpy branch from c5d2254 to ca81007 Compare October 10, 2023 05:57
@MaskRay
Copy link
Member Author

MaskRay commented Oct 10, 2023

Add the test to clang/test/CodeGen/catch-undef-behavior.c ca81007

+  /// Casting to void * or char * drops the alignment requirement.
+  memcpy((void *)p, (char *)q, sz);

MaskRay added a commit that referenced this pull request Oct 10, 2023
The -fsanitize=alignment implementation follows the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html)
for detail.

memcpy is a memory access and we require an `int *` argument to be aligned.
Similar to https://reviews.llvm.org/D9673 , emit -fsanitize=alignment check for
arguments of builtin memcpy and memmove functions to catch misaligned load like:
```
// Check the alignment of a but ignore the alignment of b
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }
```

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
`TCK_ReferenceBinding` check instead.
```
// The alignment check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
```

The diagnostic message looks like
```
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'
```

We could use a better message for memcpy, but we don't do it for now as it would
require a new check name like misaligned-pointer-use, which is probably not
necessary. *RFC: Enforcing pointer type alignment in Clang* is not well documented,
but this patch does not intend to change the that.

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.
@MaskRay MaskRay closed this Oct 10, 2023
@MaskRay MaskRay deleted the ubsan/alignment-memcpy branch October 10, 2023 06:02
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Oct 16, 2023
…n to disable memcpy instrumentation

Deploying llvm#67766 to a large internal codebase uncovers many bugs (many are
probably benign but need cleaning up). There are also issues in high-profile
open-source projects like v8. Add a cl::opt to disable builtin instrumentation
for -fsanitize=alignment to help large codebase users.

In the long term, this cl::opt option may still be useful to debug
-fsanitize=alignment instrumentation on builtins, so we probably want to keep it
around.
MaskRay added a commit that referenced this pull request Oct 18, 2023
…n to disable memcpy instrumentation (#69240)

Deploying #67766 to a large internal codebase uncovers many bugs (many
are
probably benign but need cleaning up). There are also issues in
high-profile
open-source projects like v8. Add a cl::opt to disable builtin
instrumentation
for -fsanitize=alignment to help large codebase users.

In the long term, this cl::opt option may still be useful to debug
-fsanitize=alignment instrumentation on builtins, so we probably want to
keep it around.
copybara-service bot pushed a commit to google/highwayhash that referenced this pull request Nov 2, 2023
See
llvm/llvm-project#67766 (comment)
and https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html .
An unaligned pointer has an unspecified value per C++ expr.static.cast,
in which the compiler is permitted to disallow it being passed to
memcpy. Clang -fsanitize=alignment recently becomes stricter (memcpy is
now instrumented) in part to complement AddressSanitizer detection.

There are quite a few LoadUnaligned overloads in the code base that
tickles this unspecified behavior, but they are benign for Clang.

PiperOrigin-RevId: 578755824
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 2, 2023
See llvm/llvm-project#67766 (comment) and
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html . An
unaligned pointer has an unspecified value per C++ expr.static.cast, in which
the compiler is permitted to disallow it being passed to memcpy. Clang
-fsanitize=alignment recently becomes stricter (memcpy is now instrumented) in
part to complement AddressSanitizer detection.
PiperOrigin-RevId: 579015567
hawkinsp added a commit to hawkinsp/numpy that referenced this pull request Nov 13, 2023
…es.c.src

OBJECT_nonzero may be called with misaligned pointers, manifesting as a
-fsanitize=alignment failure. This is UB per C11 6.3.2.3

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned) for the referenced type, the behavior is undefined.

Nevertheless, Clang only checks alignment when the unaligned pointer is accessed.
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html
Call memcpy with unaligned arguments instead to work with new Clang
(llvm/llvm-project#67766).
charris pushed a commit to charris/numpy that referenced this pull request Nov 20, 2023
…es.c.src

OBJECT_nonzero may be called with misaligned pointers, manifesting as a
-fsanitize=alignment failure. This is UB per C11 6.3.2.3

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned) for the referenced type, the behavior is undefined.

Nevertheless, Clang only checks alignment when the unaligned pointer is accessed.
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html
Call memcpy with unaligned arguments instead to work with new Clang
(llvm/llvm-project#67766).
@pitrou
Copy link
Member

pitrou commented Oct 11, 2024

Judging by the list of notifications at the bottom of this PR, this does seem to have hit a number of projects (including our own).

pitrou added a commit to pitrou/arrow that referenced this pull request Oct 18, 2024
…ng-18

CLang 18 complains about undefined behavior when memcpy is called on a T-unaligned `T*` pointer.
Workaround by casting to `void*`.

This is due to this upstream change: llvm/llvm-project#67766
pitrou added a commit to apache/arrow that referenced this pull request Oct 18, 2024
…44468)

### Rationale for this change

CLang 18 complains about undefined behavior when memcpy is called on a T-unaligned `T*` pointer.
This is due to this upstream change: llvm/llvm-project#67766

### What changes are included in this PR?

Workaround by casting to `void*`.

### Are these changes tested?

By existing CI tests.

### Are there any user-facing changes?

No.

* GitHub Issue: #44372

Lead-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants