Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-fsanitize=alignment false positive with intended unaligned struct member access #83710

Closed
akihikodaki opened this issue Mar 3, 2024 · 11 comments
Labels
compiler-rt:ubsan Undefined behavior sanitizer false-positive Warning fires when it should not question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@akihikodaki
Copy link

-fsanitize=alignment generates a false positive error for an intended unaligned struct member access. The intention of unaligned struct member access is expressed with __builtin_memcpy() as done by QEMU or packed struct access as done by Linux. GCC translates such a construct to code to access memory unaligned for architectures like rv64gc as intended but also emits code to enforce the alignment.

The relevant code of QEMU is at: https://gitlab.com/qemu-project/qemu/-/blob/v8.2.1/include/qemu/bswap.h?ref_type=tags
The relevant code of Linux is at: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/unaligned.h?h=v6.7

FYI, this issue is reproducible also with 13.2.0, and the relevant ticket is available at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217

To reproduce the issue, compile the code shown below with -O2 -fsanitize=alignment for rv64gc:

#include <stdint.h>

typedef uint64_t u64;

/* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler_attributes.h?h=v6.7 */

/*
 *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-packed-type-attribute
 * clang: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-packed-variable-attribute
 */
#define __packed                        __attribute__((__packed__))

/* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/unaligned.h?h=v6.7 */

#define __get_unaligned_t(type, ptr) ({						\
	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);	\
	__pptr->x;								\
})

#define __put_unaligned_t(type, val, ptr) do {					\
	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
	__pptr->x = (val);							\
} while (0)

#define get_unaligned(ptr)	__get_unaligned_t(typeof(*(ptr)), (ptr))
#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))

/* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/inode.c?h=v6.7 */

struct dir_entry {
	u64 ino;
	u64 offset;
	unsigned type;
	int name_len;
};

/*
 * This function is intended to perform an unaligned access.
 * GCC emits code for an unaligned operation as intended,
 * but also emits code to assert alignment.
 */
u64 f(struct dir_entry *entry)
{
    return get_unaligned(&entry->offset);
}

/*
 * This function is intended to perform an aligned access.
 * GCC emits code for an aligned operation,
 * and emits code to assert alignment.
 */
u64 g(struct dir_entry *entry)
{
    return entry->offset;
}
@Sirraide Sirraide added compiler-rt:ubsan Undefined behavior sanitizer and removed new issue labels Mar 3, 2024
@EugeneZelenko EugeneZelenko added the false-positive Warning fires when it should not label Mar 3, 2024
@jakubjelinek
Copy link

IMNSHO not a false positive. In C/C++, UB would be already in the caller of f if entry isn't properly aligned, but neither LLVM nor GCC diagnose that in UBSAN. What is diagnosed is &entry->offset which implies that entry must be properly aligned. Use char * pointer arithmetics with offsetof instead to avoid UBSAN diagnosing it, and even better pass a void * or char * pointer instruct of struct dir_entry and use offsetof and __get_unaligned_t perhaps inside of some macro.

@akihikodaki
Copy link
Author

Thanks for looking into this.

It is somewhat misleading to call this a "false positive" as it is certainly UB in the specification. However, #67766 explicitly allows misaligned memory access when no struct member is involved. I see no reason to change the behavior only when struct member is involved. For example, consider the following functions:

u64 f3(u64 *offset)
{
    u64 buffer;
    __builtin_memcpy(&buffer, offset, sizeof(buffer));
    return buffer;
}

u64 f4(u64 *offset)
{
    u64 buffer;
    __builtin_memcpy(&buffer, (void *)offset, sizeof(buffer));
    return buffer;
}

u64 f5(struct dir_entry *entry)
{
    u64 buffer;
    __builtin_memcpy(&buffer, (void *)&entry->offset, sizeof(buffer));
    return buffer;
}

There are already UBs if the callers of these functions create arguments of these functions by casting misaligned addresses to the respective pointers. In that sense, UBSan is allowed to generate errors for all of them, and an optimization pass can freely translate the __builtin_memcpy calls into aligned accesses. In fact, __builtin_memcpy in f3 gets translated to one ld instruction if it's compiled for rv64gc with -O2.

Nevertheless, #67766 still tolerates f4 as it explicitly casts offset to void *, and the relevant __builtin_memcpy gets translated into unaligned memory access. The __builtin_memcpy call in f5 is also translated into unaligned memory access, but somehow UBSan still complains about misaligned memory access. This looks inconsistent.

I described my opinion regarding offsetof in GCC Bugzilla so I cite it here for record:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217#c8

If you want something that will be valid even in C, don't pass struct
dir_entry *entry
argument, but void *entry instead, and use e.g.
__get_unaligned_t(__typeof(((struct dir_entry *)0)->offset), ((char
*)entry)+offsetof(struct dir_entry, offset)))
You can surely hide that all under some macro.

The definition still involves UB for ((struct dir_entry *)0)->offset. Perhaps __typeof() may be considered as an exception, > but what if offsetof() is defined as follows?

#define offsetof(T, x) ((uintptr_t)&(((T *)0)->x))

GCC does provide __builtin_offsetof(), but I think definitions of offsetof() like this are still prevalent, and expected to work although it's UB. If GCC tolerates this kind of trick, why not tolerate get_unaligned(&entry->offset)?

@MaskRay
Copy link
Member

MaskRay commented Mar 4, 2024

Per discussions on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217, Clang's behavior is expected.

So there are two constructs invoking UBs but ignored by UBSan: 1) construction of misaligned pointers that do not involve a struct member, and 2) pointer arithmetic with a member of an invalid struct pointer as done by old offsetof().

https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html has some background of Clang's behavior, which appears to match GCC's -fsanitize=alignment strategy as well. There are certainly rough edges (e.g. memcpy instrumentation before my #69240), but this issue isn't such a case.

@MaskRay MaskRay closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Mar 4, 2024
@akihikodaki
Copy link
Author

Per discussions on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217, Clang's behavior is expected.

So there are two constructs invoking UBs but ignored by UBSan: 1) construction of misaligned pointers that do not involve a struct member, and 2) pointer arithmetic with a member of an invalid struct pointer as done by old offsetof().

https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html has some background of Clang's behavior, which appears to match GCC's -fsanitize=alignment strategy as well. There are certainly rough edges (e.g. memcpy instrumentation before my #69240), but this issue isn't such a case.

@MaskRay

Thanks for sharing the RFC; it is very informative.

The RFC says:

Note that this definition is syntactic, meaning that it is expressed in terms of the components of a single statement. This means that an access that might be undefined behavior if written as a single statement:
highlyAlignedStruct->charMember = 0;
may not be undefined behavior if split across two statements:
“char *member = &highlyAlignedStruct->charMember;
*member = 0;
In effect, the compiler promises to never propagate alignment assumptions
between statements through its knowledge of how a pointer was constructed.
This is necessary in order to allow local workarounds to be reliable.

Does f5 in an earlier comment still constitute UB?

u64 f5(struct dir_entry *entry)
{
    u64 buffer;
    __builtin_memcpy(&buffer, (void *)&entry->offset, sizeof(buffer));
    return buffer;
}

@akihikodaki
Copy link
Author

Perhaps the following function is easier to understand:

u64 f6(struct dir_entry *entry)
{
    u64 buffer;
    void *p = (void *)&entry->offset;
    __builtin_memcpy(&buffer, p, sizeof(buffer));
    return buffer;
}

This should not be UB according to the RFC even if entry is misaligned, but UBSan still complains.

@akihikodaki
Copy link
Author

akihikodaki commented Mar 20, 2024

@rjmccall Hi, I'd like to ask your opinion since you are the author of the RFC: #83710 (comment)

@rjmccall
Copy link
Contributor

That's correct; according to our interpretation in the RFC, both of those should be valid (unless somehow buffer is misaligned, but that should be impossible). The general implication for UBSan is that — unless you want to have a "strict mode" that goes beyond our interpretation — alignment enforcement needs to be tied to specific accesses rather than abstract pointer constructions.

@akihikodaki
Copy link
Author

Can you reopen this issue? I'm apparently not allowed to reopen it.

@akihikodaki
Copy link
Author

akihikodaki commented Mar 29, 2024

@MaskRay Hi, can you check the above discussion and reopen this issue if it looks reasonable to you?

@akihikodaki
Copy link
Author

@MaskRay (cc: @rjmccall) Hi, please check out the last comment.

@akihikodaki
Copy link
Author

@MaskRay (cc: @rjmccall) Please respond; it's already been 3 months since we got an interpretation of the relevant RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt:ubsan Undefined behavior sanitizer false-positive Warning fires when it should not question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

6 participants