-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clang adds "noundef" annotation to char arguments #56551
Comments
@llvm/issue-subscribers-clang-codegen |
For C++ the justification is given by https://eel.is/c++draft/basic.indet. It's very clear about anything but At least as far as C++ is concerned, the indeterminate value UB doesn't have anything to do with trap representations. |
Oh I see, thanks for pointing that out. I have no idea how that can be true at the same time as the statements in That would mean unsigned char / byte loads in C++ cannot be made C doesn't have a clear enumeration like that to my knowledge, so it is less clear there. Documents like this make me think loading uninit data into a char is allowed (i.e., at least as much as C++ allows), and then I would usually think of course I can also pass them to another function, but clearly C++ doesn't satisfy that expectation... |
My LLC/C++ is rusty, but I think that you're confusing indeterminate values and trap representations: a trap representation in C is an invalid but defined bit-pattern. C explicitly permits implementations to choose an integer representation such that you can have a trap representation that differs from a calid representation only by padding. C++ explicitly prohibits this, so you can safely write arbitrary bits to integer padding. |
Those two notions are related though -- given the fuzziness of the standard, "type has no trap representation" is a hint indicating that it has no "strange" (indeterminate) values. I didn't make this up, there are C committee discussions about indeterminate values that end up also discussing trap representations: https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm, https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm. |
If I understand correctly, it's because the questions being asked about indeterminate values are ill-formed with respect to types with trap representations, because any read of an indeterminate value is UB for them (the compiler can just pick the result of the read to be the trap representation, and reading a trap representation is UB). |
The question remains whether there's any way for the program above to have UB in C -- either in its form as given, or in the variant where
IOW, reading a non-value representation with an lvalue expression that has character type, like my example does, is well-defined. Therefore I think clang's |
FTR, in C also,
So an indeterminate value is just straight up an unspecified value for |
Hi, I wonder whether this story also implies that padding bits should not be If this reasoning is right, we can think about an alternate solution as well: initializing padding bits with zero, and keeping the |
I think that would still be wrong, since one should be able to copy uninitialized memory with |
In the general case, I think the arguments slightly diverge in C and C++, which is what this thread has been discussing as well. For C:
char c = *(char*)p; // p is, say, int32_t* storing an indeterminate value
// If the above statement succeeded, c must be unspecified value, because char type doesn't have trap representation. Note that if
Having However, another problem with this C standard semantics understanding is that supporting elimination of a redundant store-load operation using char buffer becomes hard:
This round-trip pair cannot be removed because after the removal the byte at For C++:
|
I am only talking about C in the following.
I think that is a reasonable interpretation of the standard, but unfortunately, the committee decided to take a much more creative stance in this defect report (and even more unfortunately, the standard never got clarified to make this explicit):
We are left with more questions than before. In particular, given all the effort the standard makes to give The interpretation currently taken by clang however makes it entirely impossible to implement tl;dr what clang is doing here is probably not backed by the C standard (but the standard is too ambiguous to be sure), and certainly going against programmer expectations. |
Thanks, then I must slightly fix my previous phrase below
to embrace more elastic interpretation by removing 'conceptually like poison'. I think my previous sentence 'doing any operation should not raise undefined behavior unless the operation was supposed to do so on a 'plain' (non-unspecified, non-trap) value.' from this comment is still valid. I understood that a byte-copying code using
(1) Padding bits in LLVM are not undef anymore and become more defined bits such as '0' (which can affect generated code quality, I believe). Otherwise copying a struct with padding bits using (2) the meaning of 'reading a memory location storing trap representation of non-char type as a char type' became clear, and if it means it successfully reads the location, it should immediately freeze the value because char value should always be well-defined (I used the term 'well-defined' here as a concept including unspecified values, to be clear.) (an example is in my comment). This depends on whether trap representation should always be some concrete bit pattern, or it can be a conceptual value like 'poison'. I think the interpretation of LLVM is more like latter, to facilitate optimizations. Then, would the next action that should be taken now be removing |
I think the way clang translates the following C code to LLVM IR is incorrect:
The resulting IR defines
id
as@id(i8 noundef signext %0)
. Thenoundef
is what I am concerned by. This translation means calling my_memcpy as follows leads to UB, since some of the bytes being copied here are undef or poison (namely, they are padding):If I understand the C standard correctly, this program (running
testcase
) is entirely well-defined. In particular, C17 6.2.6.1 §5 says (emphasis mine)But we are using a character type here. The standard also explicitly says in 6.2.6.2 that
char
types have no padding bits. So I don't think there is any room here for UB to arise when copying arbitrary data (including uninitialized memory) atchar
type. Therefore clang should not addnoundef
to character type variables.Furthermore, I am not entirely sure what the status of this proposal is, but if it has been accepted, then I am not sure that adding
noundef
to any other integer type is correct, either. That proposal states explicitlyAnd at least for C++, https://eel.is/c++draft/basic.fundamental#4 has a note on padding in integer types stating
So, at least for C++, I cannot see a justification for why clang adds
noundef
to all integer types. For non-character integer types in C, the standard is not clear enough for me to be sure either way.Cc @aqjune @nunoplopes
The text was updated successfully, but these errors were encountered: