align_offset guarantees #62420
Turns out that fixing #61339 to work fine with Miri does not even change the generated assembly; looks like LLVM can just optimize away the extra check for That takes the time pressure off this discussion: that PR can land without having to reach any conclusion here. The discussion still seems to be worth having, though!
Another reason we might want to keep the spec as-is is to be able to eventually make
This essentially checks that it is indeed possible to align Or, alternatively, that a solution for the
What is I see. If your pointer points to an odd address and you can only move even-sized steps, you cannot ever get an even address.
It's the
The changes made to the documentation of align_offset, and in particular the present stance that
is not very pragmatic. The need to get an aligned reference inside a Vec is a very mundane and common one, e.g., bumpalo. The way that bumpalo seems to be working around this limitation is just to cast to a usize, do the rounding there, and then cast back to a pointer. See: try_alloc_layout_fast. Is round-tripping a pointer through a usize really what we want to say is the best practice? That seems to me kind of bizarre. If some operation like aligning a *u8 is fundamentally unsound, why would converting to usize, aligning, and converting back be any more sound? Does altering code that would normally deal with *u8 so that it deals with usize instead lose many of the benefits that Miri is meant to provide? Perhaps code dealing with *u8 does have a soundness problem which Miri would catch, but then the code is altered to use usize instead and Miri can no longer catch the problem.
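To make the two approaches in this comment concrete, here is a minimal sketch (hypothetical helper names; this is not bumpalo's actual code). `align_via_usize` rounds the address up as an integer and casts that integer back to a pointer, the pattern described above; `align_via_offset` asks `align_offset` only for the distance and applies it to the original pointer, so the result is still derived from that pointer. Both assume `align` is a power of two.

```rust
/// Round `addr` up to a multiple of `align` (assumed to be a power of two).
/// Returns `None` if the addition would overflow.
fn round_up(addr: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    Some(addr.checked_add(align - 1)? & !(align - 1))
}

/// Integer round trip: compute the aligned address as a `usize`,
/// then cast that integer back to a pointer.
fn align_via_usize(ptr: *const u8, align: usize) -> Option<*const u8> {
    round_up(ptr as usize, align).map(|addr| addr as *const u8)
}

/// Ask `align_offset` for the distance only and apply it to the original
/// pointer, so the result is derived from `ptr` rather than from an integer.
fn align_via_offset(ptr: *const u8, align: usize) -> Option<*const u8> {
    match ptr.align_offset(align) {
        usize::MAX => None, // the "could not align" sentinel
        off => Some(ptr.wrapping_add(off)),
    }
}
```

At run time on ordinary targets both functions yield the same address; the difference is in how the result is derived, which is the part tools like Miri care about.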
Quick remark (will come back to the rest of your comment when I have more time):
Note that this is not a change. Already the very first stable doc for
Guaranteeing more than that would be a change.
It is not fundamentally unsound. Where are you getting that idea from? The issue above explains in great detail what the issue is with guaranteeing more for
I do not understand the question. What bumpalo does will indeed not work properly in Miri right now, and when Miri gains support for such code it will come at the expense of test coverage. For code like bumpalo that really needs to successfully get a higher-aligned pointer out of a lower-aligned allocation, that's fine; there is nothing else we can do -- but most code using
Hello, I'm using
@tdelabro what exactly do you need from it at const-time? The full functionality of
@RalfJung I define symbols in a linker script, then get them in my Rust code as external symbols, and I want to use them in static objects. In linker.ld:
In Rust:
extern "C" {
    pub fn kernel_end();
}

const unsafe fn get_ext_symb_add(f: unsafe extern "C" fn()) -> *const usize {
    f as *const usize
}

pub const fn get_kernel_end() -> *const usize {
    unsafe { get_ext_symb_add(kernel_end) }
}

pub static KERNEL_HEAP: Locked<KernelHeap> = Locked::new(KernelHeap::new(get_kernel_end()));
So this works. But I need my struct to contain a 4k-page-aligned value, so I tried the following:
match get_kernel_end() {
    v if v as usize & 0xFFF == 0 => v,
    v => ((v as usize & !0xFFF) + 0x1000) as *const usize,
}
But this is not allowed because the cast to usize would allow things that would lead to non-constant results, which I understand. But I was expecting the following to be allowed, as it will always produce the same output:
let add_of_first_page_after_kernel = get_kernel_end().offset(get_kernel_end().align_offset(0x1000) as isize)
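For comparison, a run-time sketch of the rounding this comment is after (hypothetical helper names; it uses a `*const u8` so that `align_offset` counts bytes rather than `usize` elements). As the next reply explains, this cannot move into a const initializer for a linker-provided symbol, because that address is only known at link time.

```rust
/// 4 KiB pages, as in the comment above.
const PAGE_SIZE: usize = 0x1000;

/// Round `addr` up to the next page boundary; page-aligned addresses are
/// returned unchanged (overflows near `usize::MAX`).
/// Note: `+` binds more tightly than `&` in Rust, so a mask-then-add form
/// must be written `(addr & !(PAGE_SIZE - 1)) + PAGE_SIZE`. This add-then-mask
/// form keeps explicit parentheses for readability.
fn page_align_up(addr: usize) -> usize {
    (addr + (PAGE_SIZE - 1)) & !(PAGE_SIZE - 1)
}

/// The same rounding via `align_offset`. With a `*const u8` the offset is in
/// bytes; with the original `*const usize` it would be counted in `usize`
/// elements, which is also the unit `offset` expects.
fn page_align_ptr(p: *const u8) -> *const u8 {
    p.wrapping_add(p.align_offset(PAGE_SIZE))
}
```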
This is impossible. Const evaluation is done by the compiler, while the final address is determined by the linker, which runs later. The compiler doesn't know the address; all it can do is emit a relocation to ask the linker to insert the address in a specified place in the code or data. A relocation can request adding or subtracting an arbitrary offset to the address, but not more complicated math than that. That said, you can do the math in the linker script. Something like
@comex Ok yeah, it seems obvious when you explain it. Dunno why I didn't realize it myself. Thanks for the linker align tip; I got the same idea last night when I went to sleep. That's what I will do. Ty.
Now that Miri's
There's still the question of, at some point, allowing
For reference, the main reason I'm interested in having stronger guarantees is that it would enable the use of tagged pointers without platform- or implementation-specific assumptions. For example:
#[derive(Clone, Copy)]
pub struct TaggedPtr(*const u8);

impl TaggedPtr {
    /// # Safety
    ///
    /// * `ptr` must be properly aligned (i.e., have an alignment of at least 2).
    /// * `ptr` must be “dereferenceable” in the sense defined by [`std::ptr`](std::ptr#safety).
    pub unsafe fn new(ptr: *const u16, tag: bool) -> Self {
        Self(unsafe { (ptr as *const u8).add(tag as usize) })
    }

    /// Returns the stored pointer and tag.
    pub fn get(self) -> (*const u16, bool) {
        let offset = self.0.align_offset(2);
        assert!(offset != usize::MAX); // Ideally, this would be guaranteed.
        let tag = offset & 1;
        (unsafe { self.0.sub(tag) } as *const u16, tag != 0)
    }
}

fn main() {
    let i1 = 0_u16;
    let i2 = 1_u16;
    let p1 = &i1 as *const u16;
    let p2 = &i2 as *const u16;
    let t1 = unsafe { TaggedPtr::new(p1, true) };
    let t2 = unsafe { TaggedPtr::new(p2, false) };
    assert_eq!(t1.get(), (p1, true));
    assert_eq!(t2.get(), (p2, false));
}
The “standard” way of doing this is to cast the pointer to a With stronger
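For comparison, a sketch of the integer-cast approach this comment contrasts with: reading the tag from the low bit of the address (`TaggedPtrBits` is a hypothetical name). It relies on the same invariant as `TaggedPtr`, namely that the untagged pointer is at least 2-aligned; `wrapping_sub` keeps `get` free of `unsafe`.

```rust
/// Hypothetical variant of `TaggedPtr` that reads the tag from the low
/// address bit through an integer cast instead of calling `align_offset`.
#[derive(Clone, Copy)]
pub struct TaggedPtrBits(*const u8);

impl TaggedPtrBits {
    /// # Safety
    ///
    /// Same requirements as `TaggedPtr::new`: `ptr` must be at least 2-aligned
    /// and dereferenceable, so that `add(1)` stays within the pointee.
    pub unsafe fn new(ptr: *const u16, tag: bool) -> Self {
        Self(unsafe { (ptr as *const u8).add(tag as usize) })
    }

    /// Returns the stored pointer and tag.
    pub fn get(self) -> (*const u16, bool) {
        // The untagged pointer is even, so the low address bit is the tag.
        let tag = (self.0 as usize) & 1;
        // `wrapping_sub` keeps this safe; the result is still derived from `self.0`.
        (self.0.wrapping_sub(tag) as *const u16, tag != 0)
    }
}
```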
The implementation of Btw, is there a reason you are not using
Indeed, and while searching around, I came across someone who noted that the implementation of Making additional guarantees about pointer-to-integer conversions would definitely work for this purpose, but I figured they might be more controversial than strengthening
Ah, good point; that would indeed be better. As silly as it sounds, I hadn't actually realized that
I must say, as someone who has to align pointers from time to time,

To start with, why does it express failure as a sentinel value instead of returning

Beyond that, at first I didn't even understand the use case for an alignment operation that can fail. After reading up on

Regardless, there are many other use cases for

One common one is to allocate an aligned buffer within a non-aligned slice of unused memory, like

Some in-memory and file formats have a similar principle: they contain a series of entries which are implicitly padded to multiples of N bytes. For example, if N is 8 and you saw a 5-byte entry, you would have to skip an additional 3 bytes to find the next entry. If you use

Admittedly, even in C, these formats are usually implemented by checking the alignment of integer sizes or offsets, not pointers. But not always: here is an example of Rust code using

Another use case is testing

(Pointless sidenote about C) It was nonintuitive to me that people would use `align_offset` for this purpose. I'm more used to C, where there's no standard helper function equivalent to `align_offset`, so you have to write out the math explicitly. To compute the bytes needed to round up, i.e., what `align_offset` computes, you write `-foo & (align - 1)`, whereas to compute the bytes needed to round *down*, you write `foo & (align - 1)`. If you want to check whether something is already aligned, either will work, but the latter is more intuitive (what is negating a pointer supposed to accomplish?) and one character shorter, so you would usually pick that: `(foo & (align - 1)) == 0`. Of course, there's nothing wrong with using `align_offset`. If it's behind a helper function, intuitiveness doesn't matter, and the performance is probably identical, since the optimizer will likely optimize away the negation. That said, I'm surprised that we have `align_offset` but don't have a complementary helper function to compute the number of bytes needed to round down.
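The C idioms from the sidenote translate directly to Rust. Below is a minimal sketch with hypothetical helper names (`offset_up`, `offset_down`, `is_aligned`, `next_entry`), assuming `align` is a power of two. `wrapping_neg` stands in for C's unary minus on an unsigned value, and `next_entry` shows the padded-entry arithmetic from the example above (N = 8, where a 5-byte entry is followed by 3 bytes of padding).

```rust
/// Bytes needed to round `addr` UP to a multiple of `align`:
/// the C idiom `-addr & (align - 1)`.
fn offset_up(addr: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    addr.wrapping_neg() & (align - 1)
}

/// Bytes needed to round `addr` DOWN to a multiple of `align`:
/// the C idiom `addr & (align - 1)`.
fn offset_down(addr: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    addr & (align - 1)
}

/// Either helper can test alignment; the round-down form is the usual choice.
fn is_aligned(addr: usize, align: usize) -> bool {
    offset_down(addr, align) == 0
}

/// Start of the next entry in a format whose entries are implicitly padded
/// to multiples of `n` bytes.
fn next_entry(pos: usize, entry_len: usize, n: usize) -> usize {
    let end = pos + entry_len;
    end + offset_up(end, n)
}
```

At run time, `offset_up(p as usize, align)` agrees with `p.align_offset(align)` for a `*const u8`.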
Honestly, I originally thought this didn't matter very much either way, but researching this comment has changed my mind. Each of the five links above points to real-world code that uses Given that, I am strongly in favor of strengthening If we want a
It might be worth pointing out C++'s std::align, just as a matter of seeing what other languages do. It also rounds up and also has a weird API (although differently so), but the most important difference is that it isn't allowed to spuriously fail.
@comex just to be sure, those uses you found that assume that
The advantage of the sentinel value is that the normal usage on normal platforms has one fewer branch, as you don't need to match on an `Option` first and then do an in-bounds check. Instead, the in-bounds check alone suffices.
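A small sketch of the point being made here (hypothetical function names; `checked_align_offset` is not a std API). With the `usize::MAX` sentinel, one `<=` comparison against the buffer length rejects both an impossible alignment and an offset past the end, whereas an `Option`-returning variant first has to handle `None`.

```rust
/// Split `buf` at its first `align`-aligned position, if that lies within
/// the buffer. `usize::MAX` is always greater than a slice length, so
/// "cannot align" and "aligned position is past the end" fail the same check.
fn split_at_aligned(buf: &[u8], align: usize) -> Option<(&[u8], &[u8])> {
    let off = buf.as_ptr().align_offset(align);
    if off <= buf.len() {
        Some(buf.split_at(off))
    } else {
        None
    }
}

/// Hypothetical Option-returning wrapper, for comparison only.
fn checked_align_offset(p: *const u8, align: usize) -> Option<usize> {
    match p.align_offset(align) {
        usize::MAX => None,
        off => Some(off),
    }
}

/// The same split against the wrapper: handle `None` first, then bounds-check.
fn split_at_aligned_opt(buf: &[u8], align: usize) -> Option<(&[u8], &[u8])> {
    let off = checked_align_offset(buf.as_ptr(), align)?;
    if off <= buf.len() { Some(buf.split_at(off)) } else { None }
}
```

Both functions return the same result; the second exists only to show the extra step.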
Oops, I missed this when you originally posted it. I re-checked my examples and they do all use it with Perhaps worth noting: for any type whose size equals its alignment,
Is there actually any fundamental reason So long as the compiler is guaranteed to be able to recognize every CTFE alignment check (by the lack of ptr2int), it can claim that the allocation is aligned for every checked value, and then if the allocation is promoted to a static, it can require that alignment for the static. This does mean that it would still fail during const eval (in some way: panic, linker error, or returning failure) for "too large" alignments that still fit in a usize, but that really doesn't sound like a problem tbh. (And slice::align_to/as_simd would never fail.)
I think it would be very surprising if calling Also, I don't think this can actually work: the const to which
I mean, less surprising than the current definition (and more definitely, results in less obnoxious downstream APIs). The fact that the allocation might be defined in another crate is technically not a problem at the moment with the current definition of
I strongly disagree. Currently we just have an API that returns an "unknown" result. Your proposal means non-local side-effects of what should be a read-only operation. I can already see people trying to debug why Rust puts such huge amounts of padding into their binary, just because some other crate internally somewhere used
I think it very much is a problem, at least with the current implementation in rustc. With The fact that, just looking at this crate, we cannot tell the alignment constraint that will be generated for that
We have APIs (align_to/as_simd) that return aggressively useless results that make everyone who sees them go wtf. And once they understand, it's still annoying and warps the code to not be able to do simple things that would require the head/tail to be less than the lane count. And you can't force the head to be empty by forcing the alignment elsewhere (and just use these functions as a safe and correct transmute).
Yes, this is not ideal, but let's be realistic here; people don't randomly call these APIs, generally don't call them in const in the first place (the higher-level APIs aren't even
Isn't that precisely the one case where dedup failure can be detected (and comes up via associated consts being generic)? It still does make it far more ugly, though the silly align=size heuristic is always more likely to remove the issue in practice.
These APIs are not useful for your use case, because they are not meant for it. I created #105296, which adds targeted functions for your use cases. My preference is still to have precise functions instead of a universal function, similar to how the proposal for strict provenance argues against
You mean
We can bounce arguments back and forth, but I don't understand why align_to is useless, considering it has libstd usages that don't even need SIMD ops but instead use integer types on the middle slice for measurable speedups. Why do you think having separate APIs is silly? You are throwing away the start and end slice on every call; wouldn't you rather just have a function that gives you what you actually want?
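As a minimal sketch of the "integer types on the middle slice" pattern mentioned here (hypothetical function; the ASCII check is only an example, not a claim about libstd's code), the middle part is examined one `u64` word at a time while the head and tail are handled byte by byte. Because the edges are handled separately, the result is correct however `align_to` splits the slice; a larger middle only makes it faster.

```rust
/// Returns true if every byte in `data` is ASCII (< 0x80).
/// The aligned middle part is checked one `u64` word at a time; the
/// unaligned head and tail are checked byte by byte.
fn is_all_ascii(data: &[u8]) -> bool {
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    // SAFETY: every bit pattern is a valid `u64`, so viewing the aligned
    // middle of a byte slice as `u64`s is sound.
    let (head, middle, tail) = unsafe { data.align_to::<u64>() };
    head.iter().chain(tail).all(|&b| b < 0x80)
        && middle.iter().all(|&w| (w & HIGH_BITS) == 0)
}
```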
Calling
The entire reason it has issues is because of And when people are ignoring as_simd/align_to to make their own versions because of the warning text (probably before the update to align_to's warning text, to be fair), I think it's worth calling them useless. I agree that if align_to/as_simd aren't going to go const, they should just be fixed via docs, and it's probably worth leaving
I would suggest mostly ignoring
No, it just means that this function does not cover those people's needs. It says nothing about the usefulness of the function for its actually intended use case. We are not looking to turn When you claim "useless" you are claiming "nobody can have a use for this function". That claim is obviously false, and also non-productive. The public API of So if you want to truly move this discussion forward, I suggest you prepare such a document. I don't think there is much to discuss in this issue here in the meantime.
When people reimplement the exact same function only without the unnecessary caveat, saying "does not cover those people's needs" like it's a special case is silly. And claiming that the public API limitations that are taken straight from its internal implementation and have no other excuse are not related is also wtf. I agree that it sounds like this is not the place to actually solve it.
Given the semantic and architectural issues with "have A consequence of this proposal is that either,
Rollup merge of rust-lang#121201 - RalfJung:align_offset_contract, r=cuviper

align_offset, align_to: no longer allow implementations to spuriously fail to align

For a long time, we have allowed `align_offset` to fail to compute a properly aligned offset, and `align_to` to return a smaller-than-maximal "middle slice". This was done to cover the implementation of `align_offset` in const-eval and Miri. See rust-lang#62420 for more background. For about the same amount of time, this has caused confusion and surprise, where people didn't realize they have to write their code to be defensive against `align_offset` failures.

Another way to put this is: the specification is effectively non-deterministic, and non-determinism is hard to test for -- in particular if the implementation everyone uses to test always produces the same reliable result, and nobody expects it to be non-deterministic to begin with.

With rust-lang#117840, Miri has stopped making use of this liberty in the spec; it now always behaves like rustc. That only leaves const-eval as potential motivation for this behavior. I do not think this is sufficient motivation. Currently, none of the relevant functions are stably const: `align_offset` is unstably const, `align_to` is not const at all. I propose that if we ever want to make these const-stable, we just accept the fact that they can behave differently at compile-time vs at run-time. This is not the end of the world, and it seems to be much less surprising to programmers than unexpected non-determinism. (Related: rust-lang/rfcs#3352.)

@thomcc has repeatedly made it clear that they strongly dislike the non-determinism in align_offset, so I expect they will support this. @oli-obk, what do you think? Also, whom else should we involve? The primary team responsible is clearly libs-api, so I will nominate this for them. However, allowing const-evaluated code to behave differently from run-time code is t-lang territory. The thing is, this is not stabilizing anything t-lang-worthy immediately, but it still does make a decision we will be bound to: if we accept this change, then
- either `align_offset`/`align_to` can never be called in const fn,
- or we allow compile-time behavior to differ from run-time behavior.

So I will nominate for t-lang as well, with the question being: are you okay with accepting either of these outcomes (without committing to which one, just accepting that it has to be one of them)? This closes the door to "have `align_offset` and `align_to` at compile-time and also always have compile-time behavior match run-time behavior".

Closes rust-lang#62420
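Under the contract adopted here, run-time callers may rely on `align_to` producing a maximal middle slice. A minimal sketch of that property as a check (hypothetical helper names): the three parts cover the whole input, the head is shorter than the alignment of `u64`, and the tail is shorter than one `u64`. The commit message leaves open whether a future const-evaluated version would behave the same way.

```rust
use std::mem::{align_of, size_of};

/// Lengths of the (head, middle, tail) parts of `align_to::<u64>()`.
fn split_lens(bytes: &[u8]) -> (usize, usize, usize) {
    // SAFETY: every bit pattern is a valid `u64`.
    let (head, middle, tail) = unsafe { bytes.align_to::<u64>() };
    (head.len(), middle.len(), tail.len())
}

/// The "maximal middle" property: the parts cover the input, the head is
/// shorter than `u64`'s alignment, and the tail is shorter than one `u64`.
fn middle_is_maximal(bytes: &[u8]) -> bool {
    let (h, m, t) = split_lens(bytes);
    h + m * size_of::<u64>() + t == bytes.len()
        && h < align_of::<u64>()
        && t < size_of::<u64>()
}
```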
The documentation for `align_offset` says:
…
It does not give any details of when it might not be possible to perform the alignment. My reading of that is that the user must always be prepared for failure to happen. In accordance with that, a while ago I adjusted the docs for `align_to` (the preferred way to use `align_offset`) to say:
…
In practice, returning `max_value` happens when `p as usize & (gcd - 1) != 0` (whatever exactly this means, this is taken from the implementation) -- and when running in Miri, which will always return `max_value`.

Historically, Miri did this because it had no notion of the "integer address" of an allocation, so there literally was no way to offset a pointer to get a particular alignment. This is changing now: Miri is getting support for this. So maybe we should note some conditions under which `align_offset` will definitely succeed. A motivation for this is #61339, which implicitly made the assumption that aligning with `size_of::<T>() == 1` will always succeed.

On the other hand, the current contract for `align_offset` lets Miri do more reliable alignment checking. This is off by default but can be enabled with `-Zmiri-symbolic-alignment-check`: when checking whether some pointer `p` that points to offset `o` inside an allocation with alignment `a` is sufficiently aligned, we have the option to consider only `a` and `o` and not the integer value of `p`. This allows us to reliably detect alignment problems in code such as:
…
If we were to take the actual integer value of `p` into account, the program might get "lucky" and actually run successfully in Miri because `base_addr` happens to be even. In contrast, by not doing this, Miri can offer a flag where the bug in the program above is definitely caught. With this flag, the user can be sure that none of the accesses in the program are aligned "by chance".

However, this also means that when this flag is set, the following code will not pass:
…
Miri cannot know that you actually did your homework and checked the integer address. This program is basically indistinguishable from the bad program above.
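A minimal sketch of the kind of program meant here (hypothetical function name, not the issue's original example): it inspects the integer address of a byte buffer and only then performs a 2-byte write at an even address. This is sound at run time; according to the issue, a run with `-Zmiri-symbolic-alignment-check` would still reject it, since the buffer's allocation itself only guarantees alignment 1. The "bad" counterpart would perform the `u16` write at the buffer's start without looking at the address first.

```rust
/// A program that "does its homework": it checks the integer address of a
/// byte buffer before doing a 2-byte write at an even address.
/// Returns the offset (0 or 1) at which the `u16` was written.
fn write_u16_checked(buf: &mut [u8; 3]) -> usize {
    let base = buf.as_mut_ptr();
    // Offset 0 if the buffer starts at an even address, 1 otherwise.
    let off = (base as usize) % 2;
    // SAFETY: `off <= 1`, so bytes `off..off + 2` lie inside the 3-byte
    // buffer, and `base + off` is even, i.e. 2-aligned for `u16`.
    unsafe { base.add(off).cast::<u16>().write(0x0202) };
    off
}
```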
Currently there does not seem to be much code that operates like the last example above -- code will instead use `align_to`, which will (when run in Miri with symbolic alignment checking) make the middle part empty, and thus the "higher-aligned" accesses just don't happen. This means the vast majority of code works fine in the better-alignment-checking mode. If we force `align_offset` to not fail like #61339 expects, then suddenly `align_to` will return non-empty middle parts in Miri as well, and `-Zmiri-symbolic-alignment-check` will basically be useless. There will be false positives when any method using `align_to` is called, which includes a few fundamental methods in libcore.

So, here's the trade-off: either Miri has a mode that can reliably detect alignment problems, or `align_offset` guarantees success under certain conditions. I don't think we can have both. Which one do we pick?

(The particular PR #61339 that triggered this might be fixable to work with the contract Miri needs, I don't know. But this discussion will probably come up again.)
Cc @rust-lang/libs @rust-lang/wg-unsafe-code-guidelines