-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add slice::check_range
#75207
Add slice::check_range
#75207
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @KodrAus (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
/// [`Index::index`]: ../ops/trait.Index.html#tymethod.index | ||
#[track_caller] | ||
#[unstable(feature = "slice_check_range", issue = "none")] | ||
pub fn check_range<R: RangeBounds<usize>>(&self, range: R) -> Range<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't unfortunately return a CheckedRange<usize>
because there's no association with a specific range, so it's useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain more? I don't see how this would be useless, since it converts RangeBounds
to usize
s. It wasn't meant to reduce unsafe code.
It could return a CheckedRange
that has a reference to the slice and a safe get
method. However, that would require check_range_mut
and CheckedRangeMut
too, which would make this simple method much more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code is useful. I meant to say that returning a CheckedRange is not so useful. As you say it could be done with the reference to the slice, but it could make it over engineered, so all is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense. Thanks for the clarification.
Would you be happy to roll replacing the ad-hoc implementations with this in into the same PR here? Just so we can land it all together and know everything is done 🙂 |
Sure, I can update the PR this week. I was initially trying to reduce the scope of the change, but adding it now is not a problem |
assert!(end <= len, "upper bound was too large"); | ||
(start, end) | ||
// SAFETY: This buffer is only used to check the range. | ||
let buffer = unsafe { slice::from_raw_parts(self.ptr(), self.len()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hack, but it's not too bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this use buffer_as_slice
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used that method at first, but I decided to use from_raw_parts
instead, since it's a hack either way. The issue is that I need a contiguous slice for check_range
, so I use the first self.len()
elements even if they're not initialized.
I could change it to unsafe { self.buffer_as_slice()[..self.len()] }
, but that adds an unnecessary bounds check and looks more valid than it is.
I also added more of this explanation to the safety comment.
There are 3 more ad-hoc implementations:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking on the extra scope creep and running through all the places we could use this! I've just left a few little comments
assert!(end <= len, "upper bound was too large"); | ||
(start, end) | ||
// SAFETY: This buffer is only used to check the range. | ||
let buffer = unsafe { slice::from_raw_parts(self.ptr(), self.len()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this use buffer_as_slice
?
if end > len { | ||
end_assert_failed(end, len); | ||
} | ||
let Range { start, end } = self.check_range(range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this method was pretty finely tuned, do we have any existing benchmarks we can check to see whether there's any impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Vec::drain
has any benches. These are the benches for Vec
:
https://github.com/rust-lang/rust/blob/master/library/alloc/benches/vec.rs
However, I wouldn't expect a regression. check_range
calls functions with the same attributes for this reason. That's part of why I think it's better to package it with the standard library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there aren't any benches then I'd consider it a bit of a speculative implementation anyways, so am happy for this to be made simpler 👍
No problem! It's probably better to include the simplifications in this PR anyway |
📌 Commit d9e877f has been approved by |
⌛ Testing commit d9e877f with merge 82877c725f6510e8514b197af2e06e1e3f586a97... |
💥 Test timed out |
@bors retry |
☀️ Test successful - checks-actions, checks-azure |
Something about this PR is breaking Rust's aliasing rules. The |
if end > len { | ||
end_assert_failed(end, len); | ||
} | ||
let Range { start, end } = self.check_range(range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call here is the problem: this implicitly creates a slice covering the entire vector, which is a problem because that slice aliases with other pointers that existed before and that should remain valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I proposed a fix in #76662.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Wouldn't this use the safe Deref
implementation for Vec
? Even if not, why doesn't the mutable reference to self
on this method prevent aliasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It prevents aliasing from safe clients, but Vec
explicitly wants to support some unsafe clients that keep pointers into the buffer. That is okay as long as all the operations involved are guaranteed not to reallocate the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here for the testcase that shows how unsafe code can rely on this property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I had no idea Vec
makes those guarantees. Thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of these guarantees are documented here. I might have gone overboard in the test and check more than is guaranteed, but then it doesn't seem unlikely that people will take this guarantee as far as they can -- so I added a test basically for every operation that I could imagine preserving the buffer. Someone else double-checking this certainly would not hurt. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I read that passage but not thoroughly enough to notice these subtleties.
but then it doesn't seem unlikely that people will take this guarantee as far as they can
IIUC, there is still room for some of those calls to be removed. From the guarantees:
Bulk insertion methods may reallocate, even when not necessary.
Thus, append
, extend
, extend_from_slice
, and sometimes splice
shouldn't need to be restricted. Of course, keeping them in the test doesn't hurt. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... maybe we should add a comment though that this test does not constitute a guarantee. I'll prepare a PR.
(start, end) | ||
// SAFETY: This buffer is only used to check the range. It might be partially | ||
// uninitialized, but `check_range` needs a contiguous slice. | ||
// https://github.com/rust-lang/rust/pull/75207#discussion_r471193682 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a partially uninitialized slice is UB. So I don't think the safety comment here is accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was something the standard library could assume. Wouldn't that make buffer_as_slice
UB too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so. The standard library can't just implicitly do this, the risk is too high that someone will copy this code and use it in their own crate.
However, even then it is better to avoid; usually we only do that for layout assumptions (Cc rust-lang/unsafe-code-guidelines#90). For other kinds of UB, it is typically considered a bug.
Wouldn't that make buffer_as_slice UB too?
Maybe? I am not familiar with the code I am afraid, I just read the safety comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looking at the code, buffer_as_slice
should likely return a raw slice instead of a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so.
You're right. My mistake.
However, even then it is better to avoid; usually we only do that for layout assumptions
That makes sense. I was mostly using buffer_as_slice
as a reference, but I agree that avoiding that assumption is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's lots of old code here that was written before we had clear rules, and so does not follow today's rules. This case here is not a big deal because the discussion is still open whether we want this to be UB or not (that's rust-lang/unsafe-code-guidelines#77), but while we discuss it's better to follow the rules, even in the standard library, also to learn how hard/annoying it is to work with these rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh also the standard library currently does a poor job at documenting when it relies on extra assumptions, but I am pushing for new code that we add to do better. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all sounds great, :)
/// [`Index::index`]: ops::Index::index | ||
#[track_caller] | ||
#[unstable(feature = "slice_check_range", issue = "none")] | ||
pub fn check_range<R: RangeBounds<usize>>(&self, range: R) -> Range<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we only need the length of the slice, I don't think this method should take a reference to the full slice. References are very strong types, they assert the aliasing rules and they assert that the memory they point to is initialized. As my other messages here show, that is a problem for this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future readers, this conversation was continued here.
…lacrum Fix liballoc test suite for Miri Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires. Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri. This makes https://github.com/RalfJung/miri-test-libstd work again.
…lacrum Fix liballoc test suite for Miri Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires. Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri. This makes https://github.com/RalfJung/miri-test-libstd work again.
…lacrum Fix liballoc test suite for Miri Mostly, fix the regression introduced by rust-lang#75207 that caused slices (i.e., references) to be created to invalid memory or memory that has aliasing pointers that we want to keep valid. @dylni this changes the type of `check_range` to only require the length, not the full reference to the slice, which indeed is all the information this function requires. Also reduce the size of a test introduced in rust-lang#70793 to make it not take 3 minutes in Miri. This makes https://github.com/RalfJung/miri-test-libstd work again.
This method is useful for
RangeBounds
parameters. It's even been rewritten many times in the standard library, sometimes assuming that the bounds won't beusize::MAX
.For example,
Vec::drain
creates an empty iterator whenusize::MAX
is used as an inclusive end bound:If this PR is merged, I'll create another to use it for those methods.