-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC for Vec::append_from_within() #2714
Conversation
This feels very ad-hoc. I wonder if exposing some intermediate building block would help? Something like: impl<T> Vec<T> {
/// Returns the result of `self.as_mut_slice()` together with the rest of the capacity.
///
/// The two slices have length `self.len()` and `self.capacity() - self.len()` respectively.
pub fn with_uninit_capacity(&mut self) -> (&mut [T], &mut [MaybeUninit<T>]) {…}
}
impl<T> MaybeUninit<T> {
/// Panic if the two slices have different lengths.
pub fn copy_from_slice(this: &mut [Self], other: &[T]) where T: Copy {…}
pub fn clone_from_slice(this: &mut [Self], other: &[T]) where T: Clone {…}
} Then, |
I do not believe the MaybeUninit solution is optimal. For one, Note that the desired end result in all three motivating examples is actually While a hypothetical more general solutions could serve more use cases (see the pre-RFC for a motivating example from Claxon for a more general abstraction), it would take a long time to design and implement, and it would not entirely supersede |
Could we add a self-contained description of why this method is so difficult to implement correctly? Two of the three example links didn't seem to ever describe the bug, and the third explained only at the end of a very long post about a bunch of other security stuff that the broken code forgot to check if an argument was 0 (which imo is a perfectly good answer, if that is correct, it just needs to be easier to find). |
The code in rust stdlib was not checking for overflow in capacity calculation. Here's the fix. DEFLATE decoders forgot to check that one of the arguments is not 0. Here is a simplified version of the code they used that illustrates the problem: /// Repeats `repeating_fragment_len` bytes at the end of the buffer
/// until an additional `num_bytes_to_fill` in the buffer is filled
pub fn decode_rle_vulnerable(buffer: &mut Vec<u8>, repeating_fragment_len: usize, num_bytes_to_fill: usize) {
buffer.reserve(num_bytes_to_fill); // allocate required memory immediately, it's faster this way
unsafe { // set length of the buffer up front so we can set elements in a slice instead of pushing, which is slow
buffer.set_len(buffer.len() + num_bytes_to_fill);
}
for i in (buffer.len() - num_bytes_to_fill)..buffer.len() {
self.buffer[i] = self.buffer[i - repeating_fragment_len];
}
} If you pass |
Thanks, that makes perfect sense now. |
At present, you'll need separate In the simple versions, If If you wait for specialization, then presumably Oops! Just noticed you want this restricted to |
I was using I am open to adding |
On copy-vs-clone, I think
I like this plan -- when I saw it get stabilized in 1.37 I also thought of this scenario, as it implicitly set a precedent for API design for things that treat part of themselves as input. (Now that it's simplified so much and not blazing its own trail -- something like |
@WanzenBug and me have ported It also improved end-to-end performance by |
to guide-level explanation, as requested in a comment
From reading the RFC I found that bikeshed I think switching the position of the parameters of |
I don't think the name |
So I went through |
Thanks for the RFC here @Shnatsel and sorry for the delay! FWIW the libs team tends to not require RFCs for API additions like this nowadays. This looks like a pretty reasonable feature addition to send in a PR, and we can have more discussion there if necessary (but it seems like a lot has happened here already!) and we can have another round of discussion before stabilization. Does that sound alright with you? If so feel free to adapt the discussion here to a PR sent to libstd, and we can close this and open a tracking issue and copy over any remaining unresolved questions and such. |
Sounds good to me! I'll close this as soon as I create a PR. |
BTW: could we get the README in this repo updated? It states that additions to |
Sure! Seems reasonable to update the README |
@Shnatsel did you ever get a chance to make a PR? |
No, I didn't get around to it yet. I got sidetracked by other projects. |
For future reference, this would also eliminate the only unsafe block in ruzstd, the zstd decoder in Rust: |
@Shnatsel am I right, that feature is still unimplemented? If so, I could make a PR. |
Yes, that is correct. After a bunch of experimentation with this in a crate I am convinced that the API described here is the right level of abstraction to expose. The implementation is already written and tested, see https://github.com/WanzenBug/rle-decode-helper/blob/690742a0de158d391b7bde1a0c71cccfdad33ab3/src/lib.rs#L74 All you need to do is make a PR for the standard library adding this function. I'd very much appreciate this since I keep getting sidetracked by other projects. |
Implement <rust-lang/rfcs#2714>, changes from the RFC: - Rename the method `append_from_within` => `extend_from_within` - Loose :Copy bound => :Clone - Specialize in case of :Copy This commit also adds `Vec::split_at_spare` private method and use it to implement `Vec::spare_capacity_mut` and `Vec::extend_from_within`. This method returns 2 slices - initialized elements (same as `&mut vec[..]`) and uninitialized but allocated space (same as `vec.spare_capacity_mut()`).
…r=KodrAus add `Vec::extend_from_within` method under `vec_extend_from_within` feature gate Implement <rust-lang/rfcs#2714> ### tl;dr This PR adds a `extend_from_within` method to `Vec` which allows copying elements from a range to the end: ```rust #![feature(vec_extend_from_within)] let mut vec = vec![0, 1, 2, 3, 4]; vec.extend_from_within(2..); assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4]); vec.extend_from_within(..2); assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4, 0, 1]); vec.extend_from_within(4..8); assert_eq!(vec, [0, 1, 2, 3, 4, 2, 3, 4, 0, 1, 4, 2, 3, 4]); ``` ### Implementation notes Originally I've copied `@Shnatsel's` [implementation](https://github.com/WanzenBug/rle-decode-helper/blob/690742a0de158d391b7bde1a0c71cccfdad33ab3/src/lib.rs#L74) with some minor changes to support other ranges: ```rust pub fn append_from_within<R>(&mut self, src: R) where T: Copy, R: RangeBounds<usize>, { let len = self.len(); let Range { start, end } = src.assert_len(len);; let count = end - start; self.reserve(count); unsafe { // This is safe because `reserve()` above succeeded, // so `self.len() + count` did not overflow usize ptr::copy_nonoverlapping( self.get_unchecked(src.start), self.as_mut_ptr().add(len), count, ); self.set_len(len + count); } } ``` But then I've realized that this duplicates most of the code from (private) `Vec::append_elements`, so I've used it instead. Then I've applied `@KodrAus` suggestions from rust-lang#79015 (comment).
Since rust-lang/rust#79015 was merged, I guess this PR can be closed? |
Rendered
This addition is quite trivial. However, in the pre-RFC I've learned that there are many approaches to solving this, so I'm opening an RFC to provide a clear rationale and get some input on the final design.
Prototype implementation in playground