-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Iterator::collect_into and Iterator::collect_with #92982
Conversation
/// | ||
/// let doubled = a.iter() | ||
/// .map(|&x| x * 2) | ||
/// .collect_with(Vec::new()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sets a bad example since it uses Extend
and we have separate code paths for FromIterator
which are optimized for the fact that we're starting with an empty vec. It's kind of like doing
let mut vec = Vec::new()
vec.extend(iter);
vec
Which is a silly thing to do when you can just use iter.collect()
For that reason I think the name also is suboptimal since it associates itself with collect()
and not extend()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I put Vec::with_capacity(3) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_hint
already takes care of that in many cases, so it would only help in cases where the size hint of the iterator isn't accurate (e.g. when it contains a filter
). Or maybe when collecting into a collection with a custom allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so if I understood correctly, I should use a filter
(or some other adapter) in the examples that makes size_hint
not provide any useful info for iter.collect()
right?
I could also mention that usually iter.collect()
is preferred when size_hint
is accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lack of optimisation for the empty case is a bit of a worry in terms of whether collect_with
is a good API at all. Wouldn't all of the good usages be better covered by the mutable collect_into
? I had envisioned we could use .collect_with(Vec::new())
to replace the turbofish, but it seems that is almost never a good idea, or at least only applicable in certain circumstances. Now we would be creating a more ergonomic API whose use we would have to immediately discourage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collect_with
can be very useful in order to avoid reallocation(s) when size_hint
returns the default value.
I had envisioned we could use .collect_with(Vec::new()) to replace the turbofish, but it seems that is almost never a good idea, or at least only applicable in certain circumstances.
I agree that it is almost never a good idea. I also like more the turbofish syntax where applicable.
My idea for collect_with
was a method to be used when size_hint
is not helpful and the collection (in which the items are collected) does not already exists. This allows shorter code:
let len1 = vec![1, 2, 3]
.into_iter()
.filter(|&x| x < 3)
.collect_with(Vec::with_capacity(2))
.len();
// Instead of
let mut vec = Vec::with_capacity(2);
let len2 = vec![1, 2, 3]
.into_iter()
.filter(|&x| x < 3)
.collect_into(&mut vec)
.len();
I think we could specify it in the doc. However, someone could still misuse the method. Calling Extend::extend_reserve
should, at least, fix the speed gap between collect()
and misused collect_with
. (not already implemented)
Eventually, I realised collect_with
can be implemented with a call to collect_into
, should it be a better implementation?
Co-authored-by: the8472 <the8472@users.noreply.github.com>
r? rust-lang/libs |
Also since the original suggestion in #48597 we pulled |
Also, since let len1 = vec![1, 2, 3]
.into_iter()
.filter(|&x| x < 3)
.collect_with(Vec::with_capacity(2))
.len();
// Instead of
let mut vec = Vec::with_capacity(2);
let len2 = vec![1, 2, 3]
.into_iter()
.filter(|&x| x < 3)
.collect_into(&mut vec)
.len(); |
Sorry, just read the full comment again, the problem with There are at least three things going on here:
I think number 1 is done, number 2 is very difficult, and number 3 we can think of a better way than |
I think that's a very poor motivating example because it could be replaced with a
It's not quite that bad. Vec's
I don't think difficult is the right word here, I already have muscle memory to type let v: Vec<u8> = iter.collect(); And sometimes you need the more specific type hint anyway because it can't infer the exact type that goes into the vec.
Providing the wrong size hint is considered a bug, which can cause implementations to misbehave (just not in unsound ways). So we'd have to verify correctness during iteration. And if we're dealing with a situation where the iterator length isn't known then these checks probably can't be optimized out either because the optimizer can't reason about the count. |
What is the mechanism by which you can recycle an allocation in |
Specialization, RFC 1210. In this case it's applied to
Well, since it uses So it's a tradeoff between ergonomics, compiletime and runtime performance. |
Oh right, only when the iterator actually is Couldn't you just have a custom implementation of |
Well, not arbitrary transformations, but ... rust/library/alloc/tests/vec.rs Lines 1015 to 1033 in ee5d8d3
|
The point I wanted to show was that a variable definition can be avoided using let filtered_vec = (0..50)
.filter(|&x| x < 40)
.collect_with(Vec::with_capacity(40));
// Instead of
let mut vec = Vec::with_capacity(40);
let filtered_vec2 = (0..50)
.filter(|&x| x < 40)
.collect_into(&mut vec); |
That's just 1 line more, and you wouldn't need the 2nd |
This is true. I still think it is slightly more readable with |
So how do we feel about splitting this and merging |
I'm definitely skeptical of having both, TBH. If you think that one of them is more important, I definitely think that starting with that one is better. And I agree that |
I think I'm going to do it. Is it better to do a commit on this PR or a new PR? |
Here is OK but would need to edit the PR's title and introductory comment because that gets copied around by the bots to various merge commits and summaries. Also for Rust it's generally best to force push a clean commit history, in this case one commit with just the one method. GitHub keeps old refs but I'm not sure it stores them forever, so maybe a new PR is best for preservation. It's up to you basically. |
@frengor Up to you which you think is more helpful for the reviewer. If the conversation here is still valuable, might as well change this. (I don't see any comments from josh, so you could probably even just force-push it to a single new commit.) If the conversation here is obviated by the new approach, then might as well close this one and make a new PR to clean things up. |
All the concerns are about |
Thanks @frengor, great work, keep it up! |
Add Iterator::collect_into This PR adds `Iterator::collect_into` as proposed by `@cormacrelf` in rust-lang#48597 (see rust-lang#48597 (comment)). Followup of rust-lang#92982. This adds the following method to the Iterator trait: ```rust fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E) -> &mut E ```
Add Iterator::collect_into This PR adds `Iterator::collect_into` as proposed by ``@cormacrelf`` in rust-lang#48597 (see rust-lang#48597 (comment)). Followup of rust-lang#92982. This adds the following method to the Iterator trait: ```rust fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E) -> &mut E ```
This is a PR for adding
Iterator::collect_into
andIterator::collect_with
as proposed by @cormacrelf in #48597 (see #48597 (comment)).This adds the following methods to the Iterator trait: