Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator::nth doesn't compose well #60395

Closed
timvermeulen opened this issue Apr 30, 2019 · 7 comments
Closed

Iterator::nth doesn't compose well #60395

timvermeulen opened this issue Apr 30, 2019 · 7 comments
Labels
A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@timvermeulen
Copy link
Contributor

Iterators like iter::{Flatten, FlatMap, Chain} that chain multiple iterators together aren't able to implement Iterator::nth in terms of the nth method on their inner iterators, because when nth returns None, there's no way to know how many elements are "left".

This can lead to surprisingly bad performance when the inner iterator has an efficient nth implementation. For example, (0..100_000).chain(0..100_000).nth(50_000) has O(n) performance, even though it never reaches the second iterator.

I don't really know if this is worth fixing, but if it is, a way to fix it could be to add a method similar to nth to Iterator that returns Result<Self::Item, usize>. The Err case represents the "remaining" n (there's probably a better way to phrase that).

I don't have any interesting benchmarks, but I can confirm that implementing what I've temporarily called try_nth for ops::Range and iter::Chain makes (0..100_000).chain(0..100_000).nth(50_000) run in constant time. So returning Result<Self::Item, usize> indeed composes better. But again, this may not be worth fixing. 🙂

@Centril Centril added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 30, 2019
@Mark-Simulacrum
Copy link
Member

I'm not sure if we can get away with it (depending on the guarantees nth exposes), but it's possible this is simply a lack of specialization. Presumably chain could forward TrustedLen/ExactSizeIterator and the like such that we'd be able to "skip" within it as needed.

@Stargateur
Copy link
Contributor

That more chain that doesn't compose very well ?

@jonas-schievink jonas-schievink added the A-iterators Area: Iterators label Apr 30, 2019
@timvermeulen
Copy link
Contributor Author

@Mark-Simulacrum That's a great point, an ExactSizeIterator bound would indeed allow a better implementation. Do note that this isn't quite as powerful as always being able to forward to nth, though, because some types could conceivably implement nth more efficiently than the default implementation without implementing ExactSizeIterator.

@cuviper
Copy link
Member

cuviper commented Apr 30, 2019

I don't really know if this is worth fixing, but if it is, a way to fix it could be to add a method similar to nth to Iterator that returns Result<Self::Item, usize>. The Err case represents the "remaining" n (there's probably a better way to phrase that).

Would iter::empty().try_nth(0) return Err(0) or Err(1)?

This is an extension of the general off-by-one confusion of nth, which needs careful documentation.

@timvermeulen
Copy link
Contributor Author

@cuviper It would return Err(0). It's indeed very important to be wary of off-by-one errors, and I believe this is the more natural choice, because:

  • Having the invariant that n is always non-zero seems inconsistent with the parameter of nth being zero-indexed.
  • The combinators that would benefit from this can pass this same n to try_nth of the next iterator, rather than having to do integer arithmetic.
  • It matches the final value of n in the current default implementation of nth.

@timvermeulen
Copy link
Contributor Author

@Mark-Simulacrum Note that a chain of more than two iterators wouldn't be able to benefit from that specialization because Chain itself doesn't implement ExactSizeIterator, so only try_nth would help there. Or e.g. a Chain nested inside a Flatten.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Oct 1, 2020
Add Iterator::advance_by and DoubleEndedIterator::advance_back_by

This PR adds the iterator method

```rust
fn advance_by(&mut self, n: usize) -> Result<(), usize>
```

that advances the iterator by `n` elements, returning `Ok(())` if this succeeds or `Err(len)` if the length of the iterator was less than `n`.

Currently `Iterator::nth` is the method to override for efficiently advancing an iterator by multiple elements at once. `advance_by` is superior for this purpose because
- it's simpler to implement: instead of advancing the iterator and producing the next element you only need to advance the iterator
- it composes better: iterators like `Chain` and `FlatMap` can implement `advance_by` in terms of `advance_by` on their inner iterators, but they cannot implement `nth` in terms of `nth` on their inner iterators (see rust-lang#60395)
- the default implementation of `nth` can trivially be implemented in terms of `advance_by` and `next`, which this PR also does

This PR also adds `DoubleEndedIterator::advance_back_by` for all the same reasons.

I'll make a tracking issue if it's decided this is worth merging. Also let me know if anything can be improved, this went through several iterations so there might very well still be room for improvement (especially in the doc comments). I've written overrides of these methods for most iterators that already override `nth`/`nth_back`, but those still need tests so I'll add them in a later PR.

cc @cuviper @scottmcm @Amanieu
@timvermeulen
Copy link
Contributor Author

This is no longer relevant now that Iterator::advance_by has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants