Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SkipTo for iter::Iter #287

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

grelner
Copy link

@grelner grelner commented Aug 6, 2024

This is the start of the implementation of #286 .
It adds a new trait SkipTo which for now is only implemented for iter::Iter.

The SkipTo trait contains a method skip_to(&mut self, n:u32) which should return an Iterator that yields values >= n.
The implementation for iter::Iter achieves this by doing a binary search for the correct container, and then another binary search if the container is an ArrayStore, or forwarding like a normal BitmapIter until a suitable value is found for BitmapStore.

Right now SkipTo is implemented for iter::Iter, but I don't think this is semantically sound, and it should probably rather be implemented for RoaringBitmap. In the discussion for #286 @Kerollmops wanted a similar api to SkipWhile, but this does not really fit, as SkipWhile can perform it's operation only using an existing iterator, but this is not true for SkipTo, as it needs direct access to the containers to do it's skip efficiently, not only the existing iterator. A reference to the container vec has been added to iter::Iter only for this purpose. The iterator returned is fully independent, and not related to the iterator on which the skip_to call was made at all.

This PR in it's current state implements what I need for my project, but I would like to finish it, albeit at a slightly slower pace.

Comment on lines 309 to 312
fn skip_to(self, n: u32) -> Self::Iter {
SkipToIter::new(&self.containers, n)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: using the full unmodified self.containers means this might skip backwards.

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator. :/

Copy link
Author

@grelner grelner Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? self.containers is just a reference to RoaringBitmap.containers. It was added in this PR
edit: ah, self.containers is RoaringBitmap.containers here. Then there's something I don't understand here if they can be backwards 😅

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.
Still, are all these changes worth it to have a double ended iterator and cover the kinda weird case of skip_to to a point the iterator has already passed?

Copy link
Member

@Kerollmops Kerollmops Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @grelner 👋

Sorry for the late reply, I'm in vacation 🌞

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator.

I agree with what @Dr-Emann explains here. This is, to me, the best way to implement this iterator. Flatten is not hard to implement. We can have a special Flatten type with all the original Flatten type logic but dedicated to slices and bitmaps so that it can skip groups according to the split high group (skipping whole slices or bitmaps if the number we want to find is not in this group).

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.

Let's forget about the SkipTo wrapper and only implement a non-lazy method like Iterator::advance_by method that directly advances the iterator to the requested number. This method, skip_to, is only available on the bitmap::Iter/IntoIter types, so there is no need to think about std::iter::Rev iterator adaptator.

@grelner grelner marked this pull request as ready for review August 22, 2024 04:59
@grelner
Copy link
Author

grelner commented Aug 22, 2024

Hi sorry for the slow progress, I've been on vacation too.
I've pushed some changes and I think it's ready for review now.
This implements:

  • advance_to for Iter and IntoIter, which skips containers using the value key. This uses an existing iterator for data
  • Bitmap::iter_from, which is faster that advance_to, because it does a binary search to find the container to start from
    This change significantly changes the iterator implementations, which now implement their own flatten functionality.

@grelner
Copy link
Author

grelner commented Aug 22, 2024

@Kerollmops sorry this 1.66 requirement keeps tripping me up. I pushed a fix for it now. Maybe that should be added to https://github.com/RoaringBitmap/roaring-rs?tab=readme-ov-file#developing

@grelner
Copy link
Author

grelner commented Aug 22, 2024

bench-skip_to.txt
cargo bench iteration results of the branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants