
feat(exex): backfill stream in batches #9738

Merged (10 commits) on Jul 31, 2024


@shekhirin (Collaborator) commented Jul 23, 2024

Closes #9735

@shekhirin added the labels C-enhancement (New feature or request) and A-exex (Execution Extensions) on Jul 23, 2024
@shekhirin marked this pull request as ready for review on July 30, 2024 18:20
@mattsse (Collaborator) left a comment


This is pretty cool.

A few tiny suggestions.

crates/exex/exex/src/backfill/stream.rs (outdated, resolved)
crates/exex/exex/src/backfill/stream.rs (resolved)
Comment on lines +112 to +118
// Take the next `batch_size` blocks from the range and calculate the range bounds
let mut range = this.range.by_ref().take(this.batch_size);
let start = range.next();
let range_bounds = start.zip(range.last().or(start));

// Advance the range by `batch_size` blocks
this.range.nth(this.batch_size);
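A minimal, std-only sketch of the bounds computation in this snippet, assuming the backfill range is a `RangeInclusive<u64>` of block numbers. `BlockBatches` and `next_batch` are hypothetical names for illustration, not reth's API. The key point is that `by_ref()` lends the range to `take`, so every block pulled through the `Take` adapter is consumed from the underlying range itself.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper that splits a block range into inclusive sub-ranges
/// of at most `batch_size` blocks, mirroring the bounds computation above.
struct BlockBatches {
    range: RangeInclusive<u64>,
    batch_size: usize,
}

impl BlockBatches {
    fn next_batch(&mut self) -> Option<RangeInclusive<u64>> {
        // `by_ref()` lends the range to `take`, so every item yielded below
        // is consumed from `self.range` itself.
        let mut batch = self.range.by_ref().take(self.batch_size);
        let start = batch.next();
        // `last()` drains the rest of the batch (at most `batch_size - 1`
        // items); for a single-block batch it returns `None`, so fall back
        // to `start`.
        let bounds = start.zip(batch.last().or(start));
        bounds.map(|(first, last)| first..=last)
    }
}

fn main() {
    let mut batches = BlockBatches { range: 1..=10, batch_size: 4 };
    // Keep pulling batches until the range is exhausted.
    while let Some(batch) = batches.next_batch() {
        println!("{batch:?}");
    }
}
```

Over blocks `1..=10` with `batch_size = 4`, the loop prints `1..=4`, `5..=8` and `9..=10`. Because `take` already advances the borrowed range, this sketch moves on to the next batch without any extra `nth` call.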
Collaborator

This will run a few times, but the overhead should be negligible. Alternatively, `range` could be wrapped in a `Peekable` and integrated into the `while` condition.

But it looks like this is just `Itertools::chunks`:

https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.chunks
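One possible reading of the `Peekable` suggestion, sketched under assumptions: the block range is wrapped in `std::iter::Peekable`, and `peek()` joins the loop's `while` condition so the loop stops as soon as no blocks remain, instead of computing empty bounds. `plan_batches` is a hypothetical function, and returning a list of planned ranges is a stand-in for spawning backfill tasks up to a `parallelism` limit.

```rust
use std::iter::Peekable;
use std::ops::RangeInclusive;

/// Hypothetical: plan up to `parallelism` batches of at most `batch_size`
/// blocks, checking for exhaustion in the `while` condition via `peek()`.
fn plan_batches(
    range: &mut Peekable<RangeInclusive<u64>>,
    batch_size: usize,
    parallelism: usize,
) -> Vec<RangeInclusive<u64>> {
    // A zero batch size would never consume a block and loop forever.
    assert!(batch_size > 0, "batch_size must be non-zero");

    let mut batches = Vec::new();
    // Stop once enough batches are planned or the range has no blocks left.
    while batches.len() < parallelism && range.peek().is_some() {
        let mut batch = range.by_ref().take(batch_size);
        // `peek()` just confirmed at least one block, so `next()` is `Some`.
        let first = batch.next().expect("peek() returned Some");
        // Drain the rest of the batch; a single-block batch has no further item.
        let last = batch.last().unwrap_or(first);
        batches.push(first..=last);
    }
    batches
}

fn main() {
    let mut blocks = (1..=10u64).peekable();
    // Each call resumes where the previous one stopped.
    println!("{:?}", plan_batches(&mut blocks, 3, 2));
    println!("{:?}", plan_batches(&mut blocks, 3, 2));
}
```

Called twice on `(1..=10u64).peekable()` with `batch_size = 3` and `parallelism = 2`, it plans `1..=3, 4..=6` and then `7..=9, 10..=10`. One trade-off is that the field's type becomes `Peekable<RangeInclusive<u64>>`, which no longer exposes the range's `start()` and `end()`.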

Collaborator Author

Using itertools' `chunks` would require a lifetime on the struct and also storing the original iterator in a field. We could also do `(0..batch_size).map_while(|_| this.range.next())` here, but that is more expensive than calling `nth` just once.

I didn't get your idea with `Peekable`.
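The `map_while` alternative from the reply can be sketched as follows, assuming a `RangeInclusive<u64>` of block numbers; `batch_via_map_while` is a hypothetical name. Each block in the batch costs one call to the range's `next()`, since `map_while` stops only when the counter runs out or the range is exhausted.

```rust
use std::ops::RangeInclusive;

/// Hypothetical: take the next batch by calling `range.next()` at most
/// `batch_size` times; `map_while` stops early once the range is exhausted.
fn batch_via_map_while(
    range: &mut RangeInclusive<u64>,
    batch_size: usize,
) -> Option<RangeInclusive<u64>> {
    let mut blocks = (0..batch_size).map_while(|_| range.next());
    // No first block means the range is exhausted (or `batch_size` is zero).
    let first = blocks.next()?;
    // Drain the rest of the batch; a single-block batch has no further item.
    let last = blocks.last().unwrap_or(first);
    Some(first..=last)
}

fn main() {
    let mut range = 1..=5u64;
    while let Some(batch) = batch_via_map_while(&mut range, 2) {
        println!("{batch:?}");
    }
}
```

For `1..=5u64` with a batch size of 2, successive calls return `Some(1..=2)`, `Some(3..=4)`, `Some(5..=5)` and then `None`. For comparison, `Iterator::nth` on a `RangeInclusive<u64>` is specialized in the standard library to jump ahead arithmetically rather than stepping through each block, which is presumably the cost difference the reply refers to.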

@shekhirin requested a review from @mattsse on July 31, 2024 12:11
@shekhirin force-pushed the alexey/backfill-stream-batch branch from 6869a98 to 46ebfe7 on July 31, 2024 12:29
@mattsse added this pull request to the merge queue on Jul 31, 2024
Merged via the queue into main with commit 6224619 on Jul 31, 2024 (33 checks passed)
@mattsse deleted the alexey/backfill-stream-batch branch on July 31, 2024 23:52
Development: linked issue "Batch execution in backfill job stream"

Participants: 2