
OUTDATED fix: prevent timeouts while writing to object storage #1916

Closed
wants to merge 4 commits

Conversation

wjones127 (Contributor)

Closes #1878

@wjones127 marked this pull request as ready for review on February 6, 2024 at 05:09
@westonpace (Contributor) left a comment

This is a good find but I'm not sure I understand yet.

Comment on lines +43 to +49
/// A task that flushes the data every 500ms. This is to make sure that the
/// futures within the writer are polled at least every 500ms. This is
/// necessary because the internal writer buffers data and holds up to 10
/// write request futures in FuturesUnordered. These futures only make
/// progress when polled, and if they are not polled for a while, they can
/// cause the requests to timeout.
background_flusher: tokio::task::JoinHandle<()>,
@westonpace (Contributor)

Shouldn't those futures be polled when this AsyncWrite is polled?

@wjones127 (Contributor, Author)

The problem is that AsyncWrite::poll_write returns Poll::Ready once it has put the write task onto its internal FuturesUnordered, so the scheduler has no reason to poll the internal tasks.

There is a simpler reproduction here:

apache/arrow-rs#5366 (comment)
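
A minimal sketch of the failure mode being described, assuming only the tokio and futures crates (this is not Lance's actual writer): a future pushed into a FuturesUnordered makes no progress until the collection itself is polled, so work queued by poll_write can sit idle long enough for the underlying request to time out.

```rust
use std::time::{Duration, Instant};

use futures::stream::{FuturesUnordered, StreamExt};

#[tokio::main]
async fn main() {
    let start = Instant::now();
    let mut in_flight = FuturesUnordered::new();

    // Stand-in for poll_write: enqueue an "upload" and immediately report
    // readiness to the caller. Nothing is driving this future yet.
    in_flight.push(async move {
        println!("upload started at {:?}", start.elapsed());
        tokio::time::sleep(Duration::from_millis(10)).await;
        println!("upload finished at {:?}", start.elapsed());
    });

    // The caller goes idle. During this second the queued future is never
    // polled, so the simulated upload has not even started; per the doc
    // comment above, real requests left unpolled this way can time out.
    tokio::time::sleep(Duration::from_secs(1)).await;

    // Only when the FuturesUnordered is finally polled does the upload run,
    // which is what the periodic background flush in this PR is meant to
    // guarantee.
    while in_flight.next().await.is_some() {}
}
```

Running this prints "upload started" only after the one-second idle period, even though the upload was enqueued immediately.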

Comment on lines +120 to +122
if let Ok(err) = this.background_error.try_recv() {
    return Poll::Ready(Err(err));
}
@westonpace (Contributor)

Should these try_recv blocks be inside the mutex? Otherwise, is there a slight possibility that you could:

1. Check background_error, see no error
2. Background error occurs
3. Call poll_write, it returns ready, and the background error is never checked again

@wjones127 (Contributor, Author)

That's a good idea. Thanks
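
For illustration, a simplified synchronous sketch of that suggestion, with hypothetical types (SketchWriter is not the PR's actual writer, and this is a plain write method rather than poll_write). It assumes the background flusher reports errors while holding the same writer lock, so draining the error channel under that lock means an error from an earlier flush is always seen before the next write is accepted.

```rust
use std::io;
use std::sync::{Arc, Mutex};

use tokio::sync::mpsc;

struct SketchWriter {
    // Stand-in for the shared buffered object-store writer.
    writer: Arc<Mutex<Vec<u8>>>,
    // Errors reported by the background flush task.
    background_error: mpsc::UnboundedReceiver<io::Error>,
}

impl SketchWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Take the lock first, then drain the error channel: any error the
        // flusher sent before we acquired the lock is observed here, so it
        // cannot slip in unnoticed between the check and the write.
        let mut inner = self.writer.lock().unwrap();
        if let Ok(err) = self.background_error.try_recv() {
            return Err(err);
        }
        inner.extend_from_slice(buf);
        Ok(buf.len())
    }
}
```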

let background_flusher = tokio::task::spawn(async move {
    loop {
        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
        match writer_ref.lock().unwrap().flush().now_or_never() {
@westonpace (Contributor)

What happens if we call flush after a writer is finished writing / closed?

Also, what is the normal exit path for this task? It looks like it can only exit this loop if flush returns an error.

@wjones127 (Contributor, Author)

Right now calling flush after it's done seems to be a no-op, but there's no strong guarantee of that in the API. We should abort this task in the shutdown method, I think.
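
A hypothetical sketch of that follow-up, assuming a plain async shutdown method rather than the writer's actual poll_shutdown: keep the JoinHandle and abort it when the writer is closed, so the flush loop does not keep running against a finished writer.

```rust
use tokio::task::JoinHandle;

struct ObjectWriterSketch {
    // Mirrors the `background_flusher` field in the diff above.
    background_flusher: JoinHandle<()>,
    // ... buffered writer state elided ...
}

impl ObjectWriterSketch {
    async fn shutdown(&mut self) {
        // Stop the periodic flush loop first; as noted above, its only other
        // exit path is a flush error.
        self.background_flusher.abort();
        // Wait for the task to actually stop. A cancelled JoinError is the
        // expected outcome here and is deliberately ignored.
        let _ = (&mut self.background_flusher).await;
        // ... then finish the multipart upload / close the inner writer ...
    }
}
```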

@wjones127 changed the title from "fix: prevent timeouts while writing to object storage" to "OUTDATED fix: prevent timeouts while writing to object storage" on Feb 6, 2024

github-actions bot commented Feb 6, 2024

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wjones127 closed this on Feb 6, 2024
Successfully merging this pull request may close these issues: Write to S3 fails with operation timed out