
Multipart upload can leave futures unpolled, leading to timeout #5366

Closed
wjones127 opened this issue Feb 6, 2024 · 11 comments · Fixed by #5369
Assignees
Labels
bug object-store Object Store Interface

Comments

@wjones127
Member

Describe the bug

Instead of waiting until the data passed to the writer has been uploaded before returning ready, we buffer it until there is enough data and then push the request future onto a FuturesUnordered. I thought this was pretty clever when I originally wrote it, but it has a big flaw: if a lot of time passes between write calls, the request futures can go all that time without being polled. This can cause timeouts such as:

Generic S3 error: Error after 0 retries in 67.379407043s, max_retries:10, retry_timeout:180s, source:error sending request for url (s3://...): operation timed out
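
A minimal sketch of the pattern being described (illustrative only; SketchWriter and upload_part are made-up names, not the actual object_store code):

use futures::stream::{FuturesUnordered, StreamExt};
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Illustrative only: the writer buffers bytes and, once a part-sized chunk
// has accumulated, pushes the upload request future onto a FuturesUnordered.
struct SketchWriter {
    buffer: Vec<u8>,
    in_flight: FuturesUnordered<Pin<Box<dyn Future<Output = std::io::Result<()>> + Send>>>,
}

impl SketchWriter {
    fn poll_write(&mut self, cx: &mut Context<'_>, buf: &[u8]) -> Poll<std::io::Result<usize>> {
        self.buffer.extend_from_slice(buf);
        if self.buffer.len() >= 10 * 1024 * 1024 {
            let part = std::mem::take(&mut self.buffer);
            self.in_flight.push(Box::pin(upload_part(part)));
        }
        // Drive whatever is currently in flight...
        while let Poll::Ready(Some(result)) = self.in_flight.poll_next_unpin(cx) {
            if let Err(e) = result {
                return Poll::Ready(Err(e));
            }
        }
        // ...but return Ready immediately, without waiting for the uploads.
        // If the caller does not write (or flush) again soon, the in-flight
        // request futures sit unpolled until the next poll.
        Poll::Ready(Ok(buf.len()))
    }
}

async fn upload_part(_part: Vec<u8>) -> std::io::Result<()> {
    // Stand-in for the actual HTTP part upload.
    Ok(())
}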

To Reproduce

I discovered this downstream in some code that uses multi part uploads: lancedb/lance#1878 (comment)

Expected behavior

I think the best solution is to run the upload tasks inside of a background task created with tokio::task::spawn().
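
A rough sketch of what that could look like (hypothetical names; not necessarily what the eventual fix does):

use tokio::task::JoinSet;

// Each part upload runs on its own spawned task, so the runtime keeps driving
// it even if the caller does not touch the writer again for a long time.
struct SpawnedUploads {
    tasks: JoinSet<std::io::Result<()>>,
}

impl SpawnedUploads {
    fn new() -> Self {
        Self { tasks: JoinSet::new() }
    }

    fn start_part(&mut self, part: Vec<u8>) {
        // JoinSet::spawn uses tokio::task::spawn under the hood, so the upload
        // makes progress independently of poll_write being called again.
        self.tasks.spawn(upload_part(part));
    }

    async fn finish(&mut self) -> std::io::Result<()> {
        // Wait for all outstanding part uploads before completing the upload.
        while let Some(joined) = self.tasks.join_next().await {
            joined.map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))??;
        }
        Ok(())
    }
}

async fn upload_part(_part: Vec<u8>) -> std::io::Result<()> {
    // Stand-in for the actual HTTP part upload.
    Ok(())
}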

Additional context

It's quite possible this is the same underlying cause of #5106

@wjones127 wjones127 added the bug label Feb 6, 2024
@wjones127 wjones127 self-assigned this Feb 6, 2024
@tustvold
Contributor

tustvold commented Feb 6, 2024

Perhaps I am missing something: why would the future not be polled if there is work to do? This sounds like the issue is that the user is multiplexing CPU-bound tasks on the same threadpool/task, and therefore starving their write tasks.

The conclusion of prior efforts in this space, e.g. #4040, is that this is a user problem
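
An illustrative example of that kind of starvation (hypothetical helper names): CPU-bound work run inline on an async task blocks the worker thread, so in-flight upload futures on that thread are not polled in the meantime, whereas moving the CPU-bound step to the blocking pool keeps the runtime free to drive them:

use tokio::task;

async fn prepare_chunk(data: Vec<u8>) -> Vec<u8> {
    // Bad: calling expensive_compress(&data) inline here would hog the async
    // worker thread for the whole computation.

    // Better: run it on the blocking pool and await the result.
    task::spawn_blocking(move || expensive_compress(&data))
        .await
        .expect("blocking task panicked")
}

fn expensive_compress(data: &[u8]) -> Vec<u8> {
    // Placeholder for real CPU-bound work (compression, encoding, ...).
    data.to_vec()
}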

@wjones127
Member Author

Okay, let me try creating a reproduction in Rust.

@wjones127
Member Author

Here is the reproduction:

  1. Create a multipart upload
  2. Write 10 MB (triggering a request)
  3. Sleep for 30 seconds
  4. Try writing more data (this will fail with a timeout)

I think we would agree that it might be expected behavior if the sleep in step (3) were synchronous/blocking. However, this will fail even if the sleep is async, which makes this feel like a real footgun.

use object_store::{aws::AmazonS3Builder, path::Path, ObjectStore};
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() {
    // Create an S3 store configured from environment variables
    let store = AmazonS3Builder::from_env()
        .with_bucket_name("lance-performance-testing")
        .build()
        .unwrap();

    let (_id, mut writer) = store.put_multipart(&Path::from("test")).await.unwrap();

    // Upload 10 MB of data
    let data_chunk = vec![0u8; 1024 * 1024];
    for _ in 0..10 {
        writer.write_all(&data_chunk).await.unwrap();
    }

    // Sleep for 30 seconds
    tokio::time::sleep(tokio::time::Duration::from_secs(30)).await;
    
    // try to upload 10 more (should fail here)
    for _ in 0..10 {
        writer.write_all(&data_chunk).await.unwrap();
    }

    // finish.
    writer.shutdown().await.unwrap();
}
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Generic { store: "S3", source: Reqwest { retries: 0, max_retries: 10, elapsed: 30.00097806s, retry_timeout: 180s, source: reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("s3.us-east-1.amazonaws.com")), port: None, path: "/lance-performance-testing/test", query: Some("partNumber=1&uploadId=mOQTaVt2JtGlMwB2IceAR2olH5FyBgk_oJVbZ_zh9XRDU6VbbIRWmx0KKk3MM6vVb2vCA93jP4xIiY9lAo0BHxULl5jNV0issWUnqVyFk.dA.xt3mmaz5U_GK_g1OqYCn4ZMMZ61xIn22FzDWuU.PQ--"), fragment: None }, source: TimedOut } } } }

@tustvold
Contributor

tustvold commented Feb 6, 2024

Hmm... I wonder if the issue here is that write_all is not waiting for in-progress writes to finish? Is it possible the poll_flush implementation isn't quite right? I would expect the above to work, without needing to spawn work.

@tustvold
Contributor

tustvold commented Feb 6, 2024

It looks like this is actually an issue with the underlying IO traits - tokio-rs/tokio#4296

TL;DR: the user must explicitly flush when they've finished writing. I agree this is an unfortunate footgun.
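
Concretely, adapting the reproduction above, the long gap becomes safe if the writer is flushed first (a fragment of that earlier snippet, reusing its variables):

    // ... writes as in the reproduction above ...
    for _ in 0..10 {
        writer.write_all(&data_chunk).await.unwrap();
    }

    // Workaround: flush before any long pause so the in-flight part uploads
    // are driven to completion before the writer goes idle.
    writer.flush().await.unwrap();

    // The 30 second gap is now safe: nothing is left half-uploaded and unpolled.
    tokio::time::sleep(tokio::time::Duration::from_secs(30)).await;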

@wjones127
Member Author

I wonder if the issue here is that write_all is not waiting for in-progress writes to finish?

That is correct, but it's due to the design of the multipart write implementation. By design, when I wrote it, poll_write() returns Poll::Ready without waiting for all write requests to complete. If we changed it so that poll_write only returned Ready once the underlying FuturesUnordered is empty again, that would also fix this. Then you would only get concurrency if you call write() with chunks larger than 10 MB; otherwise it will always run one upload at a time. Does this seem like a preferable tradeoff over using spawn? Or do you see another way?
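
A sketch of that alternative, reusing the illustrative SketchWriter shape from the sketch earlier in this thread (again, not the real implementation):

use futures::stream::StreamExt;
use std::task::{Context, Poll};

// Alternative sketch: poll_write only reports Ready once every in-flight part
// upload has completed, i.e. the FuturesUnordered has been drained.
fn poll_write_strict(
    writer: &mut SketchWriter,
    cx: &mut Context<'_>,
    buf: &[u8],
) -> Poll<std::io::Result<usize>> {
    // Drain completed uploads first.
    while let Poll::Ready(Some(result)) = writer.in_flight.poll_next_unpin(cx) {
        if let Err(e) = result {
            return Poll::Ready(Err(e));
        }
    }
    if !writer.in_flight.is_empty() {
        // Still uploading: the caller's write_all().await now actually waits,
        // at the cost of at most one in-flight upload between write calls.
        return Poll::Pending;
    }
    writer.buffer.extend_from_slice(buf);
    Poll::Ready(Ok(buf.len()))
}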

@wjones127
Member Author

Oh I see what you mean about flush. Basically, anytime a user expects there will be a significant gap in time between write calls, they must call flush. I think that could work.

@tustvold
Contributor

tustvold commented Feb 6, 2024

By design, when I wrote it, poll_write() will return Poll::Ready without waiting for all write requests to complete.

Yes, this is an unfortunate property of the AsyncWriteExt trait; the method is definitely confusing. We should document this footgun and link back to the Tokio ticket.

@tustvold
Contributor

tustvold commented Mar 1, 2024

label_issue.py automatically added labels {'object-store'} from #5369

@ion-elgreco

@tustvold does this affect only multipart uploads?

Because since object_store 0.9 I'm seeing quite a few timeouts with ADLS and S3 in Delta-RS.

@tustvold
Contributor

tustvold commented Mar 22, 2024

This ticket is specifically about multipart uploads.

Because since object_store 0.9 I'm seeing quite a few timeouts with ADLS and S3 in Delta-RS.

What version were you running before? 0.8 set some default timeouts (previously they were unlimited).
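
For reference, a sketch of raising the request timeout via ClientOptions if the default is too aggressive for a given workload (assumes the 0.9-era builder API; the bucket name is hypothetical):

use std::time::Duration;
use object_store::{aws::AmazonS3Builder, ClientOptions};

// Sketch: raise the per-request timeout from the default via ClientOptions.
// The exact knobs can differ between object_store releases; check the docs
// for the version in use.
fn build_store() -> object_store::aws::AmazonS3 {
    AmazonS3Builder::from_env()
        .with_bucket_name("my-bucket") // hypothetical bucket
        .with_client_options(ClientOptions::new().with_timeout(Duration::from_secs(600)))
        .build()
        .unwrap()
}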
