-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multipart upload: read directly from file #494
Comments
We haven't implemented any high-level APIs for S3 yet, so as you've discovered, this is manual and tedious right now. I will convert this issue into a feature request so that people can upvote it, and in the meantime, I'll see if we can throw together an example. |
Hi @Velfi and @jdisanti, out of curiosity, I started this trivial example (with only one pub async fn upload_object_with_multipart(
client: &Client,
bucket_name: &str,
file_path: &str,
object_key: &str,
) -> Result<(), Box<dyn Error>> {
/// instantiate a multipart upload
let create_multipart_upload_response = client
.create_multipart_upload()
.bucket(bucket_name)
.key(object_key)
.send()
.await?;
/// grab the upload_id that we'll use when uploading parts
let upload_id = match create_multipart_upload_response.upload_id() {
Some(upload_id) => upload_id.to_owned(),
None => panic!("Something went wrong when creating multipart upload."),
};
/// upload a file in parts
// TODO: after this trivial example works, demonstrate this with actual multiple parts, not just as one part (single byte stream)
let body = match ByteStream::from_path(Path::new(file_path)).await {
Ok(byte_stream) => byte_stream,
Err(e) => panic!("{}", e),
};
let _upload_part_response = client
.upload_part()
.bucket(bucket_name)
.key(object_key)
.upload_id(&upload_id)
.part_number(1)
.body(body)
.send()
.await?;
let list_parts_response = client
.list_parts()
.bucket(bucket_name)
.key(object_key)
.upload_id(&upload_id)
.send()
.await?;
let parts_list = list_parts_response.parts();
/// complete the multipart upload
client
.complete_multipart_upload()
.bucket(bucket_name)
.key(object_key)
.upload_id(upload_id)
// .body(parts_list) // ? .body() does not exist on fluent_builders::CompleteMultipartUpload, even though body is required according to AWS S3 CompleteMultipartUpload documentation
.send()
.await?;
println!("Uploaded file in parts: {}", file_path);
Ok(())
} Is there a way to set the POST body on S3 client's Without the ability to set the HTTP POST body on the underlying HTTP call, the AWS S3 API will respond with an error like:
If producing an example is workable, I'd be happy to help do so. Please let me know if I'm missing some obvious method that I should be using instead. P.S., by searching this error message on stackoverflow, it is easy to find that this caused some confusion for other language SDKs (for example PHP: https://stackoverflow.com/a/31413193/18257455 ), however those developers were able to resolve by setting an Array of parts to a In this Rust SDK, there does not (at a glance) appear to be a way for an SDK user to construct the If How do we construct a Thanks! |
I believe this is the field you are missing: https://docs.rs/aws-sdk-s3/0.10.1/aws_sdk_s3/client/fluent_builders/struct.CompleteMultipartUpload.html#method.multipart_upload it will be serialized into the body |
@rcoh Thanks! I took a look at the method again and see now how the SDK's fluent builder are structured. While this is a bit verbose, I at least have a working (See #494 (comment) for the section of code that does the upload) // just showing the parts relevant for the HTTP POST body serialization
/// get the list of uploaded parts
let parts_list = list_parts_response.parts().unwrap();
/// construct vec of CompletedParts (required to build CompletedMultipartUpload)
let completed_parts: Vec<aws_sdk_s3::model::CompletedPart> = parts_list
.iter()
.map(|part| {
aws_sdk_s3::model::CompletedPart::builder()
.e_tag(part.e_tag.as_ref().unwrap())
.part_number(part.part_number)
.build()
})
.collect();
/// construct CompletedMultipartUpload (required to construct CompleteMultipartUpload in a way that correctly serializes part list to HTTP POST body
let completed_multipart_upload = aws_sdk_s3::model::CompletedMultipartUpload::builder()
.set_parts(Some(completed_parts))
.build();
/// complete the multipart upload
client
.complete_multipart_upload()
.bucket(bucket_name)
.key(object_key)
.upload_id(upload_id)
.multipart_upload(completed_multipart_upload)
.send()
.await?;
Next, this example can be cleaned up to demonstrate the Rust way of reading bytes of file incrementally and what that could look like. (My trivial multipart example only uses one "part") If its OK, I'd like to take a stab at that tonight or tomorrow |
nice! yeah this looks good so far. Note that the eventual home for this is https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/rust_dev_preview the examples in this repo get built from their at release / codegen time The multi-part will be a bit tricky until we add support for #519 — if you were interested in making a PR for that we'd gladly accept it. |
@rcoh Thank you for that info! Just curious, would you know someone who might know what the relationship is between Context Objective: Implement example AWS S3 Rust SDK Multipart upload (potentially in a simpler way than what I see in #519?) (Edit: I see now that the file-seeking logic that I see in #519 is what an SDK consumer will need to write if the SDK maintainer's goal is to add this Actions: I have an example of using AWS S3 SDK's Significant detail:
Why is this significant: Hopefully that makes sense, just started looking at this- I know AWS Rust SDK is only in preview mode but I'm excited to see it get used more! Thanks! |
the |
I have a PR in progress that should make this much easier. It allows for setting a byte offset when creating a |
Great! |
It'd be nice if the multipart upload took an The AsyncRead would work with raw files and also allow compressing data from a file as a lazy stream when uploading to s3. |
Community Note
Prerequisites
Question Description
I am currently developing a program that will (among various things) upload large files to S3.
I found how to perform normal uploads directly from files (without having to load them in RAM) in the doc, but I have an issue with multipart uploads (required for larger files).
If I understand things correctly,
Client.put_object()
will not perform a multipart upload and, unlike for other languages, the SDK does not provide a high-level API to do them automatically, you have to do them manually usingClient.create_multipart_upload()
,Client.upload_part()
andClient.complete_multipart_upload()
.The issue is that, while when performing a normal upload, I can just do
.body(ByteStream::from_path("some/path")
, it does not seem possible to do the same with a multipart upload. Since.body()
must be called for each part, creating a new ByteStream from a path each time won't work (only the beginning of the file will be read) and I did not find a proper way to seek in a ByteStream (except reading individual bytes in a loop, which would lead to useless IO). The.take()
function of the trait StreamExt sounds interesting, but its result is not usable as parameter for.body()
, and simple trying to pass the same ByteStream at each iteration will not work, as the Rust compiler consider, likely for good reasons, that the ByteStream is "moved" when passing it to.body()
(not to mention the fact that the function may not even know when to stop reading, since it is not clear for me whether.content_length()
has an effect on stream reading). The only workaround I can imagine is to read the whole part from the file to RAM and useByteStream::from()
, but this is definitely not a proper way to perform an upload from a file.So, did I miss something? Is there a proper way to perform a multipart upload reading data directly from a file, is there an issue with the API or is this a feature left for a future version?
I am not very experienced with Rust, and I am discovering the AWS SDK as well as the 3rd party libraries it relies on, so maybe it's just me who do not understand how to do things properly, but in that case, I would appreciate some help, in the other case, I hope this SDK could be improved to allow multipart uploads directly from a file.
Platform/OS/Device
GNU/Linux (amd64)
Language Version
Rust 1.59.0
The text was updated successfully, but these errors were encountered: