Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with S3 download failures by streaming files #4928

Merged

Conversation

shinyhappydan
Copy link
Contributor

@shinyhappydan shinyhappydan commented May 2, 2024

The diff here is big but there are not many actual changes. The changes are:

  • Usage of AsyncResponseTransformer (and some processing of what is returned in the readFile method) to actually stream when fetching a file
  • Removal of multiPartRead since we can stream the file normally now
  • Removal of BucketName / FileKey as they are no longer used
  • Splitting S3StorageClient as the file was much too big. Privacy rules are used to ensure users cannot construct impls directly (this was actually happening before, my understanding is we prefer not to do this)

Comment on lines +10 to +27
private[client] class Fs2StreamAsyncResponseTransformer
extends AsyncResponseTransformer[GetObjectResponse, Publisher[ByteBuffer]] {

private val cf = new CompletableFuture[Publisher[ByteBuffer]]()
override def prepare(): CompletableFuture[Publisher[ByteBuffer]] = cf

override def onResponse(response: GetObjectResponse): Unit = ()

override def onStream(publisher: SdkPublisher[ByteBuffer]): Unit = {
cf.complete(publisher)
()
}

override def exceptionOccurred(error: Throwable): Unit = {
cf.completeExceptionally(error)
()
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main part of the functionality

Comment on lines +29 to +31
Stream
.eval(client.getObject(getObjectRequest(bucket, fileKey), new Fs2StreamAsyncResponseTransformer))
.flatMap(_.toStreamBuffered[IO](2).flatMap(bb => Stream.chunk(Chunk.byteBuffer(bb))))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the streaming is happening

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the 2 for ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the buffer size. It means that it'll ask for 2 bytebuffers at a time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It was taken from the example you gave, it's not a value I chose)

import fs2.Stream
import software.amazon.awssdk.services.s3.model._

private[client] object S3StorageClientDisabled extends S3StorageClient {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to address in this PR and not a priority but we may want to get rid of this one.
As long as you can't create a S3 storage when the feature is disabled, it might be enough ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this probably shouldn't be needed

sourceKey: Models.FileKey,
destinationBucket: Models.BucketName,
destinationKey: Models.FileKey,
sourceBucket: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be nice to reintroduce something more refined than strings at some point

Copy link
Contributor Author

@shinyhappydan shinyhappydan May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem with these types specifically is when you needed to use them as a string you had to call .value.value which is annoying, and if you forget to do it in string interpolation for example you get a bad result. There might be other options which don't have these issues

@@ -67,7 +67,7 @@ object S3FileOperations {
)
.compile
Copy link
Collaborator

@imsdu imsdu May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not this go away now ?
Did you try to test with a big-ish file ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the compile, test will arrive soon

@shinyhappydan shinyhappydan force-pushed the s3-download-failures-v2 branch from 5fb52fa to a37058e Compare May 3, 2024 14:31
@shinyhappydan shinyhappydan force-pushed the s3-download-failures-v2 branch from a37058e to 844d557 Compare May 3, 2024 14:45
@olivergrabinski olivergrabinski merged commit e41ea2f into BlueBrain:master May 6, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants