S3Store: Concurrently write upload parts to S3 while reading from client #402
Conversation
Thank you very very much for this PR! It looks amazing and I can't wait to try it out!
Introducing parallelism in the WriteChunk workflow means that many of the S3Store tests run nondeterministically when uploading parts. To help with this, the first commit in this branch relaxes the use of gomock.InOrder so that we don't require a strict ordering of function calls -- but we still require each individual call to happen at some point. I tried to keep the strict ordering in places where it mattered. I'm not an expert on gomock, so please let me know if there's a better way to do this.
Sounds like a good decision. 👍
The consumer will drain the files channel and write all available parts into S3, even if the client has gone away. I think this is mostly a good thing, but it may be possible for a high-throughput client to upload several parts into the buffer and then get disconnected because of a network issue. If the server isn't using a locker and the client tries to resume uploading while tusd is still uploading those buffered parts, then we may run into a conflict. I think this is already a possibility on tusd master, but the part buffer could make such a scenario more likely.
If the server isn't using locks, then this is quite likely. However, if someone does not use locking then they are on their own in that regard and may have to live with the consequences.
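A minimal sketch of the drain-and-clean-up step being described here (not the actual tusd code; the function name and logging are illustrative). After the upload loop stops, the remaining buffered part files are pulled off the channel and deleted so they don't linger on disk:

```go
package sketch

import (
	"log"
	"os"
)

// drainAndClean empties the buffered part-file channel after the upload loop
// has stopped (for example because the client disconnected) and deletes the
// temporary files so they don't accumulate on disk. The producer must close
// the channel for the range loop to terminate.
func drainAndClean(files <-chan *os.File) {
	for file := range files {
		file.Close()
		if err := os.Remove(file.Name()); err != nil {
			log.Printf("unable to remove buffered part file %s: %v", file.Name(), err)
		}
	}
}
```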
To help tusd operators tune the size of the file channel buffer, I'd like to add a metric for how much time we spend blocking the client. Maybe this could be measured in the main WriteChunk for loop and reported at the end of an upload. I'm not sure how to pass the time data from the S3Store (which knows about part upload latency) back to the handler (which owns the Metrics struct), though. Open to suggestions.
That's a good idea! We had thoughts about storage-specific metrics (e.g. to measure latency to S3 or other services) but this would require some more work on how we gather metrics. Maybe that is better reserved for its own PR. What do you think?
This code has not received any production testing yet, so I would consider it unsafe until we're confident that the test coverage is sufficient and that there are no logic errors.
A very good place to gather first experience is the master.tus.io instance. Once this PR is merged, the code will be deployed there so we can test it extensively and see how it behaves before doing a release.
👍
Thanks for your feedback! Looking forward to your adjustments :)
This reverts commit b72a4d4.
This is already being done in s3PartProducer.
Thanks for your patience. I'm still thinking about how/whether to test the file draining in d014a3e, but this felt like a good time to stop and gather more feedback. I should have time to do a bit of stress testing later this week.
Wow, thank you very much, the new commits are looking fantastic!
I'm still thinking about how/whether to test the file draining in d014a3e, but this felt like a good time to stop and gather more feedback.
Yes, that's not an easy test. What I can think of is the following:
- Create a temporary directory
- Configure the S3Store to use this temporary directory which is then passed to https://golang.org/pkg/io/ioutil/#TempFile (this struct field does not exist yet)
- Push some data to the WriteChunk function
- Make sure that there are files in the temporary directory
- Simulate an error from AWS S3 (we can specify our own implementation and don't have to use gomock here)
- Make sure that the temporary directory is empty after some time.
What do you think?
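As a rough illustration of the last step, a helper along these lines could poll the directory until it is empty. This is only a sketch: assertEventuallyEmpty is not an existing tusd helper, and the setup steps above (temporary directory, failing S3 stub, WriteChunk call) are omitted.

```go
package sketch

import (
	"io/ioutil"
	"testing"
	"time"
)

// assertEventuallyEmpty polls dir until it contains no entries or the timeout
// expires, covering the "empty after some time" step of the proposed test.
func assertEventuallyEmpty(t *testing.T, dir string, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for {
		entries, err := ioutil.ReadDir(dir)
		if err != nil {
			t.Fatalf("unable to read temporary directory: %v", err)
		}
		if len(entries) == 0 {
			return
		}
		if time.Now().After(deadline) {
			t.Fatalf("temporary directory still contains %d file(s)", len(entries))
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```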
@acj If you want to, I can also take care of the last minor tasks to free you up and get this PR finished a bit sooner :)
Yes, much appreciated! My past few weeks have been unexpectedly busy. It sounds like you have a clear idea of what you want for that last test, so I'll leave it in your able hands. I will be doing more stress testing and will post here if I find any issues.
No worries, please make sure to not overwork yourself :) I will have a look at the tests.
I also did some testing and your PR improves the upload speed from 90Mbit/s to 150Mbit/s! With a bigger part size (50MB instead of 5MB), I even got up to 250Mbit/s. So that's likely the next step in terms of optimizations, besides parallel uploading of parts to S3. Very impressive work here!
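For context, tuning those knobs on the store looks roughly like this. This is a sketch: the aws-sdk-go setup and the exact values are illustrative, MinPartSize is an existing S3Store field, and MaxBufferedParts is the field introduced by this PR.

```go
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"

	"github.com/tus/tusd/pkg/s3store"
)

func newTunedStore() s3store.S3Store {
	// Plain aws-sdk-go client; credentials come from the usual environment chain.
	s3Client := s3.New(session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1"))))

	store := s3store.New("my-bucket", s3Client)
	// Larger parts (50MB instead of the 5MB minimum) reduce per-part overhead.
	store.MinPartSize = 50 * 1024 * 1024
	// Number of complete parts that may wait on disk while uploads to S3 catch up
	// (20 is the default introduced by this PR).
	store.MaxBufferedParts = 20
	return store
}
```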
…into acj-s3store-buffered-chunks
@acj I just pushed the test. Please have a look at it and let me know what you think!
The test and cleanups LGTM. Thanks for your help and suggestions!
I also did some testing and your PR improves the upload speed from 90Mbit/s to 150Mbit/s! With a bigger part size (50MB instead of 5MB), I even got up to 250Mbit/s. So that's likely the next step in terms of optimizations, besides parallel uploading of parts to S3. Very impressive work here!
Very happy to hear that. I've seen similar results during my testing in us-east-1 and haven't run into any hung uploads or other issues so far (but will continue testing and moving towards production). Using larger parts when possible is a good move, imo, and something that we'd like to experiment with after the dust settles from this PR.
Size: aws.Int64(200),
},
},
}, nil).Times(2)
.Times(2)
👍
It's out of scope here, but I think there are a number of places where we could combine duplicate EXPECT() calls now that the call ordering is less strict.
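To illustrate the idea, a hypothetical test fragment, assuming a gomock-generated S3 mock named s3obj as used in the S3Store tests (the mocked method and return values here are placeholders):

```go
// Before: two identical expectations registered back to back.
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil)
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil)

// After: one expectation with an explicit call count, which is equivalent
// once the surrounding gomock.InOrder constraints have been relaxed.
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil).Times(2)
```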
pkg/s3store/s3store.go
@@ -334,8 +340,7 @@ func (spp *s3PartProducer) nextPart(size int64) (*os.File, error) {
// io.Copy returns 0 since it is unable to read any bytes. In that
// case, we can close the s3PartProducer.
if n == 0 {
os.Remove(file.Name())
Thanks for catching these 👍
Thank you very, very much for the help, @acj!
…ent (tus#402)

* Allow empty metadata values
* Make tests less fragile by allowing loose call ordering
* Add s3ChunkProducer
* Integrate s3ChunkProducer to support chunk buffering
* Remove completed chunk files inline to reduce disk space usage
* Add tests for chunk producer
* docs: Use value from Host header to forward to tusd
* Use int64 for MaxBufferedParts field
* Default to 20 buffered parts
* Rename s3ChunkProducer -> s3PartProducer
* Document s3PartProducer struct
* Clarify misleading comment
* Revert "Remove completed chunk files inline to reduce disk space usage"
  This reverts commit b72a4d4.
* Remove redundant seek
  This is already being done in s3PartProducer.
* Clean up any remaining files in the channel when we return
* Make putPart* functions responsible for cleaning up temp files
* handler: Add tests for empty metadata pairs
* Factor out cleanUpTempFile func
* Add test to ensure that temporary files get cleaned up

Co-authored-by: Jens Steinhauser <jens.steinhauser@gmail.com>
Co-authored-by: Marius <marius@transloadit.com>
Overview
As discussed in #379, the throughput performance of S3Store can degrade when the S3 (or compatible) API has high or unpredictable latency. Currently, S3Store synchronously reads bytes from the client, writes them to disk, and then writes the file on disk to the S3 API. That approach is simple and reliable, but it blocks the client from transmitting any new bytes while tusd is writing the current part to S3. This blocking causes the client to transmit more slowly due to standard TCP behavior.
This PR introduces `s3ChunkProducer`, a struct that is responsible for consuming bytes from the source reader (connected to the client) and converting them into `*os.File` instances that are pushed into a channel. The original request goroutine then consumes files from this channel until it's empty (or there's an error), uploading each file as a new part in S3.

Decoupling reading client bytes from writing them into S3 means that the two processes happen in parallel, which should allow the client to upload at a consistent speed even if the S3 API is intermittently slow. Actual performance will depend on details of the deployment, e.g. whether the tusd server is near its S3 endpoint on the network, the available bandwidth between them, how much bandwidth the client has along the path to tusd, etc. I've introduced a `MaxBufferedParts` field on S3Store to allow the server to store multiple complete parts on disk before they're written into S3. If the client is able to upload to the tusd server faster than tusd can upload to S3, then a larger buffer size may be a good way to boost client upload performance.
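To make the flow concrete, here is a simplified sketch of the producer half of this design. It is not the actual s3ChunkProducer implementation: error handling and pausing are reduced to a minimum and the names are illustrative.

```go
package sketch

import (
	"io"
	"io/ioutil"
	"os"
)

// producePartFiles reads up to partSize bytes at a time from src into
// temporary files and sends them on the returned buffered channel. The
// request goroutine ranges over the channel and uploads each file as one
// S3 part while the producer keeps reading from the client.
func producePartFiles(src io.Reader, partSize int64, maxBufferedParts int) <-chan *os.File {
	files := make(chan *os.File, maxBufferedParts)
	go func() {
		defer close(files)
		for {
			file, err := ioutil.TempFile("", "tusd-s3-part-")
			if err != nil {
				return
			}
			n, err := io.Copy(file, io.LimitReader(src, partSize))
			if n == 0 || err != nil {
				// EOF or read error: discard the unused temporary file and stop.
				file.Close()
				os.Remove(file.Name())
				return
			}
			// Rewind so the consumer can upload the file from the beginning.
			file.Seek(0, io.SeekStart)
			files <- file
		}
	}()
	return files
}
```

The consuming goroutine would then range over the returned channel, upload each file as a part, and remove the temporary file afterwards.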
Notes

Introducing parallelism in the `WriteChunk` workflow means that many of the S3Store tests run nondeterministically when uploading parts. To help with this, the first commit in this branch relaxes the use of `gomock.InOrder` so that we don't require a strict ordering of function calls -- but we still require each individual call to happen at some point. I tried to keep the strict ordering in places where it mattered. I'm not an expert on gomock, so please let me know if there's a better way to do this.

The consumer will drain the files channel and write all available parts into S3, even if the client has gone away. I think this is mostly a good thing, but it may be possible for a high-throughput client to upload several parts into the buffer and then get disconnected because of a network issue. If the server isn't using a locker and the client tries to resume uploading while tusd is still uploading those buffered parts, then we may run into a conflict. I think this is already a possibility on tusd master, but the part buffer could make such a scenario more likely.
To help tusd operators tune the size of the file channel buffer, I'd like to add a metric for how much time we spend blocking the client. Maybe this could be measured in the main WriteChunk for loop and reported at the end of an upload. I'm not sure how to pass the time data from the S3Store (which knows about part upload latency) back to the handler (which owns the Metrics struct), though. Open to suggestions.
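One possible shape for that measurement, sketched here with illustrative names (tusd does not expose such a helper today): time how long each finished part waits to enter the full buffer channel, since that is exactly when the producer stops reading from the client, and sum the durations for reporting at the end of the upload.

```go
package sketch

import (
	"os"
	"time"
)

// blockingTracker accumulates the time spent waiting to hand finished parts
// to a full buffer channel. While the channel is full the producer cannot
// read more bytes from the client, so the total approximates the time the
// client was blocked during the upload.
type blockingTracker struct {
	total time.Duration
}

func (b *blockingTracker) send(files chan<- *os.File, file *os.File) {
	start := time.Now()
	files <- file
	b.total += time.Since(start)
}
```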
This code has not received any production testing yet, so I would consider it unsafe until we're confident that the test coverage is sufficient and that there are no logic errors.