S3Store: Concurrently write upload parts to S3 while reading from client #402
Conversation
Thank you very very much for this PR! It looks amazing and I can't wait to try it out!
Introducing parallelism in the WriteChunk workflow means that many of the S3Store tests run nondeterministically when uploading parts. To help with this, the first commit in this branch relaxes the use of gomock.InOrder so that we don't require a strict ordering of function calls -- but we still require each individual call to happen at some point. I tried to keep the strict ordering in places where it mattered. I'm not an expert on gomock, so please let me know if there's a better way to do this.
Sounds like a good decision. 👍
The consumer will drain the files channel and write all available parts into S3, even if the client has gone away. I think this is mostly a good thing, but it may be possible for a high-throughput client to upload several parts into the buffer and then get disconnected because of a network issue. If the server isn't using a locker and the client tries to resume uploading while tusd is still uploading those buffered parts, then we may run into a conflict. I think this is already a possibility on tusd master, but the part buffer could make such a scenario more likely.
If the server isn't using locks, then this is quite likely. However, if someone does not use locking then they are on their own in that regard and may have to live with the consequences.
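A minimal sketch of the drain-and-clean-up step being described here (not the actual tusd code; the function name and logging are illustrative). After the upload loop stops, the remaining buffered part files are pulled off the channel and deleted so they don't linger on disk:

```go
package sketch

import (
	"log"
	"os"
)

// drainAndClean empties the buffered part-file channel after the upload loop
// has stopped (for example because the client disconnected) and deletes the
// temporary files so they don't accumulate on disk. The producer must close
// the channel for the range loop to terminate.
func drainAndClean(files <-chan *os.File) {
	for file := range files {
		file.Close()
		if err := os.Remove(file.Name()); err != nil {
			log.Printf("unable to remove buffered part file %s: %v", file.Name(), err)
		}
	}
}
```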
To help tusd operators tune the size of the file channel buffer, I'd like to add a metric for how much time we spend blocking the client. Maybe this could be measured in the main WriteChunk for loop and reported at the end of an upload. I'm not sure how to pass the time data from the S3Store (which knows about part upload latency) back to the handler (which owns the Metrics struct), though. Open to suggestions.
That's a good idea! We had thoughts about storage-specific metrics (e.g. to measure latency to S3 or other services) but this would require some more work on how we gather metrics. Maybe that is better reserved for its own PR. What do you think?
This code has not received any production testing yet, so I would consider it unsafe until we're confident that the test coverage is sufficient and that there are no logic errors.
A very good place to gather first experience is the master.tus.io instance. Once this PR is merged, the code will be deployed there so we can test it extensively and see how it behaves before doing a release.
👍
Thanks for your feedback! Looking forward to your adjustments :)
This reverts commit b72a4d4.
This is already being done in s3PartProducer.
Thanks for your patience. I'm still thinking about how/whether to test the file draining in d014a3e, but this felt like a good time to stop and gather more feedback. I should have time to do a bit of stress testing later this week.
Wow, thank you very much, the new commits are looking fantastic!
I'm still thinking about how/whether to test the file draining in d014a3e, but this felt like a good time to stop and gather more feedback.
Yes, that's not an easy test. What I can think of is the following:
- Create a temporary directory
- Configure the S3Store to use this temporary directory which is then passed to https://golang.org/pkg/io/ioutil/#TempFile (this struct field does not exist yet)
- Push some data to the WriteChunk function
- Make sure that there are files in the temporary directory
- Simulate an error from AWS S3 (we can specify our own implementation and don't have to use gomock here)
- Make sure that the temporary directory is empty after some time.
What do you think?
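As a rough illustration of the last step, a helper along these lines could poll the directory until it is empty. This is only a sketch: assertEventuallyEmpty is not an existing tusd helper, and the setup steps above (temporary directory, failing S3 stub, WriteChunk call) are omitted.

```go
package sketch

import (
	"io/ioutil"
	"testing"
	"time"
)

// assertEventuallyEmpty polls dir until it contains no entries or the timeout
// expires, covering the "empty after some time" step of the proposed test.
func assertEventuallyEmpty(t *testing.T, dir string, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for {
		entries, err := ioutil.ReadDir(dir)
		if err != nil {
			t.Fatalf("unable to read temporary directory: %v", err)
		}
		if len(entries) == 0 {
			return
		}
		if time.Now().After(deadline) {
			t.Fatalf("temporary directory still contains %d file(s)", len(entries))
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```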
@acj If you want to, I can also take care of the last minor tasks to free you up and get this PR finished a bit sooner :)
Yes, much appreciated! My past few weeks have been unexpectedly busy. It sounds like you have a clear idea of what you want for that last test, so I'll leave it in your able hands. I will be doing more stress testing and will post here if I find any issues.
No worries, please make sure to not overwork yourself :) I will have a look at the tests.
I also did some testing and your PR improves the upload speed from 90Mbit/s to 150Mbit/s! With a bigger part size (50MB instead of 5MB), I even got up to 250Mbit/s. So that's likely the next step in terms of optimizations, besides parallel uploading of parts to S3. Very impressive work here!
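For context, tuning those knobs on the store looks roughly like this. This is a sketch: the aws-sdk-go setup and the exact values are illustrative, MinPartSize is an existing S3Store field, and MaxBufferedParts is the field introduced by this PR.

```go
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"

	"github.com/tus/tusd/pkg/s3store"
)

func newTunedStore() s3store.S3Store {
	// Plain aws-sdk-go client; credentials come from the usual environment chain.
	s3Client := s3.New(session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1"))))

	store := s3store.New("my-bucket", s3Client)
	// Larger parts (50MB instead of the 5MB minimum) reduce per-part overhead.
	store.MinPartSize = 50 * 1024 * 1024
	// Number of complete parts that may wait on disk while uploads to S3 catch up
	// (20 is the default introduced by this PR).
	store.MaxBufferedParts = 20
	return store
}
```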
…into acj-s3store-buffered-chunks
@acj I just pushed the test. Please have a look at it and let me know what you think!
The test and cleanups LGTM. Thanks for your help and suggestions!
I also did some testing and your PR improves the upload speed from 90Mbit/s to 150Mbit/s! With a bigger part size (50MB instead of 5MB), I even got up to 250Mbit/s. So that's likely the next step in terms of optimizations, besides parallel uploading of parts to S3. Very impressive work here!
Very happy to hear that. I've seen similar results during my testing in us-east-1 and haven't run into any hung uploads or other issues so far (but will continue testing and moving towards production). Using larger parts when possible is a good move, imo, and something that we'd like to experiment with after the dust settles from this PR.
Size: aws.Int64(200),
},
},
}, nil).Times(2)
.Times(2)
👍
It's out of scope here, but I think there are a number of places where we could combine duplicate EXPECT() calls now that the call ordering is less strict.
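To illustrate the idea, a hypothetical test fragment, assuming a gomock-generated S3 mock named s3obj as used in the S3Store tests (the mocked method and return values here are placeholders):

```go
// Before: two identical expectations registered back to back.
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil)
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil)

// After: one expectation with an explicit call count, which is equivalent
// once the surrounding gomock.InOrder constraints have been relaxed.
s3obj.EXPECT().UploadPart(gomock.Any()).Return(&s3.UploadPartOutput{}, nil).Times(2)
```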
pkg/s3store/s3store.go
@@ -334,8 +340,7 @@ func (spp *s3PartProducer) nextPart(size int64) (*os.File, error) {
// io.Copy returns 0 since it is unable to read any bytes. In that
// case, we can close the s3PartProducer.
if n == 0 {
os.Remove(file.Name())
Thanks for catching these 👍
Thank you very, very much for the help, @acj!
…ent (tus#402)

* Allow empty metadata values
* Make tests less fragile by allowing loose call ordering
* Add s3ChunkProducer
* Integrate s3ChunkProducer to support chunk buffering
* Remove completed chunk files inline to reduce disk space usage
* Add tests for chunk producer
* docs: Use value from Host header to forward to tusd
* Use int64 for MaxBufferedParts field
* Default to 20 buffered parts
* Rename s3ChunkProducer -> s3PartProducer
* Document s3PartProducer struct
* Clarify misleading comment
* Revert "Remove completed chunk files inline to reduce disk space usage"
  This reverts commit b72a4d4.
* Remove redundant seek
  This is already being done in s3PartProducer.
* Clean up any remaining files in the channel when we return
* Make putPart* functions responsible for cleaning up temp files
* handler: Add tests for empty metadata pairs
* Factor out cleanUpTempFile func
* Add test to ensure that temporary files get cleaned up

Co-authored-by: Jens Steinhauser <jens.steinhauser@gmail.com>
Co-authored-by: Marius <marius@transloadit.com>
Overview
As discussed in #379, the throughput performance of S3Store can degrade when the S3 (or compatible) API has high or unpredictable latency. Currently, S3Store synchronously reads bytes from the client, writes them to disk, and then writes the file on disk to the S3 API. That approach is simple and reliable, but it blocks the client from transmitting any new bytes while tusd is writing the current part to S3. This blocking causes the client to transmit more slowly due to standard TCP behavior.
This PR introduces `s3ChunkProducer`, a struct that is responsible for consuming bytes from the source reader (connected to the client) and converting them into `*os.File` instances that are pushed into a channel. The original request goroutine then consumes files from this channel until it's empty (or there's an error), uploading each file as a new part in S3.

Decoupling reading client bytes from writing them into S3 means that the two processes happen in parallel, which should allow the client to upload at a consistent speed even if the S3 API is intermittently slow. Actual performance will depend on details of the deployment, e.g. whether the tusd server is near its S3 endpoint on the network, the available bandwidth between them, how much bandwidth the client has along the path to tusd, etc. I've introduced a `MaxBufferedParts` field on S3Store to allow the server to store multiple complete parts on disk before they're written into S3. If the client is able to upload to the tusd server faster than tusd can upload to S3, then a larger buffer size may be a good way to boost client upload performance.
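To make the flow concrete, here is a simplified sketch of the producer half of this design. It is not the actual s3ChunkProducer implementation: error handling and pausing are reduced to a minimum and the names are illustrative.

```go
package sketch

import (
	"io"
	"io/ioutil"
	"os"
)

// producePartFiles reads up to partSize bytes at a time from src into
// temporary files and sends them on the returned buffered channel. The
// request goroutine ranges over the channel and uploads each file as one
// S3 part while the producer keeps reading from the client.
func producePartFiles(src io.Reader, partSize int64, maxBufferedParts int) <-chan *os.File {
	files := make(chan *os.File, maxBufferedParts)
	go func() {
		defer close(files)
		for {
			file, err := ioutil.TempFile("", "tusd-s3-part-")
			if err != nil {
				return
			}
			n, err := io.Copy(file, io.LimitReader(src, partSize))
			if n == 0 || err != nil {
				// EOF or read error: discard the unused temporary file and stop.
				file.Close()
				os.Remove(file.Name())
				return
			}
			// Rewind so the consumer can upload the file from the beginning.
			file.Seek(0, io.SeekStart)
			files <- file
		}
	}()
	return files
}
```

The consuming goroutine would then range over the returned channel, upload each file as a part, and remove the temporary file afterwards.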
Notes

Introducing parallelism in the `WriteChunk` workflow means that many of the S3Store tests run nondeterministically when uploading parts. To help with this, the first commit in this branch relaxes the use of `gomock.InOrder` so that we don't require a strict ordering of function calls -- but we still require each individual call to happen at some point. I tried to keep the strict ordering in places where it mattered. I'm not an expert on gomock, so please let me know if there's a better way to do this.

The consumer will drain the files channel and write all available parts into S3, even if the client has gone away. I think this is mostly a good thing, but it may be possible for a high-throughput client to upload several parts into the buffer and then get disconnected because of a network issue. If the server isn't using a locker and the client tries to resume uploading while tusd is still uploading those buffered parts, then we may run into a conflict. I think this is already a possibility on tusd master, but the part buffer could make such a scenario more likely.
To help tusd operators tune the size of the file channel buffer, I'd like to add a metric for how much time we spend blocking the client. Maybe this could be measured in the main WriteChunk for loop and reported at the end of an upload. I'm not sure how to pass the time data from the S3Store (which knows about part upload latency) back to the handler (which owns the Metrics struct), though. Open to suggestions.
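One possible shape for that measurement, sketched here with illustrative names (tusd does not expose such a helper today): time how long each finished part waits to enter the full buffer channel, since that is exactly when the producer stops reading from the client, and sum the durations for reporting at the end of the upload.

```go
package sketch

import (
	"os"
	"time"
)

// blockingTracker accumulates the time spent waiting to hand finished parts
// to a full buffer channel. While the channel is full the producer cannot
// read more bytes from the client, so the total approximates the time the
// client was blocked during the upload.
type blockingTracker struct {
	total time.Duration
}

func (b *blockingTracker) send(files chan<- *os.File, file *os.File) {
	start := time.Now()
	files <- file
	b.total += time.Since(start)
}
```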
This code has not received any production testing yet, so I would consider it unsafe until we're confident that the test coverage is sufficient and that there are no logic errors.