Improve async write (fewer copies, polling API) #430

graebm · 2024-05-09T17:39:13Z

Issue:
Mountpoint's upload throughput took a 20% hit when they started using the new async write API (PR #418).

We knew the new API could, in the worst case, do an additional copy (see TODOs in the PR description and code). Some quick experimentation showed this was the cause.

Description of changes:

- Remove additional copy.
  - Async write gets buffers from the buffer-pool (made possible by PR #429, "Buffer-pool allows "forced" buffers, which don't count against memory limit").
- Add new poll_write() function, which is simpler to use from Mountpoint.
  - Mountpoint typically does 1MiB or 256KiB writes. So we pretty much always need to copy the data immediately. So let's optimize for that.
  - Rust needed tricky code to cope with the original write() API's demand that data stay alive until the write-future completes. And aws-c-s3 needed tricky code to guarantee cancel() would synchronously fire any pending write-futures. If we just offer a rust-polling-style API that always copies synchronously, we can simplify a lot of code. So let's do that.
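
To illustrate why copying synchronously removes the lifetime requirement, here is a minimal sketch in Python. `ToyPollWriter` and its `capacity` parameter are hypothetical names invented for this illustration; this is not the aws-c-s3 API or its implementation, only a model of the idea that the data is copied before the call returns.

```python
class ToyPollWriter:
    """Hypothetical stand-in for a poll-style writer; not the aws-c-s3 API."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer = bytearray()  # stands in for a buffer-pool buffer

    def poll_write(self, data: bytearray) -> int:
        """Copy as much of `data` as fits, right now, and return the byte count."""
        room = self.capacity - len(self.buffer)
        accepted = data[:room]
        self.buffer += accepted  # synchronous copy happens before returning
        return len(accepted)


writer = ToyPollWriter(capacity=16)
chunk = bytearray(b"hello")
written = writer.poll_write(chunk)

# The caller may overwrite its own buffer as soon as the call returns,
# because the writer already holds its own copy.
chunk[:] = b"XXXXX"
```

Because the copy completes inside the call, the caller never has to keep `chunk` alive until some later completion event, which is the constraint the description says made the original write() API awkward to use from Rust.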

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-commenter · 2024-05-09T19:13:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.61%. Comparing base (3647b4b) to head (fed0631).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
+ Coverage   89.54%   89.61%   +0.06%     
==========================================
  Files          20       20              
  Lines        6036     6045       +9     
==========================================
+ Hits         5405     5417      +12     
+ Misses        631      628       -3

| Files | Coverage Δ |
|---|---|
| source/s3_auto_ranged_put.c | `92.80% <100.00%> (+0.03%)` ⬆️ |
| source/s3_meta_request.c | `93.47% <100.00%> (+0.33%)` ⬆️ |

DmitriyMusatkin · 2024-05-10T18:32:59Z

include/aws/s3/s3_client.h

+ *      The waker callback will be invoked when you can call poll_write() again.
+ *      Do not call poll_write() again before the waker is invoked.
+ *
+ * 2)   Else if `result.error_code != 0` then poll_write() did not succeed


Are there any recoverable errors, or does an error mean everything is on fire and the only way forward is to kill things?

"result.is_pending" is the only "recoverable" error, which is why I broke it out into its own variable instead of making it a special error code.

Full text is:

Else if result.error_code != 0 then poll_write() did not succeed
and you should not call it again. The meta request is guaranteed to finish soon
(you don't need to worry about canceling the meta request yourself after a failed write).
A common error code is AWS_ERROR_S3_REQUEST_HAS_COMPLETED, indicating
the meta request completed for reasons unrelated to the poll_write() call
(e.g. CreateMultipartUpload received a 403 Forbidden response).
AWS_ERROR_INVALID_STATE usually indicates that you're calling poll_write()
incorrectly (e.g. not waiting for waker callback from previous poll_write() call).

My personal philosophy on APIs where users need to call a sequence of functions correctly to complete an operation is: if anything goes wrong with a function in that sequence, the operation should cancel itself. We shouldn't require the user to inspect every error code to determine whether they need to cancel the operation. It's a streaming operation, so random failure was always a possibility.

So users can write simple code like:

    if meta_request.write() failed:
        # something went wrong, bail out of write loop, meta-request will finish soon
        return

Instead of:

    if meta_request.write() failed:
        # something went wrong, be sure to cancel the meta-request
        # in case this is our fault, then bail out of write loop, meta-request will finish soon
        if error_code != AWS_ERROR_S3_REQUEST_HAS_COMPLETED:
            meta_request.cancel()
        return
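
The three-way check described in the doc comment can be sketched as a small runnable Python model. The field names `is_pending`, `error_code`, and `bytes_processed` come from the thread; the `PollWriteResult` class, the `next_step` helper, and its return strings are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PollWriteResult:
    """Toy mirror of the three result fields named in the doc comment."""
    is_pending: bool = False
    error_code: int = 0
    bytes_processed: int = 0


def next_step(result: PollWriteResult) -> str:
    """Classify a result in the same order the doc comment checks it."""
    if result.is_pending:
        # 1) Do not call poll_write() again until the waker is invoked.
        return "wait_for_waker"
    if result.error_code != 0:
        # 2) Failed: leave the write loop. No cancel() call is needed,
        #    since the meta request is guaranteed to finish soon.
        return "stop"
    # 3) Success: bytes_processed says how far to advance in the data.
    return "advance"
```

Note that the "stop" branch does not distinguish between error codes, which is the simplification the reply argues for.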

DmitriyMusatkin · 2024-05-10T18:38:41Z

include/aws/s3/s3_client.h

+ *      AWS_ERROR_INVALID_STATE usually indicates that you're calling poll_write()
+ *      incorrectly (e.g. not waiting for waker callback from previous poll_write() call).
+ *
+ * 3)   Else `result.bytes_processed` tells you how much data was processed.


In this case, can calls to write be made one after the other, without waiting for the waker or any sort of backoff?

Yes. This works well for Mountpoint because, if the user does a big write, the FUSE connector will just repeatedly feed Mountpoint 1MiB at a time, which they can keep passing to aws-c-s3 until the part is full.

It's not until they try to write the 9th MiB that they'll get result.is_pending.
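
The back-to-back behavior described above can be modeled with a toy simulation. Everything here is an assumption made for illustration: `ToyPartBuffer` is a hypothetical class, the 8MiB part size is inferred from the "9th MiB" remark, and the exact moment the real library reports pending or fires the waker may differ.

```python
MIB = 1024 * 1024


class ToyPartBuffer:
    """Hypothetical model of one part-sized buffer; not the aws-c-s3 implementation."""

    def __init__(self, part_size):
        self.part_size = part_size
        self.filled = 0
        self.waker = None

    def poll_write(self, data_len, waker):
        """Return (is_pending, bytes_processed) for a write of data_len bytes."""
        room = self.part_size - self.filled
        if room == 0:
            self.waker = waker  # remember whom to notify once there is room
            return (True, 0)
        accepted = min(room, data_len)
        self.filled += accepted
        return (False, accepted)

    def part_uploaded(self):
        """Simulate the part being sent: free the buffer and fire the waker."""
        self.filled = 0
        waker, self.waker = self.waker, None
        if waker is not None:
            waker()


upload = ToyPartBuffer(part_size=8 * MIB)  # assumed part size
wake_count = []


def on_wake():
    wake_count.append(1)


# Nine 1MiB writes in a row, with no waiting in between.
results = [upload.poll_write(MIB, on_wake) for _ in range(9)]

# Once the full part has been sent, the waker fires and writing can resume.
upload.part_uploaded()
retry = upload.poll_write(MIB, on_wake)
```

In this model the first eight calls each accept 1MiB immediately, the ninth reports pending, and the retry succeeds only after the waker has been invoked.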

DmitriyMusatkin · 2024-05-10T18:49:31Z

include/aws/s3/s3_client.h

+ *
+ * @param user_data     Pointer to be passed to the waker callback.
+ *
+ * WARNING: This feature is experimental.


Side question: should we consider something like an "experimental" annotation that warns people when they use it? OpenSSL has a deprecated flag, and users must opt in if they want to avoid warnings from those functions.

🤷‍♀️

graebm added 6 commits May 9, 2024 09:42

async write via polling, instead of completion callbacks

77fb84f

a bit of cleanup

d29f99f

Use forced-buffers

5e8a58f

polish

841f42c

trivial

062cb1b

ultratrivial

cb6e80b

graebm changed the title ~~Asyncwrite poll~~ Imrove async write (fewer copies, polling API) May 9, 2024

graebm changed the title ~~Imrove async write (fewer copies, polling API)~~ Improve async write (fewer copies, polling API) May 9, 2024

client may be NULL during meta_request_destroy()

086aa48

graebm added 3 commits May 9, 2024 14:03

comment tweaks

baf1197

tweak comments & logs

58b6433

clang-format

fed0631

passaro mentioned this pull request May 10, 2024

Adopt polling API for uploading data in PutObject requests awslabs/mountpoint-s3#874

Merged

DmitriyMusatkin reviewed May 10, 2024

DmitriyMusatkin approved these changes May 10, 2024

tweak some asserts

bf3e420

graebm merged commit 774999f into main May 13, 2024
30 checks passed

graebm deleted the asyncwrite-poll branch May 13, 2024 15:38

passaro mentioned this pull request May 13, 2024

Update CRT submodules to latest releases awslabs/mountpoint-s3#877

Merged
