
Mem limiter #368

Merged
merged 79 commits on Nov 21, 2023
Changes from all commits
79 commits
1e44954
initial buffer reuse
DmitriyMusatkin Nov 6, 2023
0f59f52
test fixes
DmitriyMusatkin Nov 6, 2023
511ce35
more test fixes
DmitriyMusatkin Nov 6, 2023
a524b08
lets not be too fancy
DmitriyMusatkin Nov 6, 2023
5804c39
enable for puts
DmitriyMusatkin Nov 6, 2023
bd17c6d
bump limits
DmitriyMusatkin Nov 6, 2023
690fc85
dont reinit buffer
DmitriyMusatkin Nov 6, 2023
be631c6
fixes
DmitriyMusatkin Nov 6, 2023
d1df7ce
remove logging
DmitriyMusatkin Nov 6, 2023
bc9b126
cleaning up
DmitriyMusatkin Nov 9, 2023
891cd90
test fixes
DmitriyMusatkin Nov 9, 2023
c695744
32 bit fix
DmitriyMusatkin Nov 9, 2023
6c9e6ae
test fixes
DmitriyMusatkin Nov 9, 2023
e7166e9
fix small buffer for gets
DmitriyMusatkin Nov 9, 2023
9c40011
dont cancel trim
DmitriyMusatkin Nov 9, 2023
5207457
move around trim canceling
DmitriyMusatkin Nov 10, 2023
6f7286c
typo
DmitriyMusatkin Nov 10, 2023
22f38eb
lets check metrics inside synced block
DmitriyMusatkin Nov 10, 2023
43f12e6
data race
DmitriyMusatkin Nov 10, 2023
3d467d8
addressing comments
DmitriyMusatkin Nov 10, 2023
a7b9e7f
logging
DmitriyMusatkin Nov 10, 2023
ca9e597
low mem limits on 32
DmitriyMusatkin Nov 11, 2023
b0a9d83
typo
DmitriyMusatkin Nov 11, 2023
d9197dc
add more logging
DmitriyMusatkin Nov 13, 2023
3d755bd
build warning
DmitriyMusatkin Nov 13, 2023
02bae74
comment out correctly
DmitriyMusatkin Nov 13, 2023
dc9373e
comment out unused
DmitriyMusatkin Nov 13, 2023
c62cec1
more logging
DmitriyMusatkin Nov 13, 2023
607e28e
correct specifier
DmitriyMusatkin Nov 13, 2023
1c3345f
fix default mem config
DmitriyMusatkin Nov 13, 2023
36ad5c0
fixup mem usage stats
DmitriyMusatkin Nov 13, 2023
7ca1995
more logging
DmitriyMusatkin Nov 13, 2023
20f5c35
more logging
DmitriyMusatkin Nov 13, 2023
c5991e3
telemetry callback logs
DmitriyMusatkin Nov 13, 2023
7b98645
remove trim cancelling
DmitriyMusatkin Nov 13, 2023
df65c14
scheduling change
DmitriyMusatkin Nov 13, 2023
bb3c04a
switch over to reserving
DmitriyMusatkin Nov 14, 2023
3228670
unused params
DmitriyMusatkin Nov 14, 2023
6d68522
test fixes
DmitriyMusatkin Nov 14, 2023
cfc5f47
fix tests
DmitriyMusatkin Nov 14, 2023
b1f372f
remove assert
DmitriyMusatkin Nov 14, 2023
834f629
remove log
DmitriyMusatkin Nov 14, 2023
6d427a3
tweak reserve algo
DmitriyMusatkin Nov 15, 2023
8a7c617
fix
DmitriyMusatkin Nov 15, 2023
efef34d
docs
DmitriyMusatkin Nov 16, 2023
8e7f0db
Merge branch 'main' into mem_ticket
DmitriyMusatkin Nov 16, 2023
6a6fa88
fix test
DmitriyMusatkin Nov 16, 2023
b32a2fb
move test back
DmitriyMusatkin Nov 16, 2023
59e58fa
addressing comments
DmitriyMusatkin Nov 16, 2023
6617ab1
fix block size check
DmitriyMusatkin Nov 17, 2023
a94d0eb
fix buf limits test
DmitriyMusatkin Nov 17, 2023
9d783fd
add logging for debug purposes
DmitriyMusatkin Nov 17, 2023
7fd8358
more logs
DmitriyMusatkin Nov 17, 2023
4078601
moar logs
DmitriyMusatkin Nov 17, 2023
30b3069
telemetry callback makes no sense
DmitriyMusatkin Nov 17, 2023
3cfceb7
lets wait for meta req to shutdown
DmitriyMusatkin Nov 17, 2023
cb943cd
addressing comments
DmitriyMusatkin Nov 18, 2023
4f1488b
address comments
DmitriyMusatkin Nov 19, 2023
b450054
fix test
DmitriyMusatkin Nov 19, 2023
db19f07
another test
DmitriyMusatkin Nov 19, 2023
f3e896a
data race
DmitriyMusatkin Nov 19, 2023
1a8d329
data race
DmitriyMusatkin Nov 19, 2023
c600680
reenable trim
DmitriyMusatkin Nov 20, 2023
849d655
tweak buffer
DmitriyMusatkin Nov 20, 2023
e0b0ab7
fix build error
DmitriyMusatkin Nov 20, 2023
7f14806
lint and update docs
DmitriyMusatkin Nov 20, 2023
3739d1c
more lint
DmitriyMusatkin Nov 20, 2023
00cc3c1
adjust validation
DmitriyMusatkin Nov 20, 2023
ed5baa3
trim test
DmitriyMusatkin Nov 20, 2023
ab23ab6
lint
DmitriyMusatkin Nov 20, 2023
8fd62e4
net test case
DmitriyMusatkin Nov 20, 2023
52ffae7
Update source/s3_buffer_pool.c
DmitriyMusatkin Nov 21, 2023
fccca7c
addressing comments
DmitriyMusatkin Nov 21, 2023
1630751
addressing comments
DmitriyMusatkin Nov 21, 2023
0028246
lint, fix docs
DmitriyMusatkin Nov 21, 2023
fb3e351
even more lint
DmitriyMusatkin Nov 21, 2023
79977a5
address comments
DmitriyMusatkin Nov 21, 2023
bb5c139
remove 0 size buffer test
DmitriyMusatkin Nov 21, 2023
a6c69c2
typo
DmitriyMusatkin Nov 21, 2023
76 changes: 76 additions & 0 deletions docs/memory_aware_request_execution.md
@@ -0,0 +1,76 @@
The CRT S3 client was designed with throughput as a primary goal. As such, the client
scales resource usage, such as the number of parallel requests in flight, to achieve
the target throughput. The client creates buffers to hold the data it is sending or
receiving for each request, and scaling requests in flight has a direct impact on
memory used. In practice, setting a high target throughput or a larger part size can
lead to high observed memory usage.

To mitigate high memory usage, memory reuse improvements were recently added to
the client, along with options to limit the maximum memory used. The following
sections go into more detail on those changes and how they affect the client.

### Memory Reuse
At a basic level, the CRT S3 client starts with a meta request for an operation like
put or get, breaks it into smaller part-sized requests, and executes those in
parallel. The client used to allocate a part-sized buffer for each of those
requests and release it right after the request was done. That approach
resulted in a lot of very short-lived allocations and allocator thrashing,
overall leading to memory-use spikes considerably higher than what is needed. To
address that, the client is switching to a pooled buffer approach, discussed
below.

Note: the approach described below is a work in progress and concentrates on
improving the common cases (default 8mb part sizes and part sizes smaller than 64mb).

Several observations about the client's usage of buffers:
- The client does not automatically switch to buffers above the default 8mb for
uploads until the upload passes 10,000 parts (~80 GB); see the sketch after this list.
- Get operations always use either the configured part size or the default of 8mb.
Part size for gets is not adjusted, since there is no 10,000 part limitation.
- Both Put and Get operations go through fill and drain phases. For example, for Put,
the client first schedules a number of reads to 'fill' the buffers from the source,
and as those reads complete, the buffers are sent over to the networking layer and
'drained'.
- Individual UploadPart or ranged get requests typically have a similar lifespan
(with some caveats). In practice, part buffers are acquired and released in bulk at
the same time.
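
A rough sketch of the arithmetic behind the first observation; the helper below is
purely illustrative (not part of the client) and assumes the S3 limit of 10,000 parts
per multipart upload:

```c
#include <stdint.h>

/* With the default 8mb part size, 10,000 parts cover roughly
 * 8mb * 10,000 ~= 80 GB before a larger part size becomes necessary.
 * Hypothetical helper: smallest part size that still fits the object
 * into 10,000 parts. */
static uint64_t s_min_part_size_for_upload(uint64_t content_length, uint64_t configured_part_size) {
    const uint64_t max_parts = 10000;
    uint64_t min_part_size = (content_length + max_parts - 1) / max_parts; /* ceiling division */
    return min_part_size > configured_part_size ? min_part_size : configured_part_size;
}
```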

The buffer pooling takes advantage of some of those allocation patterns and
works as follows.
The memory is split into primary and secondary areas. The secondary area is used for
requests with a part size bigger than a predefined value (currently 4 times the part
size); allocations from it go directly to the allocator and are effectively the old
way of doing things.

The primary memory area is split into blocks of a fixed size (16 times the part size
if one is defined, or 16 times 8mb otherwise). Blocks are allocated on demand. Each
block is logically subdivided into part-sized chunks. The pool allocates and releases
in chunk sizes only, and supports acquiring several chunks (up to 4) at once.

Blocks are kept around while there are ongoing requests and are released
asynchronously when memory pressure is low.
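
A minimal sketch of that split, using the sizes quoted above (names and layout here
are illustrative, not the actual pool implementation):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants from the description above: primary blocks hold 16
 * part-sized chunks, and at most 4 chunks can be acquired at once. */
#define CHUNKS_PER_BLOCK 16
#define MAX_CHUNKS_PER_ACQUIRE 4

/* chunk_size is the part size configured on the client (8mb by default). */
static size_t s_primary_block_size(size_t chunk_size) {
    /* Blocks are allocated on demand and carved into chunk-sized pieces. */
    return CHUNKS_PER_BLOCK * chunk_size;
}

static bool s_goes_to_secondary(size_t request_size, size_t chunk_size) {
    /* Requests bigger than 4 chunks bypass primary storage and go directly
     * to the base allocator, i.e. the old way of doing things. */
    return request_size > MAX_CHUNKS_PER_ACQUIRE * chunk_size;
}
```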

### Scheduling
Running out of memory is a terminal condition within CRT, and in general it is not
practical to try to set an overall memory limit on all allocations, since that
dramatically increases the complexity of the code that deals with cases where
only part of the memory needed for a task was allocated.

Comparatively, the majority of memory usage within the S3 client comes from buffers
allocated for Put/Get parts. So to control memory usage, the client
concentrates on controlling the number of buffers allocated. Effectively, this
boils down to a back-pressure mechanism that limits the number of parts
scheduled as memory gets closer to the limit. Memory used for other resources,
e.g. HTTP connection data and various supporting structures, is not actively
controlled; instead, some memory is taken out of the overall limit to account for it.

Overall, scheduling does best-effort memory limiting. At the time of
scheduling, the client reserves memory using the buffer pool's ticketing mechanism.
The buffer is acquired from the pool using the ticket as close to the actual usage as
possible (this approach peaks at lower memory usage than preallocating all memory
upfront, because buffers cannot all be used right away; e.g. reading from a file will
fill buffers slower than they are sent, leading to a decent amount of buffer reuse).
The reservation mechanism is approximate and in some cases can lead to actual memory
usage being higher once tickets are redeemed. The client sets aside some memory to
mitigate overflows like that.
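
To make that flow concrete, here is a minimal sketch using the buffer pool API added
in this PR (error handling and the real scheduling loop are omitted; sizes are
illustrative):

```c
#include <aws/common/byte_buf.h>
#include <aws/s3/private/s3_buffer_pool.h>

/* Sketch of the reserve -> acquire -> release flow described above.
 * The client does this per part request. */
static void s_buffer_pool_flow_sketch(struct aws_allocator *allocator) {
    const size_t part_size = 8 * 1024 * 1024;    /* chunk size, typically the part size */
    const size_t mem_limit = 1024 * 1024 * 1024; /* overall memory limit */

    struct aws_s3_buffer_pool *pool = aws_s3_buffer_pool_new(allocator, part_size, mem_limit);

    /* At scheduling time: reserve memory. On failure a reservation hold is
     * placed on the pool and no further parts should be scheduled. */
    struct aws_s3_buffer_pool_ticket *ticket = aws_s3_buffer_pool_reserve(pool, part_size);
    if (ticket == NULL) {
        aws_s3_buffer_pool_destroy(pool);
        return; /* back off until memory is released, then retry */
    }

    /* As close to the actual usage as possible: trade the ticket for a buffer.
     * This never fails, even if the reservation estimate was low. */
    struct aws_byte_buf buffer = aws_s3_buffer_pool_acquire_buffer(pool, ticket);
    (void)buffer; /* fill/drain the part using this buffer */

    /* Buffer lifetime is tied to the ticket; releasing the ticket returns
     * the chunk(s) to the pool. */
    aws_s3_buffer_pool_release_ticket(pool, ticket);

    aws_s3_buffer_pool_destroy(pool);
}
```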
133 changes: 133 additions & 0 deletions include/aws/s3/private/s3_buffer_pool.h
@@ -0,0 +1,133 @@
#ifndef AWS_S3_BUFFER_ALLOCATOR_H
#define AWS_S3_BUFFER_ALLOCATOR_H

/**
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
* SPDX-License-Identifier: Apache-2.0.
*/

#include <aws/s3/s3.h>

/*
* S3 buffer pool.
* Buffer pool used for pooling part sized buffers for Put/Get operations.
* Provides additional functionality for limiting the overall memory used.
* High-level buffer pool usage flow:
* - Create buffer with overall memory limit and common buffer size, aka chunk
* size (typically part size configured on client)
* - For each request:
* -- call reserve to acquire a ticket for future buffer acquisition. this will
* mark memory reserved, but will not allocate it. if the reserve call hits the
* memory limit, it fails and reservation hold is put on the whole buffer
* pool. (aws_s3_buffer_pool_remove_reservation_hold can be used to remove
* reservation hold).
* -- once request needs memory, it can exchange ticket for a buffer using
* aws_s3_buffer_pool_acquire_buffer. this operation never fails, even if it
* ends up going over memory limit.
* -- buffer lifetime is tied to the ticket. so once request is done with the
* buffer, ticket is released and buffer returns back to the pool.
*/

AWS_EXTERN_C_BEGIN

struct aws_s3_buffer_pool;
struct aws_s3_buffer_pool_ticket;

struct aws_s3_buffer_pool_usage_stats {
/* Effective Max memory limit. Memory limit value provided during construction minus
* buffer reserved for overhead of the pool */
size_t mem_limit;

/* How much mem is used in primary storage. includes memory used by blocks
* that are waiting on all allocs to release before being put back in circulation. */
size_t primary_used;
/* Overall memory allocated for blocks. */
size_t primary_allocated;
/* Reserved memory. Does not account for how that memory will map into
* blocks and in practice can be lower than used memory. */
size_t primary_reserved;
/* Number of blocks allocated in primary. */
size_t primary_num_blocks;

/* Secondary mem used. Accurate, maps directly to base allocator. */
size_t secondary_used;
/* Secondary mem reserved. Accurate, maps directly to base allocator. */
size_t secondary_reserved;
};

/*
* Create new buffer pool.
* chunk_size - specifies the size of memory that will most commonly be acquired
* from the pool (typically part size).
* mem_limit - limit on how much memory the buffer pool can use. once the limit is hit,
* buffers can no longer be reserved from the pool (a reservation hold is placed on the pool).
* Returns buffer pool pointer on success and NULL on failure.
*/
AWS_S3_API struct aws_s3_buffer_pool *aws_s3_buffer_pool_new(
struct aws_allocator *allocator,
size_t chunk_size,
size_t mem_limit);

/*
* Destroys buffer pool.
* Does nothing if buffer_pool is NULL.
*/
AWS_S3_API void aws_s3_buffer_pool_destroy(struct aws_s3_buffer_pool *buffer_pool);

/*
* Reserves memory from the pool for later use.
* Best effort and can potentially reserve memory slightly over the limit.
* Reservation takes some memory out of the available pool, but does not
* allocate it right away.
* On success ticket will be returned.
* On failure NULL is returned, an error is raised and a reservation hold is placed
* on the buffer pool. Any further reservations while the hold is active will fail.
* Remove reservation hold to unblock reservations.
*/
AWS_S3_API struct aws_s3_buffer_pool_ticket *aws_s3_buffer_pool_reserve(
struct aws_s3_buffer_pool *buffer_pool,
size_t size);

/*
* Whether pool has a reservation hold.
*/
AWS_S3_API bool aws_s3_buffer_pool_has_reservation_hold(struct aws_s3_buffer_pool *buffer_pool);

/*
* Remove reservation hold on pool.
*/
AWS_S3_API void aws_s3_buffer_pool_remove_reservation_hold(struct aws_s3_buffer_pool *buffer_pool);

/*
* Trades in the ticket for a buffer.
* Cannot fail and can over allocate above mem limit if reservation was not accurate.
* Using the same ticket twice will return the same buffer.
* Buffer is only valid until the ticket is released.
*/
AWS_S3_API struct aws_byte_buf aws_s3_buffer_pool_acquire_buffer(
struct aws_s3_buffer_pool *buffer_pool,
struct aws_s3_buffer_pool_ticket *ticket);

/*
* Releases the ticket.
* Any buffers associated with the ticket are invalidated.
*/
AWS_S3_API void aws_s3_buffer_pool_release_ticket(
struct aws_s3_buffer_pool *buffer_pool,
struct aws_s3_buffer_pool_ticket *ticket);

/*
* Get pool memory usage stats.
*/
AWS_S3_API struct aws_s3_buffer_pool_usage_stats aws_s3_buffer_pool_get_usage(struct aws_s3_buffer_pool *buffer_pool);

/*
* Trims all unused mem from the pool.
* Warning: fairly slow operation, do not use in critical path.
* TODO: partial trimming? ex. only trim down to 50% of max?
*/
AWS_S3_API void aws_s3_buffer_pool_trim(struct aws_s3_buffer_pool *buffer_pool);

AWS_EXTERN_C_END

#endif /* AWS_S3_BUFFER_ALLOCATOR_H */
8 changes: 8 additions & 0 deletions include/aws/s3/private/s3_client_impl.h
@@ -196,6 +196,8 @@ struct aws_s3_upload_part_timeout_stats {
struct aws_s3_client {
struct aws_allocator *allocator;

struct aws_s3_buffer_pool *buffer_pool;

struct aws_s3_client_vtable *vtable;

struct aws_ref_count ref_count;
@@ -340,6 +342,9 @@ struct aws_s3_client {
/* Task for processing requests from meta requests on connections. */
struct aws_task process_work_task;

/* Task for trimming buffer pool. */
struct aws_task trim_buffer_pool_task;

/* Number of endpoints currently allocated. Used during clean up to know how many endpoints are still in
* memory.*/
uint32_t num_endpoints_allocated;
@@ -378,6 +383,9 @@ struct aws_s3_client {

/* Number of requests currently being prepared. */
uint32_t num_requests_being_prepared;

/* Whether or not the buffer pool trim task is currently scheduled. */
uint32_t trim_buffer_pool_task_scheduled : 1;
} threaded_data;
};

9 changes: 8 additions & 1 deletion include/aws/s3/private/s3_request.h
@@ -12,6 +12,7 @@
#include <aws/common/thread.h>
#include <aws/s3/s3.h>

#include <aws/s3/private/s3_buffer_pool.h>
#include <aws/s3/private/s3_checksums.h>

struct aws_http_message;
@@ -22,6 +23,7 @@ enum aws_s3_request_flags {
AWS_S3_REQUEST_FLAG_RECORD_RESPONSE_HEADERS = 0x00000001,
AWS_S3_REQUEST_FLAG_PART_SIZE_RESPONSE_BODY = 0x00000002,
AWS_S3_REQUEST_FLAG_ALWAYS_SEND = 0x00000004,
AWS_S3_REQUEST_FLAG_PART_SIZE_REQUEST_BODY = 0x00000008,
};

/**
@@ -112,6 +114,8 @@ struct aws_s3_request {
* retried.*/
struct aws_byte_buf request_body;

struct aws_s3_buffer_pool_ticket *ticket;

/* Beginning range of this part. */
/* TODO currently only used by auto_range_get, could be hooked up to auto_range_put as well. */
uint64_t part_range_start;
@@ -184,7 +188,10 @@ struct aws_s3_request {
uint32_t record_response_headers : 1;

/* When true, the response body buffer will be allocated in the size of a part. */
uint32_t part_size_response_body : 1;
uint32_t has_part_size_response_body : 1;

/* When true, the request body buffer will be allocated in the size of a part. */
uint32_t has_part_size_request_body : 1;

/* When true, this request is being tracked by the client for limiting the amount of in-flight-requests/stats. */
uint32_t tracked_by_client : 1;
1 change: 1 addition & 0 deletions include/aws/s3/s3.h
@@ -41,6 +41,7 @@ enum aws_s3_errors {
AWS_ERROR_S3_INCORRECT_CONTENT_LENGTH,
AWS_ERROR_S3_REQUEST_TIME_TOO_SKEWED,
AWS_ERROR_S3_FILE_MODIFIED,
AWS_ERROR_S3_EXCEEDS_MEMORY_LIMIT,
AWS_ERROR_S3_END_RANGE = AWS_ERROR_ENUM_END_RANGE(AWS_C_S3_PACKAGE_ID)
};

3 changes: 3 additions & 0 deletions include/aws/s3/s3_client.h
@@ -344,6 +344,9 @@ struct aws_s3_client_config {
/* Throughput target in Gbps that we are trying to reach. */
double throughput_target_gbps;

/* Limit on how much memory the client is allowed to use, in bytes. */
size_t memory_limit_in_bytes;

/* Retry strategy to use. If NULL, a default retry strategy will be used. */
struct aws_retry_strategy *retry_strategy;

1 change: 1 addition & 0 deletions source/s3.c
@@ -41,6 +41,7 @@ static struct aws_error_info s_errors[] = {
AWS_DEFINE_ERROR_INFO_S3(AWS_ERROR_S3_INCORRECT_CONTENT_LENGTH, "Request body length must match Content-Length header."),
AWS_DEFINE_ERROR_INFO_S3(AWS_ERROR_S3_REQUEST_TIME_TOO_SKEWED, "RequestTimeTooSkewed error received from S3."),
AWS_DEFINE_ERROR_INFO_S3(AWS_ERROR_S3_FILE_MODIFIED, "The file was modified during upload."),
AWS_DEFINE_ERROR_INFO_S3(AWS_ERROR_S3_EXCEEDS_MEMORY_LIMIT, "Request was not created due to used memory exceeding memory limit."),
};
/* clang-format on */

25 changes: 22 additions & 3 deletions source/s3_auto_ranged_get.c
@@ -177,13 +177,21 @@ static bool s_s3_auto_ranged_get_update(
meta_request,
AWS_S3_AUTO_RANGE_GET_REQUEST_TYPE_HEAD_OBJECT,
0,
AWS_S3_REQUEST_FLAG_RECORD_RESPONSE_HEADERS | AWS_S3_REQUEST_FLAG_PART_SIZE_RESPONSE_BODY);
AWS_S3_REQUEST_FLAG_RECORD_RESPONSE_HEADERS);

request->discovers_object_size = true;

auto_ranged_get->synced_data.head_object_sent = true;
}
} else if (auto_ranged_get->synced_data.num_parts_requested == 0) {

struct aws_s3_buffer_pool_ticket *ticket =
aws_s3_buffer_pool_reserve(meta_request->client->buffer_pool, meta_request->part_size);

if (ticket == NULL) {
goto has_work_remaining;
}

/* If we aren't using a head object, then discover the size of the object while trying to get the
* first part. */
request = aws_s3_request_new(
@@ -192,6 +200,7 @@
1,
AWS_S3_REQUEST_FLAG_RECORD_RESPONSE_HEADERS | AWS_S3_REQUEST_FLAG_PART_SIZE_RESPONSE_BODY);

request->ticket = ticket;
request->part_range_start = 0;
request->part_range_end = meta_request->part_size - 1; /* range-end is inclusive */
request->discovers_object_size = true;
@@ -253,12 +262,21 @@ static bool s_s3_auto_ranged_get_update(
auto_ranged_get->synced_data.read_window_warning_issued = 0;
}

struct aws_s3_buffer_pool_ticket *ticket =
aws_s3_buffer_pool_reserve(meta_request->client->buffer_pool, meta_request->part_size);

if (ticket == NULL) {
goto has_work_remaining;
}

request = aws_s3_request_new(
meta_request,
AWS_S3_AUTO_RANGE_GET_REQUEST_TYPE_PART,
auto_ranged_get->synced_data.num_parts_requested + 1,
AWS_S3_REQUEST_FLAG_PART_SIZE_RESPONSE_BODY);

request->ticket = ticket;

aws_s3_get_part_range(
auto_ranged_get->synced_data.object_range_start,
auto_ranged_get->synced_data.object_range_end,
@@ -412,10 +430,11 @@ static struct aws_future_void *s_s3_auto_ranged_get_prepare_request(struct aws_s
/* Success! */
AWS_LOGF_DEBUG(
AWS_LS_S3_META_REQUEST,
"id=%p: Created request %p for part %d",
"id=%p: Created request %p for part %d part sized %d",
(void *)meta_request,
(void *)request,
request->part_number);
request->part_number,
request->has_part_size_response_body);

success = true;
