-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop limiting num-connections based on num-known-IPs (Improve S3-Express performance) #407
Conversation
…ould get us results closest to old algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DNS Load Balancing: DNS resolver continuously harvests Amazon S3 IP addresses. When load is spread across the S3 fleet, overall throughput is better than if all connections were hammering the same IP simultaneously.
Should we update the readme here?
@@ -151,32 +152,9 @@ uint32_t aws_s3_client_get_max_active_connections( | |||
struct aws_s3_client *client, | |||
struct aws_s3_meta_request *meta_request) { | |||
AWS_PRECONDITION(client); | |||
(void)meta_request; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just remove meta_request
from the args, it's a private helper.
Or do we even need this helper? Can we just have it as a member of client? It won't change after the client initialized now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we'll be using meta_request again in the near future. If we're going to use different magic numbers for S3-Express vs S3-Vanilla, there will be some meta-request->storage_class
variable that we'll need to check
@@ -1687,7 +1669,7 @@ static bool s_s3_client_should_update_meta_request( | |||
size_t num_known_vips = client->vtable->get_host_address_count( | |||
client->client_bootstrap->host_resolver, endpoint->host_name, AWS_GET_HOST_ADDRESS_COUNT_RECORD_TYPE_A); | |||
if (num_known_vips == 0 && (client->threaded_data.num_requests_being_prepared + | |||
client->threaded_data.request_queue_size) >= g_max_num_connections_per_vip) { | |||
client->threaded_data.request_queue_size) >= g_min_num_connections) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know yes/no for certain. I wanted to limit the extent of these changes so I'm keeping the old behavior where we limit the number of requests in the prep stage when we don't know any IP addresses for its endpoint yet.
I also wondered if this was added in response to a specific issue, or just something that was added superstitiously. The check was added in this PR: Multiple Bucket Support. There's a specific commit (Preventing too many requests from being prepared) where Ryan adds that adds this check. So it seems pretty deliberate.
So maybe this could be removed? Or the numbers tweaked? We'll address that another day
…d. My tests showed similar performance for 1 vs 100 IPs. But with 1 IP the results were definitely more variable (some IPs were slower/faster than others)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #407 +/- ##
=======================================
Coverage 89.37% 89.37%
=======================================
Files 20 20
Lines 5863 5854 -9
=======================================
- Hits 5240 5232 -8
+ Misses 623 622 -1
|
Issue:
Disappointing S3-Express performance.
Description of changes:
Stop limiting num-connections based on num-known-IPs.
Diagnosing the issue:
We found that num-connections was never getting very high, because num-connections scales based on the num-known-IPs. S3-Express endpoints have very few IPs, so their num-connections weren't scaling very high.
The algorithm was adding 10 connections per known-IP. On a 100Gb/s machine, this maxed out at 250 connections once 25 IPs were known. But S3-Express endpoints only have 4 unique IPs, so they never got higher than 40 connections.
This algorithm was written back when S3 returned 1 IP per DNS query. The intention was to throttle connections until more IPs were known, in order to spread load among S3's server fleet. However, as of Aug 2023 S3 provides multiple IPs per DNS query. So now, we can scale up to max connections after the first DNS query and still be spreading load.
We also believed that spreading load was a key to good performance. But I found that spreading the load didn't have much impact on performance (at least now, in 2024, on the 100Gb/s machine I was using). Tests where I hard-coded a single IP and hit it with max-connections didn't differ much from tests where the load was spread among 8 IPs or 100 IPs.
I want to get this change out quickly and help S3-Express, so I picked magic numbers where the num-connections math ends up with the same result as the old algorithm. Normal S3 performance is mildly improved (max-connections is reached immediately, instead of scaling up over 30sec as it finds more IPs). S3 Express performance is MUCH improved.
Future Work:
Improve this algorithm further:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.