[s3_endpoint]: avoid problems arising from keeping endpoint reference alive in hash table #205

grrtrr · 2022-08-23T17:17:57Z

This pull request fixes #202 and arose from comments in #203.

The client keeps references to s3_endpoints alive in a hash table. This requires complicated logic which is hard to
get right:

- The first implementation in "Multiple Bucket Support" (#136) introduced multiple callbacks.
- This had a race condition, which was then addressed by "Proposal to fix reuse endpoint with refcount can reach zero" (#183).
- "[s3 endpoint] aws_s3_client_endpoint_release race condition produces segmentation faults" (#202) found another race condition, but caused a deadlock, due to taking the synced_data mutex twice.
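
To make the two failure modes above concrete, here is a minimal sketch of a reference-counted lookup table. It is not the aws-c-s3 code: `struct endpoint`, `struct endpoint_table`, `table_acquire`, `table_release` and the `_locked` helper are hypothetical names, and a fixed-size array stands in for the hash table. The assumption illustrated is that the count change and the table update happen under one lock (so a lookup cannot hand out an endpoint whose count just reached zero), and that helpers called inside the locked section never lock again (so the mutex is not taken twice).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the aws-c-s3 types. */
#define MAX_HOST_LEN 64
#define TABLE_SLOTS 8

struct endpoint {
    char host[MAX_HOST_LEN];
    size_t ref_count; /* guarded by endpoint_table.lock */
};

struct endpoint_table {
    pthread_mutex_t lock;
    struct endpoint *slots[TABLE_SLOTS];
};

void table_init(struct endpoint_table *table) {
    pthread_mutex_init(&table->lock, NULL);
    memset(table->slots, 0, sizeof(table->slots));
}

/* Caller must already hold table->lock. This helper never locks, so calling
 * it from a locked section cannot take the (non-recursive) mutex twice. */
static int s_find_slot_locked(const struct endpoint_table *table, const char *host) {
    for (int i = 0; i < TABLE_SLOTS; ++i) {
        if (table->slots[i] != NULL && strcmp(table->slots[i]->host, host) == 0) {
            return i;
        }
    }
    return -1;
}

/* Look up or create the endpoint for host and take one reference on it.
 * Returns NULL if the host is too long, the table is full, or allocation fails. */
struct endpoint *table_acquire(struct endpoint_table *table, const char *host) {
    if (strlen(host) >= MAX_HOST_LEN) {
        return NULL;
    }
    struct endpoint *ep = NULL;
    pthread_mutex_lock(&table->lock);
    int slot = s_find_slot_locked(table, host);
    if (slot >= 0) {
        /* Lookup and increment happen under the same lock as the release path. */
        ep = table->slots[slot];
        ep->ref_count++;
    } else {
        for (slot = 0; slot < TABLE_SLOTS && table->slots[slot] != NULL; ++slot) {
        }
        if (slot < TABLE_SLOTS && (ep = calloc(1, sizeof(*ep))) != NULL) {
            strcpy(ep->host, host); /* length checked above */
            ep->ref_count = 1;
            table->slots[slot] = ep;
        }
    }
    pthread_mutex_unlock(&table->lock);
    return ep;
}

/* Drop one reference. Returns 1 if this call destroyed the endpoint, else 0. */
int table_release(struct endpoint_table *table, struct endpoint *ep) {
    int destroyed = 0;
    pthread_mutex_lock(&table->lock);
    assert(ep->ref_count > 0);
    if (--ep->ref_count == 0) {
        /* Remove the table entry before unlocking, so no other thread can
         * find an endpoint that is about to be freed. */
        int slot = s_find_slot_locked(table, ep->host);
        if (slot >= 0) {
            table->slots[slot] = NULL;
        }
        destroyed = 1;
    }
    pthread_mutex_unlock(&table->lock);
    if (destroyed) {
        free(ep); /* teardown happens outside the lock */
    }
    return destroyed;
}
```

Two acquires for the same host return the same pointer, and only the second release frees it. A real endpoint would also have to shut down its connection manager asynchronously, which this sketch leaves out.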

The conclusion is that the small performance gain from hashing a frequently used endpoint does not
justify the greatly increased complexity, which has caused race conditions, segmentation faults,
and deadlocks, all of which are difficult to debug and to fix.

Hence remove the hash table and use only one endpoint_release function.

Also fix the argument when calling s_s3_endpoint_http_connection_manager_shutdown_callback in
s_s3_endpoint_ref_count_zero (function requires an s3_endpoint).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

grrtrr · 2022-08-29T18:26:56Z

@TingDaoK - when you have time, could you please take a look?

TingDaoK · 2022-09-01T00:05:57Z

Thank you for the deep dive.

I agree the hash map increased the complexity quite a lot. But I doubt that the performance gain is small.

The approach you used here will create an s3_endpoint for every meta request created. And I think each s3_endpoint is not cheap. It has the connection pool under the hood.
Ideally, one client should be used for a large number of requests.
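
As a rough illustration of the cost argument, the sketch below counts how many endpoints (and so connection pools) would be created for a sequence of meta requests under each strategy. It is only a model: `count_endpoints_created`, the `share_endpoints` flag and the example host names are hypothetical, and no real endpoint or connection pool is constructed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHED_HOSTS 16

/* Count how many endpoints (and therefore connection pools) would be created
 * for a sequence of meta requests. share_endpoints == 0 models one endpoint
 * per meta request; nonzero models one endpoint per distinct host. */
size_t count_endpoints_created(const char *const *hosts, size_t num_requests, int share_endpoints) {
    const char *seen[MAX_CACHED_HOSTS];
    size_t num_seen = 0;
    size_t created = 0;

    assert(hosts != NULL || num_requests == 0);
    for (size_t i = 0; i < num_requests; ++i) {
        int found = 0;
        if (share_endpoints) {
            for (size_t j = 0; j < num_seen; ++j) {
                if (strcmp(seen[j], hosts[i]) == 0) {
                    found = 1;
                    break;
                }
            }
        }
        if (found) {
            continue; /* reuse the existing endpoint and its pool */
        }
        created++;
        /* If the small cache is full, later hosts simply are not remembered. */
        if (share_endpoints && num_seen < MAX_CACHED_HOSTS) {
            seen[num_seen++] = hosts[i];
        }
    }
    return created;
}
```

For four requests to hosts A, A, B, A, the per-request strategy creates four endpoints and the shared strategy creates two. The gap grows with the number of requests per bucket.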

I'll look into the issue and see if we have a better solution.

grrtrr · 2022-09-01T14:10:35Z

> I'll look into the issue and see if we have a better solution.

Until there is a better solution, can we remove the old one, since it currently creates problems (#202)?

TingDaoK · 2022-09-02T20:40:40Z

I created #208 for the fix. I think it's a more proper fix here.

grrtrr · 2022-09-05T21:54:37Z

> #208 I created this PR for the fix. I think it's a more proper fix here.

Thank you. We need some time for testing before putting this into production.
I also wondered whether the problem is architectural, i.e. whether restructuring the s3 client could make it simpler and more straightforward to retain references to s3 endpoints.

grrtrr · 2022-09-15T14:01:28Z

Has been replaced by #209 in v0.1.48.

grrtrr added 2 commits August 23, 2022 10:13

Remove hashtable entry in s_s3_client_endpoint_shutdown_callback

6c5c431

Do not use a separate hash table entry for the S3Endpoint

8352aa2

Merge branch 'main' into issue_202.revised

ef2d30e

grrtrr closed this Sep 15, 2022

grrtrr deleted the issue_202.revised branch September 15, 2022 14:01
