
Endpoint lifetime fix #208

Closed
wants to merge 6 commits into from

Conversation

TingDaoK
Contributor

@TingDaoK TingDaoK commented Sep 2, 2022

Issue #, if available:

  • Releasing the endpoint touches the hashtable owned by the client, which must be protected by the client's lock.
  • When the refcount of an endpoint drops to zero, we need to grab the lock and remove the endpoint from the hashtable. But there is a race between the client grabbing the lock to look up the endpoint in the hashtable and the endpoint's release removing itself from the hashtable.
  • Related issue: [s3 endpoint] aws_s3_client_endpoint_release race condition produces segmentation faults #202
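To make the race concrete, here is a minimal sketch of the lock discipline that avoids it. All names (`client`, `endpoint`, `client_acquire_endpoint`, the one-slot `cached` table) are hypothetical stand-ins, not the aws-c-s3 API; the point is only that both the lookup path and the release path must hold the same lock, which is exactly what the buggy release path did not do:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for aws_s3_client / aws_s3_endpoint; a one-slot
 * "table" replaces the real hashtable. */
struct endpoint {
    int ref_count;
};

struct client {
    pthread_mutex_t lock;    /* guards the endpoint table */
    struct endpoint *cached; /* one-slot stand-in for the endpoint hashtable */
};

/* Lookup-or-create: the refcount only changes while the lock is held, so a
 * concurrent release cannot free the endpoint between "found in table" and
 * "refcount bumped". */
struct endpoint *client_acquire_endpoint(struct client *c) {
    pthread_mutex_lock(&c->lock);
    if (c->cached == NULL) {
        c->cached = calloc(1, sizeof(struct endpoint));
    }
    c->cached->ref_count += 1;
    struct endpoint *e = c->cached;
    pthread_mutex_unlock(&c->lock);
    return e;
}

/* Release takes the same lock, so the zero-check and the table removal are
 * atomic with respect to client_acquire_endpoint(). The segfault described
 * above is what happens when this path does NOT hold the lock. */
void client_release_endpoint(struct client *c, struct endpoint *e) {
    pthread_mutex_lock(&c->lock);
    e->ref_count -= 1;
    if (e->ref_count == 0) {
        c->cached = NULL;
        free(e);
    }
    pthread_mutex_unlock(&c->lock);
}
```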

Description of changes:

  • We cannot always guarantee that the lock is held when the refcount is released. Instead, we schedule a task that can always safely grab the lock, check the refcount, and decide whether to remove the endpoint from the hashtable.
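The deferred-release idea above can be sketched as follows. This is a hypothetical model, not the PR's actual code: the "task" is a plain function called later, and `pending_releases` / `cached` are invented names. The key property is that the task applies accumulated releases only while holding the client lock, and removes the endpoint only if that drops the refcount to zero (i.e. nobody re-acquired it in the meantime):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct endpoint {
    atomic_int ref_count;
    int pending_releases; /* releases not yet processed by the task */
};

struct client {
    pthread_mutex_t lock;
    struct endpoint *cached; /* one-slot stand-in for the endpoint hashtable */
};

/* Release never frees inline; it only records the release. Real code would
 * also schedule the task here. */
void endpoint_release(struct client *c, struct endpoint *e) {
    pthread_mutex_lock(&c->lock);
    e->pending_releases += 1;
    pthread_mutex_unlock(&c->lock);
}

/* The "task" body: with the lock held it can safely apply the accumulated
 * releases and remove the endpoint from the table only when the refcount
 * actually reaches zero. */
void endpoint_release_task(struct client *c, struct endpoint *e) {
    pthread_mutex_lock(&c->lock);
    int pending = e->pending_releases;
    e->pending_releases = 0;
    int remaining = atomic_fetch_sub(&e->ref_count, pending) - pending;
    if (remaining == 0) {
        c->cached = NULL;
        free(e);
    }
    pthread_mutex_unlock(&c->lock);
}
```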

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TingDaoK TingDaoK changed the title Endpoint Endpoint lifetime fix Sep 2, 2022
@TingDaoK TingDaoK marked this pull request as ready for review September 2, 2022 20:37
@grrtrr
Contributor

grrtrr commented Sep 2, 2022

Thank you for providing a fix. We will look at this at a later stage; we need to do some testing first.

Comment on lines +725 to +730
struct aws_task *async_release_task = aws_mem_calloc(client->allocator, 1, sizeof(struct aws_task));
aws_task_init(async_release_task, s_endpoint_release_task, endpoint, "se_endpoint_refcount_task");

endpoint->async_release.client = client;
endpoint->async_release.task = async_release_task;
endpoint->async_release.task_scheduled = false;
Contributor

trivial: should this endpoint initialization code go in s3_endpoint.c instead of s3_client.c?

struct aws_s3_client *client;
/* If set, the release of the endpoint will be scheduled to run within the task, to prevent the data race with the client. */
struct aws_task *task;
Contributor

why create a new task just for this? why not re-use the client's existing process_work_task?

we could keep an aws_array_list of struct aws_s3_endpoint* to be released
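The reviewer's suggestion could look roughly like the following sketch. Everything here is hypothetical (a fixed array stands in for `aws_array_list`, and `client_process_work` stands in for the client's existing `process_work_task`); it only illustrates the shape of the idea, where release appends to a client-owned pending list and the existing work task drains it under the lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct endpoint { int id; };

struct client {
    pthread_mutex_t lock;
    struct endpoint *pending[8]; /* stand-in for an aws_array_list of endpoints */
    size_t pending_count;
    size_t destroyed_count;      /* for observability in this sketch */
};

/* Instead of a dedicated per-endpoint task, release just queues the endpoint.
 * Real code would also schedule the client's process_work_task here. */
void endpoint_schedule_release(struct client *c, struct endpoint *e) {
    pthread_mutex_lock(&c->lock);
    c->pending[c->pending_count++] = e; /* capacity assumed in this sketch */
    pthread_mutex_unlock(&c->lock);
}

/* Drained from the client's existing work task, which already runs with
 * well-defined ordering relative to endpoint lookups. */
void client_process_work(struct client *c) {
    pthread_mutex_lock(&c->lock);
    for (size_t i = 0; i < c->pending_count; ++i) {
        free(c->pending[i]); /* removal from the hashtable would go here too */
        c->destroyed_count += 1;
    }
    c->pending_count = 0;
    pthread_mutex_unlock(&c->lock);
}
```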

Comment on lines +589 to +593
size_t current_ref = aws_atomic_load_int(&endpoint->ref_count.ref_count);
num_released = endpoint->async_release.num_released;
endpoint->async_release.num_released = 0;
AWS_ASSERT(num_released <= current_ref);
if (num_released == current_ref) {
Contributor

the count might have changed from another thread by the time this runs

/* The last refcount to release */
if (aws_atomic_load_int(&endpoint->ref_count.ref_count) == 1) {
    aws_hash_table_remove(&client->synced_data.endpoints, endpoint->host_name, NULL, NULL);
    if (endpoint->async_release.task) {
Contributor

not a fan of endpoint having 2 "modes", where 1 mode is never used

just needless complexity

@graebm
Contributor

graebm commented Sep 9, 2022

I took another swing at fixing this over in a new PR: #209

@graebm graebm closed this Sep 9, 2022
3 participants