Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/7.0] Fix thread-safety issues with enumerating ResourceManager. #81283

Merged
merged 7 commits into from
Mar 10, 2023

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Jan 27, 2023

Backport of #75054 to release/7.0

/cc @buyaa-n @madelson

Customer Impact

The regression introduced when the scope of the lock reduced from reader to cache and as @madelson mentioned the deadlock happening because:

A customer who is migrating from 3.1 to 6.0 is blocked by a deadlock issue that is a regression from 3.1 => 6.0. This fixed in .NET 8.0 but as .NET 3.1 is reached end of life more customers might be migrating from .NET 3.1 to 6.0 or 7.0. and encounter this issue.

There is a workaround to use a new ResourceManager instance when planning to call GetResourceSet in different threads rather than re-using a shared instance, but not desirable.

Testing

  • Unit testing added with the fix
  • Manual tested with the repro in the issue

Risk

Low, the fix is removing the outer lock from ResourceReader.ResourceEnumerator and instead adding a granular lock for each method that use the shared _store and misses the lock in the ResourceReader (previously some methods had lock, some not, for which the caller putting a lock to the reader). Also, it adds some Debug.Asserts to make the locking acquired appropriately and corresponding unit test that reproes the issue.

@ghost
Copy link

ghost commented Jan 27, 2023

Tagging subscribers to this area: @dotnet/area-system-resources
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #75054 to release/7.0

/cc @buyaa-n @madelson

Customer Impact

Testing

Risk

IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-System.Resources

Milestone: -

@buyaa-n buyaa-n added the Servicing-consider Issue for next servicing release review label Jan 30, 2023
@buyaa-n buyaa-n added this to the 7.0.x milestone Jan 30, 2023
@jeffhandley
Copy link
Member

I support the backport of this fix once the customer impact and risk sections are fully completed. @madelson had a lot of details in the related PR and issues explaining what leads to this deadlock, and this regression is impeding customers moving from an out-of-support LTS to an in-support LTS version.

@buyaa-n
Copy link
Member

buyaa-n commented Jan 31, 2023

I support the backport of this fix once the customer impact and risk sections are fully completed

Thanks, impact and risk sections are completed, just waiting on the customer testing result. I will check the failures later today

@buyaa-n buyaa-n removed the Servicing-consider Issue for next servicing release review label Jan 31, 2023
@madelson
Copy link
Contributor

this regression is impeding customers moving from an out-of-support LTS to an in-support LTS version

@jeffhandley given this, will this be considered for backport to .NET 6 or just 7?

@buyaa-n
Copy link
Member

buyaa-n commented Jan 31, 2023

given this, will this be considered for backport to .NET 6

Yes, here: #81281

@ericstj ericstj added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) Servicing-approved Approved for servicing release labels Jan 31, 2023
@ericstj
Copy link
Member

ericstj commented Jan 31, 2023

This is approved, contingent upon partner signoff that the fix resolves their issue. Remove the NO-MERGE once we have signoff from the impacted customer.

@buyaa-n
Copy link
Member

buyaa-n commented Jan 31, 2023

There is one failure that is related: https://github.com/dotnet/runtime/pull/81281/checks?check_run_id=10937934301 which is related to intermittent timeout, we need to add the fix for that, just pushed that commit

madelson and others added 6 commits February 10, 2023 14:27
Concurrently enumerating a ResourceManager while also calling GetString()
and similar methods was prone to both transient errors and deadlock.

The transient errors were caused by RuntimeResourceSet calling internal
methods on ResourceReader that did not properly lock. Now, all exposed
methods on the Reader are thread-safe.

The deadlock was called by inconsistent lock ordering between
ResourceReader.ResourceEnumerator and RuntimeResourceSet which both lock
on the RuntimeResourceSet's cache and on the ResourceReader itself. Now,
the enumerator does not need to take both locks at the same time.

Fix #74052
Fix #74868
@carlossanlop carlossanlop force-pushed the backport/pr-75054-to-release/7.0 branch from 1626ad8 to d57ec7e Compare February 10, 2023 22:27
@carlossanlop
Copy link
Member

we need to add the fix for that, just pushed that commit

@buyaa-n thanks for pushing the fix. I don't see any failures related to resource manager. If there's nothing to address, can I remove the "no merge" label and merge this?

@buyaa-n
Copy link
Member

buyaa-n commented Feb 12, 2023

I don't see any failures related to resource manager. If there's nothing to address, can I remove the "no merge" label and merge this?

This is same as #81281 we are waiting for the customer sign off

@buyaa-n
Copy link
Member

buyaa-n commented Feb 13, 2023

This is approved, contingent upon partner signoff that the fix resolves their issue. Remove the NO-MERGE once we have signoff from the impacted customer.

As we did not get the partner signoff and it is the last day for 7.0.4 release code compete this fix not gonna happen in 7.0.4 release.

Though we still think the regression should be fixed in 6.0/7.0, so we will conduct some more testing especially in perf side and ask for porting in next release without customer sign off. Will not close the PR but removing the approval.

@buyaa-n buyaa-n removed the Servicing-approved Approved for servicing release label Feb 13, 2023
@carlossanlop
Copy link
Member

This is still marked as no-merge. Friendly reminder that the servicing branches open today.

@ericstj ericstj removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 9, 2023
@ericstj ericstj added the Servicing-consider Issue for next servicing release review label Mar 9, 2023
@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Mar 9, 2023
@leecow leecow modified the milestones: 7.0.x, 7.0.5 Mar 9, 2023
@carlossanlop
Copy link
Member

Approved, signed-off, CI failures in latest run look unrelated, no OOB changes needed for the involved assemblies. Ready to merge. :shipit:

@carlossanlop carlossanlop merged commit 6e6eab2 into release/7.0 Mar 10, 2023
@carlossanlop carlossanlop deleted the backport/pr-75054-to-release/7.0 branch March 10, 2023 00:13
@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Resources Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants