Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[3.1] Fail FuncEval if slot backpatching lock is held by any thread #28006

Merged
merged 1 commit into from
Feb 18, 2020

Conversation

kouvel
Copy link
Member

@kouvel kouvel commented Jan 30, 2020

  • In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
  • Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case in master is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. This is less common in 3.1 because the callback pulses the GC mode at the beginning, but still may occur occasionally. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.

Customer impact

Regression?

Yes, regression from 2.x, starting in 3.0

Testing

  • It's a timing issue and was reproduced by inducing specific timings in various phases of tiering
  • Verified that the timeout does not occur after the fix in the vast majority of cases. Instead, the FuncEval is failed when the lock is held and VS falls back to alternate means for evaluating the expression. See more in risks below.
  • Checked debugger-broken stacks at various phases of tiering to verify that the poll for debugger suspension is working as expected
  • Standard tests

Risks

  • The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
  • The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing
  • Low risk for the change itself

Port of dotnet/runtime#2380
Fix for dotnet/runtime#1537

- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
- Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.

Risks:
- The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
- The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing
- Low risk for the change itself

Port of dotnet/runtime#2380
Fix for dotnet/runtime#1537
@kouvel kouvel added this to the 3.1.x milestone Jan 30, 2020
@kouvel kouvel self-assigned this Jan 30, 2020
@kouvel
Copy link
Member Author

kouvel commented Feb 10, 2020

@noahfalk could you please take a look? This is a 3.1 port of the similar change to master: dotnet/runtime#2380

@davidwrighton
Copy link
Member

@jkotas reverted this change in master. Please investigate why.

@jkotas
Copy link
Member

jkotas commented Feb 10, 2020

The change had conflicts in master with the other big tiered compilations changes that had to be reverted to fix the stability problems. I was not sure whether these two changes are independent, so I have reverted both.

@kouvel
Copy link
Member Author

kouvel commented Feb 11, 2020

The change in master would be slightly different before and after the reverted TC change, I'll probably make a separate PR in master after investigating and fixing those issues. The equivalent of this change in master wasn't the cause of the issues, and this port still applies in 3.1.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @kouvel, this did not cause the issues in master, and believe it will address the funceval problems in 3.1

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I will seek tactics approval for March

@wtgodbe
Copy link
Member

wtgodbe commented Feb 18, 2020

@jeffschwMSFT @kouvel is this for 3.1.3 (march)? If so please merge by EOD today if it gets approval

@wtgodbe wtgodbe added the Servicing-consider Issue for next servicing release review label Feb 18, 2020
@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Feb 18, 2020
@jeffschwMSFT jeffschwMSFT modified the milestones: 3.1.x, 3.1.3 Feb 18, 2020
@jeffschwMSFT
Copy link
Member

Tactics approved for March 3.1.3

@kouvel
Copy link
Member Author

kouvel commented Feb 18, 2020

Thanks all!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-TieredCompilation Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants