Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Func-eval while runtime is suspended may not run due to deadlock (timing issue) #1537

Closed
kouvel opened this issue Jan 9, 2020 · 3 comments
Closed

Comments

@kouvel
Copy link
Member

kouvel commented Jan 9, 2020

One thread may be doing:

  • TieredCompilationManager::TieringDelayTimerCallbackWorker
  • TieredCompilationManager::ResumeCountingCalls
  • MethodDesc::BackpatchToResetEntryPointSlots
  • GCCoop::GCCoop
  • And block for runtime suspension while holding the slot backpatching lock

After runtime suspension, the func-eval thread may be doing:

  • MethodDesc::DoPrestub
  • CodeVersionManager::PublishVersionableCodeIfNecessary
  • MethodDescBackpatchInfoTracker::ConditionalLockHolder::ConditionalLockHolder

It cannot acquire the lock and the func-eval times out. There doesn't seem to be any other instance of notifying the debugger of a cross-thread dependency (though something like NotifyOfCrossThreadDependency) from the VM side. It still wouldn't allow the func-eval to run anyway.

Best way to fix probably would be to not enter cooperative GC mode while holding a regular lock, such that the runtime cannot be suspended while holding the lock. This would mean entering cooperative GC mode before acquiring the lock, requiring that the lock be an unsafe any-mode lock, which in turn means that no GC_TRIGGERS work may be done inside the lock. May be doable, need to see what it would entail.

@kouvel kouvel added this to the 3.1.x milestone Jan 9, 2020
@kouvel kouvel self-assigned this Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 9, 2020
kouvel added a commit to kouvel/runtime that referenced this issue Jan 30, 2020
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
- Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.
- The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
- The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing

Fix for dotnet#1537
kouvel added a commit to kouvel/coreclr that referenced this issue Jan 30, 2020
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
- Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.

Risks:
- The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
- The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing
- Low risk for the change itself

Port of dotnet/runtime#2380
Fix for dotnet/runtime#1537
kouvel added a commit that referenced this issue Jan 30, 2020
- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
- Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.
- The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
- The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing

Fix for #1537
kouvel added a commit to dotnet/coreclr that referenced this issue Feb 18, 2020
…28006)

- In many cases cooperative GC mode is entered after acquiring the slot backpatching lock and the thread may block for debugger suspension while holding the lock. A FuncEval may time out on entering the lock if for example it calls a virtual or interface method for the first time. Failing the FuncEval when the lock is held enables the debugger to fall back to other options for expression evaluation.
- Also added polls for debugger suspension before acquiring the slot backpatching lock on background threads that often operate in preemptive GC mode. A common case is when the debugger breaks while the tiering delay timer is active, the timer ticks shortly afterwards (after debugger suspension completes) and if a thread pool thread is already available, the background thread would block while holding the lock. The poll checks for debugger suspension and pulses the GC mode to block before acquiring the lock.

Risks:
- The fix is only a heuristic and lessens the problem when it is detected that the lock is held by some thread. Since the lock is acquired in preemptive GC mode, it is still possible that after the check at the start of a FuncEval, another thread acquires the lock and the FuncEval may time out. The polling makes it less likely for the lock to be taken by background tiering work, for example if a FuncEval starts while rejitting a method.
- The expression evaluation experience may be worse when it is detected that the lock is held, and may still happen from unfortunate timing
- Low risk for the change itself

Port of dotnet/runtime#2380
Fix for dotnet/runtime#1537
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Feb 25, 2020
@nszeitli
Copy link

If this is the "To prevent an unsafe abort" root cause, is there a known workaround at this time?

@kouvel
Copy link
Member Author

kouvel commented Feb 25, 2020

Disabling tiered compilation should help when debugging, although it can't be disabled if attaching to a running app that already has it enabled. Tiered compilation may be disabled in the following ways:

Project file (with a clean build after changes to the property):

  <PropertyGroup>
    <TieredCompilation>false</TieredCompilation>
  </PropertyGroup>

.runtimeconfig.json

{
  "runtimeOptions": {
    "configProperties": {
      "System.Runtime.TieredCompilation": false
    }
  }
}

Environment variable:

COMPlus_TieredCompilation=0

@kouvel
Copy link
Member Author

kouvel commented Mar 4, 2020

Fixed in 3.1 with dotnet/coreclr#28006, and fixed in master with #32250 (original PR #2380)

@kouvel kouvel closed this as completed Mar 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants