Improve call counting mechanism #32250

Merged
merged 3 commits into from
Mar 3, 2020

Conversation

@kouvel kouvel commented Feb 13, 2020

  • Commit 1
    • Reverts commit f954c6b, which reverted PR #1457 due to issues
  • Commit 2
    • Fixes crashes and assertion failures seen with the original change; fixes #29934 (Crashes caused by "Improve call counting mechanism" change)
    • The crashes were caused by commit 6aa3c70 in the original PR
    • Call counting infos cannot be deleted when the corresponding call counting stubs may still run, because:
      • The remaining call count decremented by the stub is in the call counting info
      • The only way to get a code version / method desc from a stub is to go through the call counting info
    • Got one repro of the assertion failure reported in #22786 (leakwheel GC test triggered assert) and #24664 (JIT.Methodical failing on coreclr outerloop); this change fixes #24664. It is most likely caused by the same issue: heap corruption from modifying a deleted call counting info whose memory had been reused for an object used by code versioning. That corrupted some data and made the code version look inactive when, according to the dump, it was actually active.
    • Fixed with a partial revert of the above commit. Added back the Complete stage and then call counting infos are deleted only after it's ensured that call counting stubs won't be used (shortly before deleting them).
  • Commit 3
    • Public static functions of CallCountingManager that may be called through the debugger may run before static initialization. Added a null check, as suggested in #29892 (Fix createdump DAC segfault).
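
The lifetime rule described under Commit 2 can be illustrated with a minimal C++ sketch. This is not the coreclr implementation: `SketchManager`, `MarkComplete`, `DeleteCompleted`, `LiveInfoCount`, and the simplified `CallCountingInfo` / `CallCountingStub` structs are hypothetical names invented for illustration, and `methodId` merely stands in for the code version / method desc. The point it shows is that the stub reaches both the remaining call count and the method only through the info, so the info is marked `Complete` first and freed later, together with its stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these do not mirror the real coreclr types.
enum class Stage { Active, Complete };

struct CallCountingInfo {
    int methodId;                // stands in for the code version / method desc
    int32_t remainingCallCount;  // decremented by the stub on every call
    Stage stage;
    CallCountingInfo(int id, int32_t threshold)
        : methodId(id), remainingCallCount(threshold), stage(Stage::Active) {}
};

struct CallCountingStub {
    CallCountingInfo* info;  // the stub's only route back to the method
    explicit CallCountingStub(CallCountingInfo* i) : info(i) {}

    // Returns true once the call count threshold has been reached.
    bool OnCall() { return --info->remainingCallCount <= 0; }
};

class SketchManager {
public:
    CallCountingStub* StartCounting(int methodId, int32_t threshold) {
        infos_.push_back(std::make_unique<CallCountingInfo>(methodId, threshold));
        stubs_.push_back(std::make_unique<CallCountingStub>(infos_.back().get()));
        return stubs_.back().get();
    }

    // Counting is finished, but call sites may still enter the stub,
    // so the info is only marked here and never freed.
    void MarkComplete(CallCountingStub* stub) { stub->info->stage = Stage::Complete; }

    // Precondition (assumed, not checked): no thread can still enter any stub.
    // Each completed info is released together with its stub.
    std::size_t DeleteCompleted() {
        std::size_t deleted = 0;
        for (std::size_t i = 0; i < infos_.size();) {
            if (infos_[i]->stage == Stage::Complete) {
                stubs_.erase(stubs_.begin() + i);
                infos_.erase(infos_.begin() + i);
                ++deleted;
            } else {
                ++i;
            }
        }
        return deleted;
    }

    std::size_t LiveInfoCount() const { return infos_.size(); }

private:
    std::vector<std::unique_ptr<CallCountingInfo>> infos_;
    std::vector<std::unique_ptr<CallCountingStub>> stubs_;
};
```

In this sketch a stub that runs after `MarkComplete` still writes to valid memory. Freeing the info at that point instead would let the decrement in `OnCall` land in reused memory, which is the kind of heap corruption described above.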

@kouvel kouvel added this to the 5.0 milestone Feb 13, 2020
@kouvel kouvel self-assigned this Feb 13, 2020

kouvel commented Feb 13, 2020

Got a consistent repro of the crash after a few hours of running a CoreFX test suite, and verified no repro with the fix after 12+ hours. Got one repro of the assertion failure; currently running again with the fix, and will continue running overnight.

@kouvel kouvel force-pushed the CallCounting branch 2 times, most recently from fdd45b5 to 40f1a7f Compare February 14, 2020 17:12

kouvel commented Feb 14, 2020

No assertion failure in 12+ hours with the fix. Rebased to fix conflicts.

@kouvel kouvel closed this Feb 18, 2020
@kouvel kouvel reopened this Feb 18, 2020

@noahfalk noahfalk left a comment

LGTM


kouvel commented Feb 28, 2020

Thanks @noahfalk! Rebased to latest to run through the checks again.


kouvel commented Feb 28, 2020

Failures are the same as in #32951.

@davidwrighton davidwrighton left a comment

:shipit:

- Commit 1
  - Reverts commit f954c6b, which reverted PR dotnet#1457 due to issues
- Commit 2
  - Fixes crashes and assertion failures seen with the original change, fixes dotnet#29934
  - The crashes were caused by commit dotnet@6aa3c70 in the original PR
  - Call counting infos cannot be deleted when the corresponding call counting stubs may still run, because:
    - The remaining call count decremented by the stub is in the call counting info
    - The only way to get a code version / method desc from a stub is to go through the call counting info
  - Got one repro of the assertion failure in dotnet#22786. It is most likely caused by the same issue: heap corruption from modifying a deleted call counting info whose memory had been reused for a `NativeCodeVersionNode`, which corrupted the method desc pointer
  - Fixed with a partial revert of the above commit. Added back the `Complete` stage and then call counting infos are deleted only after it's ensured that call counting stubs won't be used (shortly before deleting them).
- Commit 3
  - Public static functions of `CallCountingManager` that may be called through the debugger may run before static initialization. Added a null check, as suggested in dotnet#29892
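
The Commit 3 guard follows a common pattern, sketched below in C++ under stated assumptions. `ManagerRegistrySketch`, `StaticInitialize`, `Register`, `CountManagers`, and `s_managers` are hypothetical names chosen for illustration; the source does not say which `CallCountingManager` functions received the check or what state they read. The sketch only shows a public static function that tolerates being called before static initialization has created the state it reads.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical sketch; the class and function names are illustrative only.
class ManagerRegistrySketch {
public:
    // Runs once during startup in this sketch.
    static void StaticInitialize() {
        if (s_managers == nullptr) {
            s_managers = new std::set<const void*>();
        }
    }

    static void Register(const void* manager) {
        assert(s_managers != nullptr);  // normal callers run after initialization
        s_managers->insert(manager);
    }

    // May be reached from a debugger or dump tool before StaticInitialize(),
    // so it returns early on a null registry instead of dereferencing it.
    static std::size_t CountManagers() {
        if (s_managers == nullptr) {
            return 0;
        }
        return s_managers->size();
    }

private:
    static std::set<const void*>* s_managers;
};

std::set<const void*>* ManagerRegistrySketch::s_managers = nullptr;
```

Calling `CountManagers()` before `StaticInitialize()` reports zero managers rather than crashing, which is the behavior a debugger-facing entry point needs.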

kouvel commented Mar 2, 2020

Rebased to fix conflict

@kouvel kouvel merged commit f30ea37 into dotnet:master Mar 3, 2020
@kouvel kouvel deleted the CallCounting branch March 3, 2020 15:55
gbalykov added a commit to gbalykov/runtime that referenced this pull request Apr 4, 2020
jkotas pushed a commit that referenced this pull request Apr 6, 2020
* Fix Linux x86 build

Related to #33005

* Fix Linux x86 build

Related to #33653, #33005

* Fix Linux x86 build

Related to #32250
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Development

Successfully merging this pull request may close these issues.

Crashes caused by "Improve call counting mechanism" change (#29934)
JIT.Methodical failing on coreclr outerloop (#24664)
3 participants