Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hot/cold splitting test job to jit-runtime-experimental #69922

Merged
merged 1 commit into from
Jun 7, 2022

Conversation

amanasifkhalid
Copy link
Member

Adding a new rolling test job to JIT-runtime-experimental can help ensure the new fake- and stress-splitting modes in the JIT do not regress. The proposed test, titled jitosr_stress_splitting, runs a Checked build of the JIT on OS- and platform-specific test suites with the following environment:

  • COMPlus_GCgen0size=1000000
  • COMPlus_JitFakeProcedureSplitting=1
  • COMPlus_JitStressprocedureSplitting=1

This environment forces the JIT to fake-split every method after its first basic block, assuming the method consists of more than one block. COMPlus_GCgen0size=1000000 sets a large memory allocation threshold before the GC runs; recall that COMPlus_JitFakeProcedureSplitting currently doesn't generate unwind information for cold code, thus breaking the GC's stack walks. COMPlus_GCgen0size provides a patchwork fix for this limitation in cases where memory usage is not excessive (i.e. >= 0x1000000 B).

In my local testing, all of the x64 tests in src/tests/ pass with this environment.

@ghost ghost assigned amanasifkhalid May 27, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 27, 2022
@ghost
Copy link

ghost commented May 27, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Adding a new rolling test job to JIT-runtime-experimental can help ensure the new fake- and stress-splitting modes in the JIT do not regress. The proposed test, titled jitosr_stress_splitting, runs a Checked build of the JIT on OS- and platform-specific test suites with the following environment:

  • COMPlus_GCgen0size=1000000
  • COMPlus_JitFakeProcedureSplitting=1
  • COMPlus_JitStressprocedureSplitting=1

This environment forces the JIT to fake-split every method after its first basic block, assuming the method consists of more than one block. COMPlus_GCgen0size=1000000 sets a large memory allocation threshold before the GC runs; recall that COMPlus_JitFakeProcedureSplitting currently doesn't generate unwind information for cold code, thus breaking the GC's stack walks. COMPlus_GCgen0size provides a patchwork fix for this limitation in cases where memory usage is not excessive (i.e. >= 0x1000000 B).

In my local testing, all of the x64 tests in src/tests/ pass with this environment.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member

Looks good. However, this isn't related to OSR, so remove "osr" from the name. Use jit_stress_splitting or jit_stress_procedure_splitting.

@amanasifkhalid amanasifkhalid marked this pull request as ready for review May 27, 2022 21:45
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

You should trigger jit-runtime-experimental on this PR to verify.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone May 27, 2022
@amanasifkhalid
Copy link
Member Author

I ran and re-ran runtime-jit-experimental and, like all the other failing checks, it fails to send tests to Helix due to a bad System.AccessToken value. Is the fact that runtime-jit-experimental makes it all the way to the "Send to Helix" step sufficient verification to merge the PR?

@BruceForstall
Copy link
Member

I ran and re-ran runtime-jit-experimental and, like all the other failing checks, it fails to send tests to Helix due to a bad System.AccessToken value.

You hit #69854, which I believe is now fixed. Try triggering the job again.

Is the fact that runtime-jit-experimental makes it all the way to the "Send to Helix" step sufficient verification to merge the PR?

No -- It's the "Send to Helix" step that actually runs tests, and since it failed, no tests were run.

@BruceForstall
Copy link
Member

You also hit the "remainingSize == 4" assert which was fixed with #69905

Also, some wasm failures which I don't see issues for, but which are obviously unrelated. You can always try "Re-run" on those.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

I pulled from main and re-ran runtime-jit-experimental, but it's still failing with 401 responses from Azure DevOps, per #69854 .

@BruceForstall
Copy link
Member

I pulled from main and re-ran runtime-jit-experimental, but it's still failing with 401 responses from Azure DevOps, per #69854 .

The scheduled run on Sunday did not have those issues (it had other issues, but not those: https://dev.azure.com/dnceng/public/_build/results?buildId=1795838&view=ms.vss-test-web.build-test-results-tab).

My suggestion:

  1. rebase on top of main (don't merge main)
  2. push the updated PR

It shouldn't be necessary, but if that doesn't work, try closing and re-opening the PR, or create an entirely new PR.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Fake-splitting currently breaks stack walks by not generating unwind
info for cold code. This commit implements unwind info on x86/64 when
fake-splitting by generating unwind info for the combined hot/cold
section just once, rather than generating unwind info for the separate
sections.

For reasons to be investigated, this implementation does not work when
the code sections are separated by an arbitrary buffer (such as the 4KB
buffer previously used). Thus, the buffer has been removed from the
fake-splitting implementation: Now, the hot and cold sections are
placed contiguously in memory, but the JIT continues to behave as if
they are arbitrarily far away (for example, by using long branches
between sections).

Following this fix, fake-splitting no longer requires the GC to be
suppressed by setting `COMPlus_GCgen0size=1000000`. A test job
has been added to `runtime-jit-experimental` to ensure
fake/stress-splitting does not regress.
@amanasifkhalid
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

I have pushed a commit with a fix for unwind info generation when fake-splitting: When calling eeAllocUnwindInfo, we manually set the start offset to point at the beginning of the hot section, and the end offset to point at the end of the cold section, effectively treating them as a single "hot" section. However, for reasons under investigation, this implementation breaks the GC when there is a buffer between the sections; thus, I've adjusted the fake-splitting implementation to leave the hot/cold sections adjacent in memory, while still behaving as if they're arbitrarily far apart (for example, the JIT still generates long branches between sections).

I've tested the x64 implementation locally, and will adjust x86 implementation/add ARM implementation as needed.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If the tests pass, you're good to merge.

@amanasifkhalid
Copy link
Member Author

Other than jit_osr_stress_random (see issue), runtime-jit-experimental is passing. Build windows arm64 Release NativeAOT is failing due to a Helix job timing out, but the build otherwise succeeds.

@amanasifkhalid amanasifkhalid merged commit 08b3170 into dotnet:main Jun 7, 2022
@amanasifkhalid amanasifkhalid deleted the jit-pipeline-tests branch June 7, 2022 17:47
@AndyAyersMS AndyAyersMS mentioned this pull request Jun 29, 2022
54 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Jul 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants