Implement fake hot/cold splitting and corresponding stress mode #69763

amanasifkhalid · 2022-05-24T22:39:48Z

The COMPlus_JitFakeProcedureSplitting configuration flag enables testing the JIT's hot/cold splitting functionality independently of the VM. This configuration "fakes" splitting by requesting only a hot section from the VM, laid out as hot code + 4KB buffer + cold code. The hot and cold code pointers are manually set to their respective parts of that section, and the JIT continues to operate as if the VM had allocated separate sections. This implementation does not currently split unwind information; it reserves and allocates unwind info only for the hot section, which breaks stack walks and thus the GC. When using this configuration, suppress the GC with a large memory threshold (e.g., set COMPlus_GCgen0size=1000000).
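As a rough illustration of the layout described above, the following sketch carves hot and cold pointers out of a single allocation. The names `FakeSplitLayout`, `fakeSplitRequestSize`, and `layoutFakeSplit` are hypothetical and not taken from the JIT sources; only the 4KB buffer and the hot + buffer + cold ordering come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the pointers the JIT tracks after allocation.
struct FakeSplitLayout
{
    std::size_t totalSize; // bytes requested from the VM as a single "hot" section
    uint8_t*    hotCode;   // start of the hot code
    uint8_t*    coldCode;  // start of the cold code, past the buffer
};

// Size of the gap between hot and cold code (4KB, per the PR description).
constexpr std::size_t kFakeSplittingBuffer = 4096;

// Total bytes to request as one hot section.
constexpr std::size_t fakeSplitRequestSize(std::size_t hotSize, std::size_t coldSize)
{
    return hotSize + kFakeSplittingBuffer + coldSize;
}

// Given the block returned for the single request, compute where each section starts.
inline FakeSplitLayout layoutFakeSplit(uint8_t* block, std::size_t hotSize, std::size_t coldSize)
{
    assert(block != nullptr);
    FakeSplitLayout layout;
    layout.totalSize = fakeSplitRequestSize(hotSize, coldSize);
    layout.hotCode   = block;
    layout.coldCode  = block + hotSize + kFakeSplittingBuffer;
    return layout;
}
```

With a 100-byte hot part and a 50-byte cold part, the cold code would begin 4196 bytes after the hot code in this model.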

The COMPlus_JitStressProcedureSplitting configuration flag runs a stress mode for hot/cold code splitting by always splitting a method after its first basic block. This mode exposed the following behaviors incompatible with splitting that have been corrected:

- Long branches between hot and cold sections cannot be optimized to short branches. Such optimization now only occurs for branches within a section.
- The decision to align a loop can be invalidated after moving part of it to the cold section. Thus, loop alignment is disabled for cold blocks.
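The branch rule above can be sketched as a small predicate. This is a simplified model, not the emitter's actual code: `JumpInfo`, `fitsInShortJump`, and `canShortenJump` are hypothetical names, and the signed 8-bit displacement range is an assumption modeled on x86/x64 short jumps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a jump: code offsets plus which section each end lives in.
struct JumpInfo
{
    int64_t srcOffset; // offset just past the jump instruction
    int64_t dstOffset; // offset of the jump target
    bool    srcIsCold; // jump instruction is in the cold section
    bool    dstIsCold; // jump target is in the cold section
};

// Assumption: a short jump encodes a signed 8-bit displacement.
inline bool fitsInShortJump(int64_t distance)
{
    return distance >= -128 && distance <= 127;
}

inline bool canShortenJump(const JumpInfo& jmp)
{
    // A jump crossing the hot/cold boundary stays long: offsets computed
    // before splitting say nothing about how far apart the sections end up.
    if (jmp.srcIsCold != jmp.dstIsCold)
    {
        return false;
    }
    return fitsInShortJump(jmp.dstOffset - jmp.srcOffset);
}
```

A jump of 40 bytes would be shortened when both ends are hot, but left long when the target is cold.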

ghost · 2022-05-24T22:39:54Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	amanasifkhalid
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

JulieLeeMSFT · 2022-05-25T17:47:01Z

cc @dotnet/jit-contrib.

BruceForstall

A few comments

BruceForstall · 2022-05-25T23:55:40Z

src/coreclr/jit/compiler.cpp

+        // Loop alignment is disabled for cold blocks
+        assert((fgFirstBB->bbFlags & BBF_COLD) == 0);
+


Seems like there's no need for this assert, since we're not aligning this loop. In fact, we're specifically choosing to NOT align it, here.

BruceForstall · 2022-05-26T00:15:00Z

src/coreclr/jit/ee_il_dll.cpp

@@ -1122,9 +1122,39 @@ void Compiler::eeDispLineInfos()
 * (e.g., host AMD64, target ARM64), then VM will get confused anyway.
 */

+void Compiler::eeAllocMem(AllocMemArgs* args, UNATIVE_OFFSET hotSizeRequest, UNATIVE_OFFSET coldSizeRequest)


This function should have the same signature as allocMem: no need to pass hotSizeRequest or coldSizeRequest since they can be found from args->hotCodeSize and args->coldCodeSize.

BruceForstall · 2022-05-26T00:16:52Z

src/coreclr/jit/ee_il_dll.cpp

+{
+#ifdef DEBUG
+    // Fake splitting implementation: hot section = hot code + 4K buffer + cold code
+    const UNATIVE_OFFSET buffer = 4096;


nit: buffer is a pretty generic name. How about:

Suggested change

-    const UNATIVE_OFFSET buffer = 4096;
+    const UNATIVE_OFFSET fakeSplittingBuffer = 4096;

BruceForstall · 2022-05-26T00:19:05Z

src/coreclr/jit/ee_il_dll.cpp

+        args->hotCodeSize  = hotSizeRequest + buffer + coldSizeRequest;
+        args->coldCodeSize = 0;


Note that changing args fields works out because the caller doesn't (I think) consult these values afterwards. To be perfectly clean, you might want to copy args to a local temp before modifying its fields.

amanasifkhalid · 2022-05-26

Gotcha; for brevity, I simply copied and restored hotCodeSize and coldCodeSize, so args' input members behave like they're read-only.
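The copy-and-restore approach discussed in this thread could look roughly like the sketch below. It is not the merged code: `AllocMemArgs` here is a trimmed-down hypothetical stand-in for the real structure, and `vmAllocMem` and `fakeSplitAllocMem` are invented names. The function takes only `args`, matching the earlier suggestion to mirror the allocMem signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for the JIT-EE AllocMemArgs structure.
struct AllocMemArgs
{
    uint32_t hotCodeSize;   // input
    uint32_t coldCodeSize;  // input
    uint8_t* hotCodeBlock;  // output
    uint8_t* coldCodeBlock; // output
};

constexpr uint32_t fakeSplittingBuffer = 4096;

// Stand-in for the VM's allocMem: hands out one static block for the hot request.
inline void vmAllocMem(AllocMemArgs* args)
{
    static uint8_t storage[16 * 1024];
    assert(args->hotCodeSize <= sizeof(storage) && args->coldCodeSize == 0);
    args->hotCodeBlock  = storage;
    args->coldCodeBlock = nullptr;
}

// Same shape as allocMem: sizes are read from args rather than passed separately.
inline void fakeSplitAllocMem(AllocMemArgs* args)
{
    const uint32_t hotSizeRequest  = args->hotCodeSize;
    const uint32_t coldSizeRequest = args->coldCodeSize;

    // Request a single hot section: hot code + buffer + cold code.
    args->hotCodeSize  = hotSizeRequest + fakeSplittingBuffer + coldSizeRequest;
    args->coldCodeSize = 0;
    vmAllocMem(args);

    // Restore the inputs so callers see them unchanged, then fix up the cold pointer.
    args->hotCodeSize   = hotSizeRequest;
    args->coldCodeSize  = coldSizeRequest;
    args->coldCodeBlock = args->hotCodeBlock + hotSizeRequest + fakeSplittingBuffer;
}
```

Restoring the two size fields keeps the inputs effectively read-only from the caller's point of view, which is the property the review comment asks for.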

BruceForstall · 2022-05-26T00:24:40Z

src/coreclr/jit/ee_il_dll.cpp

+    // Fake splitting currently does not handle unwind info for cold code
+    if (isColdCode && JitConfig.JitFakeProcedureSplitting())
+    {
+        return;
+    }
+


It might be better to move this immediately before the call to reserveUnwindInfo, below, and add:

JITDUMP("reserveUnwindInfo for cold code with JitFakeProcedureSplitting enabled: ignoring cold unwind info\n");

which would add a line to the JitDump in this case, but also let the normal reserveUnwindInfo printf be executed first.
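To make the suggested ordering concrete, here is a toy model of it: the usual dump line is emitted first, and the early return sits immediately before the point where the VM would be called. Everything in it is a hypothetical stand-in (`UnwindReserver`, the string log in place of JitDump, the counter in place of the VM call); it only illustrates the control flow.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: a string log models the JitDump, a counter models the VM call.
struct UnwindReserver
{
    bool fakeSplitting  = false;   // models JitFakeProcedureSplitting being enabled
    int  vmReservations = 0;       // how many times the "VM" was asked to reserve
    std::vector<std::string> dump; // models JitDump output

    void reserveUnwindInfo(bool isColdCode)
    {
        // The normal dump output runs first, for hot and cold code alike.
        dump.push_back(std::string("reserveUnwindInfo: ") + (isColdCode ? "cold" : "hot"));

        // The early out sits immediately before the VM call and leaves a trace.
        if (isColdCode && fakeSplitting)
        {
            dump.push_back("ignoring cold unwind info");
            return;
        }

        ++vmReservations; // stands in for the call into the VM
    }
};
```

With fake splitting enabled, a hot request followed by a cold request would yield one VM reservation and three dump lines in this model.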

BruceForstall · 2022-05-26T00:25:24Z

src/coreclr/jit/ee_il_dll.cpp

+    // Fake splitting currently does not handle unwind info for cold code
+    if (pColdCode && JitConfig.JitFakeProcedureSplitting())
+    {
+        return;
+    }
+


Same comment here as for reserveUnwindInfo: move to just before the allocUnwindInfo call, and add a JITDUMP printout.

cshung · 2022-05-26T02:59:27Z

Note that the earlier out is only for DEBUG, is that intentional?

amanasifkhalid · 2022-05-26

Yes, the JitFakeProcedureSplitting flag is defined only on Debug/Checked builds. For now, we skip unwind info for cold sections only when fake-splitting.

BruceForstall · 2022-05-26T00:28:48Z

src/coreclr/jit/jitconfigvalues.h

+                                                                             // For now, this disables unwind info for
+                                                                             // cold sections, breaking stack walks.
+                                                                             // Set COMPlus_GCgen0size=1000000 to avoid
+                                                                             // running the GC and breaking things.


Suggested change

-    // running the GC and breaking things.
+    // running the GC which requires stack walking.

BruceForstall · 2022-05-26T00:36:02Z

src/coreclr/jit/compiler.cpp

+    // JitFakeProcedureSplitting overrides JitNoProcedureSplitting with a fake splitting implementation
+    if (JitConfig.JitFakeProcedureSplitting())
+    {
+        opts.compProcedureSplitting = true;
+    }
+


I don't understand why this is necessary. If someone sets COMPlus_JitFakeProcedureSplitting=1 and COMPlus_JitNoProcedureSplitting=Main, it seems like that should work: do fake splitting on all functions except Main.

It seems like the real issue is:

opts.compProcedureSplitting = !opts.compDbgCode;

maybe should be:

if (opts.compDbgCode && !JitConfig.JitFakeProcedureSplitting()) { opts.compProcedureSplitting = false; }

amanasifkhalid · 2022-05-26

I see, looks like I misunderstood what precedence JitFakeProcedureSplitting should (not) take over other configurations. I've changed the logic so JitFakeProcedureSplitting can be partially overridden by other configs, but can still override opts.compDbgCode if enabled.
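One reading of the precedence agreed on in this thread is sketched below. It is an interpretation, not the merged code: `SplittingConfig` and `shouldSplit` are hypothetical names, and any other conditions the JIT checks before splitting are ignored.

```cpp
#include <cassert>

// Hypothetical summary of the settings discussed in this thread.
struct SplittingConfig
{
    bool compDbgCode;           // compiling debuggable code
    bool fakeSplitting;         // models COMPlus_JitFakeProcedureSplitting being set
    bool excludedByNoSplitting; // method matches COMPlus_JitNoProcedureSplitting
};

inline bool shouldSplit(const SplittingConfig& cfg)
{
    bool split = true;

    // Debuggable code normally disables splitting, unless fake splitting is requested.
    if (cfg.compDbgCode && !cfg.fakeSplitting)
    {
        split = false;
    }

    // A per-method exclusion still wins, even with fake splitting enabled.
    if (cfg.excludedByNoSplitting)
    {
        split = false;
    }

    return split;
}
```

Under this model, setting the fake-splitting flag re-enables splitting for debuggable code, while a method named in JitNoProcedureSplitting is still left unsplit.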

cshung · 2022-05-26T02:59:27Z

src/coreclr/jit/ee_il_dll.cpp

+    // Fake splitting currently does not handle unwind info for cold code
+    if (pColdCode && JitConfig.JitFakeProcedureSplitting())
+    {
+        return;
+    }
+


Note that the earlier out is only for DEBUG, is that intentional?

amanasifkhalid · 2022-05-26T14:15:06Z

The 20 failing checks all seem to build/run correctly, but fail when sending the results to Helix -- an Azure DevOps API is returning a 401 error, citing a bad System.AccessToken value.

BruceForstall · 2022-05-26T19:03:05Z

> The 20 failing checks all seem to build/run correctly, but fail when sending the results to Helix -- an Azure DevOps API is returning a 401 error, citing a bad System.AccessToken value.

That was a general infrastructure problem, now fixed. You can request the testing to be rerun. (See the "Re-run" "button" on "runtime" here: https://github.com/dotnet/runtime/pull/69763/checks?check_run_id=6607345705 (rerunning the "parent" will cause rerunning all the failed "children"))

BruceForstall

LGTM

Implementation splits after first basic block in method, assuming there is more than one block. Accompanying this implementation are the following fixes:

- Loop alignment is disabled for cold blocks, as moving blocks into the cold section may invalidate the initial decision to align.
- Long jumps are no longer reduced to short jumps if crossing hot/cold sections.

amanasifkhalid · 2022-05-26T19:49:24Z

Thank you! I couldn't find any re-run button in the Checks UI, but force-pushing an empty commit triggered the run.

Aman Khalid added 2 commits May 17, 2022 09:46

Implemented fake code splitting in JIT for testing without VM

564c75d

Merge branch 'main' of github.com:amanasifkhalid/runtime into hot-col…

a0e2c49

…d-splitting

dotnet-issue-labeler bot added the area-CodeGen-coreclr label May 24, 2022

ghost added the community-contribution label May 24, 2022

amanasifkhalid force-pushed the hot-cold-splitting branch from 74705d6 to dce5861 Compare May 25, 2022 00:10

amanasifkhalid marked this pull request as ready for review May 25, 2022 02:13

JulieLeeMSFT assigned amanasifkhalid May 25, 2022

JulieLeeMSFT requested a review from BruceForstall May 25, 2022 17:46

JulieLeeMSFT added this to the 7.0.0 milestone May 25, 2022

amanasifkhalid force-pushed the hot-cold-splitting branch from dce5861 to d748767 Compare May 25, 2022 18:08

kunalspathak reviewed May 25, 2022

amanasifkhalid force-pushed the hot-cold-splitting branch from d748767 to b5021de Compare May 25, 2022 19:58

BruceForstall requested changes May 26, 2022

ghost added the needs-author-action label May 26, 2022

cshung reviewed May 26, 2022

amanasifkhalid force-pushed the hot-cold-splitting branch from b5021de to 0af0032 Compare May 26, 2022 07:10

ghost removed the needs-author-action label May 26, 2022

BruceForstall approved these changes May 26, 2022

amanasifkhalid force-pushed the hot-cold-splitting branch from 0af0032 to 5965366 Compare May 26, 2022 19:47

amanasifkhalid merged commit 70fd5dc into dotnet:main May 27, 2022

amanasifkhalid mentioned this pull request May 27, 2022

Add hot/cold splitting test job to jit-runtime-experimental #69922

Merged

ghost locked as resolved and limited conversation to collaborators Jun 26, 2022
