
JIT: consider running some optimizations in Tier0 #9120

Open
Tracked by #76969
AndyAyersMS opened this issue Oct 13, 2017 · 11 comments
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), JitThroughput (CLR JIT issues regarding speed of the JIT itself)
Milestone

Comments

@AndyAyersMS
Member

The goal of Tier0 is to jit code as quickly as possible. Currently this is done by having Tier0 enable the "minopts" mode in the jit, which disables all optimization. But there are good reasons to believe that running some optimizations in Tier0 can improve the speed of jitting.

To first order, the time it takes to jit a method (especially when optimizing lightly) is proportional to the amount of code the jit produces for it. So any cheap optimization that reduces the overall size of the generated code is a candidate for running in Tier0. Some ideas:

  • enable the importer branch folding
  • enable the early type opts that feed importer branch folding. Perhaps doubly relevant since R2R prejitting will leave generic code to the jit, so Tier0 will see more generic instantiations than "normal", and these are the method bodies that often can be greatly simplified by early type opts. These opts are also pretty cheap.
  • other kinds of simple expression tree simplifications -- maybe some parts of morph are cheap enough to enable?
  • avoid inline expansion of helpers. Tricky because while this helps jit time it slows down the jitted code -- but we hope not to run the Tier0 code very often, so it seems like it could pay off.
  • perhaps inline very small methods. I have some older and perhaps flawed data that indicates this should be a throughput win, but recent results haven't borne this out. It is worth revisiting, though. Certainly, inlining small methods often reduces code size, and it should cut down on the number of jit invocations.
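The first two ideas above boil down to the same mechanism: when a branch condition is already a constant at import time, the dead arm never needs to be imported at all. Below is a purely illustrative toy sketch in Python (the names are hypothetical; the real logic lives in the jit's C++ importer) of why this shrinks Tier0 code:

```python
# Toy sketch of importer-time branch folding (illustration only, not JIT code).
# If a branch condition is a compile-time constant, only the reachable arm
# is imported, so no code at all is generated for the dead arm.

def fold_branch(cond, then_block, else_block):
    """Return the list of blocks that still need to be imported."""
    if cond is True or cond is False:    # condition is already a constant
        return [then_block] if cond else [else_block]
    return [then_block, else_block]      # unknown at import time: keep both arms

# Generic code like `if (typeof(T) == typeof(int))` becomes a constant once
# T is known, so an entire arm is dropped and jit time shrinks with it:
assert fold_branch(True, "int-fast-path", "generic-slow-path") == ["int-fast-path"]
assert fold_branch("runtime-value", "then", "else") == ["then", "else"]
```

Since jit time tracks generated code size, every dead arm skipped here is jit time saved, which is why these opts can pay for themselves even in a tier whose only goal is fast jitting.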

There are notes about this scattered across other issues; I will track these down and link them back here.

category:throughput
theme:optimization
skill-level:expert
cost:medium

@AndyAyersMS
Member Author

Also:

@AndyAyersMS
Member Author

Considering for 2.1.

@AndyAyersMS
Member Author

Did some prototyping of this a while back and wasn't really able to get a clear picture of potential improvements. So am going to hold off on this until after 2.1.

@tannergooding
Member

@AndyAyersMS, is removing dead code for constant branches currently part of this?

There are a few places today where we return gtIconNode (such as Hardware Intrinsics, SIMD Intrinsics, etc) that look like they are clear/easy wins for minopts.

@AndyAyersMS
Member Author

Yeah that's the "importer branch folding" bit mentioned above.

I agree there are wins to be had here. At the time I ran these investigations, I was not seeing consistent wins on the scenarios I was able to measure. But it was early and the measurements were somewhat ad hoc. I believe @noahfalk is working on a more comprehensive set of well-defined scenarios to use in evaluations. It would be nice to have something in there that leverages intrinsics.

So I'll revisit this, but likely not until after 2.1 is more or less done.

@AndyAyersMS
Member Author

See dotnet/coreclr#22984 for some notes on a prototype: EnableBoxingOptsTier0.

@AndyAyersMS
Member Author

We should also consider enabling intrinsic expansion, for both HW intrinsics and for intrinsics that are likely to lead to early control flow trimming.
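The connection between intrinsic expansion and control-flow trimming can be sketched as follows. This is a toy illustration in Python with hypothetical names (`SUPPORTED`, `expand_is_supported`), not JIT code: once an "is supported" intrinsic is expanded to a constant node (like the `gtIconNode` case mentioned above), importer branch folding can delete the guarded dead arm.

```python
# Toy sketch (illustration only): expanding an "IsSupported"-style intrinsic
# to a constant at import time lets branch folding drop the dead arm, as in
# HW-intrinsic guards like `if (Sse2.IsSupported) { ... } else { ... }`.

# Hypothetical capability table standing in for runtime CPU detection.
SUPPORTED = {"Sse2": True, "Avx512F": False}

def expand_is_supported(isa):
    """Replace the intrinsic call with a constant node."""
    return SUPPORTED.get(isa, False)

def import_guarded(isa, vector_path, scalar_path):
    # With the intrinsic folded to a constant, only one arm is imported;
    # the other generates no code at all, saving jit time even at Tier0.
    return vector_path if expand_is_supported(isa) else scalar_path

assert import_guarded("Sse2", "vectorized loop", "scalar loop") == "vectorized loop"
assert import_guarded("Avx512F", "vectorized loop", "scalar loop") == "scalar loop"
```

Without the expansion, the guard is an opaque call and both arms must be imported, so the intrinsic expansion is what makes the trimming possible.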

@AndyAyersMS
Member Author

Note large compiled regexes are currently jitted with minopts.

They would benefit somewhat from intrinsic expansion for String.get_Chars.

If we add optimizations to Tier0, then we might consider compiling these (or any other method too large to optimize) at Tier0 instead of minopts.
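To illustrate why `String.get_Chars` expansion matters for large regex methods: the character accessor is hit constantly in matching loops, and expanding it replaces a call with a bounds check plus a direct load. A purely illustrative Python sketch (all names hypothetical; not JIT code):

```python
# Toy sketch (illustration only): the difference between emitting a call to
# the accessor (what minopts does today) and expanding it inline to a bounds
# check plus an indexed load (what intrinsic expansion would give).

def get_chars_as_call(s, i):
    # minopts today: codegen contains an actual call to the accessor.
    return ("call", "String.get_Chars", s, i)

def get_chars_expanded(s, i):
    # with intrinsic expansion: inline bounds check + direct load,
    # avoiding call overhead on every character access.
    if i < 0 or i >= len(s):
        raise IndexError("index was outside the bounds of the string")
    return s[i]

assert get_chars_expanded("regex", 0) == "r"
assert get_chars_as_call("regex", 0)[0] == "call"
```

A compiled regex method can execute this access millions of times per match run, which is why even a single expanded intrinsic moves the needle for such methods.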

@tannergooding
Member

Is that still going to apply for regexes pre-compiled using source generators?

@AndyAyersMS
Member Author

AndyAyersMS commented Apr 24, 2020

Pre-compiled as in compiled to IL, or pre-compiled as in crossgen?

Regex methods pre-compiled to IL and then jitted will have codegen similar to what you get now, with the exception that since they are no longer dynamic methods they'll be eligible for tiering and all that entails (possibly faster startup/initial jitting, slower matching for the initial iterations, then faster after rejitting -- that is, if they are not too big to optimize).

If you prejit your precompiled regexes, you may get better codegen; I think the jit has higher circuit-breaker limits when prejitting, but it still does have limits. I was planning to document those (see #31942) but haven't done so yet. But given R2R overhead and the fact that calls from the regex code to the regex library won't be inlined, it is hard to say for sure where you might end up.

Prejitting large methods may lead to perf anomalies where the R2R code is optimized, then tiering decides to replace it, and then the jit decides not to optimize since it has tighter constraints on jit time than prejit time. This is something we need to watch closely; presumably we'd be better off just keeping the prejitted code in such cases.

@tannergooding
Member

One of the proposals for source generators is to use them to convert a regex into C# code, which would then be naturally compiled down to IL.

But it sounds like it would still be applicable there as well :)
