Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove AggressiveOpt from CastHelpers.LdelemaRef/StelemRef #90412

Merged
merged 10 commits into from
Aug 16, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 11, 2023

These two methods show up even for an empty app in JitDisasmSummary. The idea that we can treat them as normal managed methods and convert to direct calls once they reach Tier1 naturally (or FullOpts with TC=0).

Almost the final step (except the ETW stuff) to make empty apps jitting-free 🙂 (well, except the Main)

@stephentoub
Copy link
Member

stephentoub commented Aug 11, 2023

And #90416 clears up the ETW stuff, if it's a valid fix (but it seems too easy 😄 ... we'll see what CI and others have to say)

@jkotas jkotas requested review from kouvel and VSadov August 12, 2023 13:29
@EgorBo EgorBo marked this pull request as ready for review August 12, 2023 17:22
src/coreclr/vm/jitinterface.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/jitinterface.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/jitinterface.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/tieredcompilation.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

This seems to improve benchmarks, e.g.:

object[] _array = new object[1000];
object _obj = "test";

[Benchmark]
public void Test1()
{
    // stelem.ref
    _array[1] = _obj;
    _array[2] = _obj;
}

[Benchmark]
public bool Test2()
{
    // CastHelpers.IsInstanceOf*
    return _obj is IDisposable
                or ICloneable
                or Program;
}
Method Job Toolchain Mean
Test1 Job-PCYDPA \runtime-main\corerun.exe 3.531 ns
Test1 Job-ZCRCGK \runtime\corerun.exe 2.815 ns
Test2 Job-PCYDPA \runtime-main\corerun.exe 3.744 ns
Test2 Job-ZCRCGK \runtime\corerun.exe 3.525 ns

I don't know why Test1 is faster but it seems like a stable improvement between runs, the codegen is the same: https://www.diffchecker.com/RT66eY3O/ (stlem is a direct call as expected) - perhaps the method itself got allocated in a different part of the execution heap since it's no longer Aggressive opt? Or maybe the previous direct target wasn't entirely correct and pointed to a stub directly?

Test2 is slightly faster because we now optimize indirect calls to direct ones: https://www.diffchecker.com/g5yrentL/ (the last call wasn't optimized to a direct one because it never reached the tier1 in this benchmark - it was never taken).

Hopefully, the difference is even better on arm64 (don't have any device around currently to test)

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

/azp list

@azure-pipelines

This comment was marked as resolved.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

@kouvel this is ready for the 2nd round 🙂 all PGO pipelines (with TC=1) look green. Benchmarks: #90412 (comment)

@kouvel
Copy link
Member

kouvel commented Aug 16, 2023

I don't know why Test1 is faster but it seems like a stable improvement between runs, the codegen is the same

Possibly because AggressiveOptimization was removed and now it's using tier 1 code. There are code gen differences before and after this change in the helper method.

(stlem is a direct call as expected) - perhaps the method itself got allocated in a different part of the execution heap since it's no longer Aggressive opt? Or maybe the previous direct target wasn't entirely correct and pointed to a stub directly?

Given that it was doing a direct call previously also, I wonder if there's a possibility that it could do a direct call to any tier including tier 0 (which would probably be r2r'd code, which may be ok perhaps) or an instrumented tier. Wonder if this should also be specialized for the other tiers to have it do an indirect call instead.

Copy link
Member

@kouvel kouvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment/thought, LGTM though, thanks!

@EgorBo
Copy link
Member Author

EgorBo commented Aug 16, 2023

Given that it was doing a direct call previously also, I wonder if there's a possibility that it could do a direct call to any tier including tier 0 (which would probably be r2r'd code, which may be ok perhaps) or an instrumented tier. Wonder if this should also be specialized for the other tiers to have it do an indirect call instead.

Like you mentioned, it had AggressiveOptimization attribute on it so presumably it was fine? Going to merge this one into 9.0 now, will see soon perf impact on microbenchmarks (my local runs show improvemnets)

The outerloop failure is #90593

@EgorBo EgorBo merged commit 42862cc into dotnet:main Aug 16, 2023
149 of 151 checks passed
@EgorBo EgorBo deleted the remove-aggressiveopt-casthelpers branch August 16, 2023 00:55
@EgorBo
Copy link
Member Author

EgorBo commented Aug 16, 2023

With this change, empty console app (without PublishReadyToRun) on Linux jits just one method now! Main itself:
image

(this PR removed these two cast helpers from this list + several PRs recently got rid of Vector<> paths in some BCL functions and made them prejittable + this is Linux where ETW doesn't trigger 🙂)

@kouvel
Copy link
Member

kouvel commented Aug 16, 2023

Like you mentioned, it had AggressiveOptimization attribute on it so presumably it was fine?

Yea I meant after this change, there would be R2R'ed code after the first call (AggressiveOptimization prevents R2R code to be generated), an intermediate code version with profiling instrumentation with TieredPGO, and finally the tier 1 code. The precode target I imagine would point to those things along the way and returning the precode's target in the fallback path may mean that there is a possibility for there to be direct calls to non-final tiers in some cases. It is probably very rare though. Solving it would probably require taking the lock every time to check the tier.

Also similarly for the other cast helpers that don't have AggressiveOptimization.

@kouvel
Copy link
Member

kouvel commented Aug 16, 2023

Solving it would probably require taking the lock every time to check the tier.

Although there may be a shortcut to avoid the lock in startup cases where the helper method hasn't yet been called (maybe the precode target is not set to a native entry point).

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

returning the precode's target in the fallback path may mean that there is a possibility for there to be direct calls to non-final tiers in some cases

I do not think there is any code that would short-circuit the precode. If the JIT gets a pointer to precode, it is going to call it.

@kouvel
Copy link
Member

kouvel commented Aug 16, 2023

I do not think there is any code that would short-circuit the precode. If the JIT gets a pointer to precode, it is going to call it.

Ah ok, I misread the code, the fallback path would always be an indirect call and that would be fine. Thanks!

@EgorBo
Copy link
Member Author

EgorBo commented Aug 17, 2023

Arm64 codegen diff:

object[] _array = new object[1000];
object _obj = "test";

[Benchmark]
public void Test1()
{
    _array[1] = _obj;
}

[Benchmark]
public bool Test2()
{
    return _obj is IDisposable;
}

https://www.diffchecker.com/WhfRiUit/

@ghost ghost locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants