Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of atomic load in toplevel code #47578

Merged
merged 3 commits into from
Nov 17, 2022

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Nov 15, 2022

As noted in #47561, the performance of toplevel code is pretty bad because of the atomic barrier (loading the global world age into the task-local one) that's emitted after every instruction now. @vtjnash noted that monotonic ordering is sufficient, which recovers some performance (1.5s -> 0.7s). But the biggest improvement comes from not emitting the atomic operation between every statement, but only when we perform a call.

Fixes #47561

@maleadt maleadt added the performance Must go faster label Nov 15, 2022
@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 15, 2022

It was somewhat intended. I wanted to move it before every CallInstr, instead of doing it after the calls or ccalls, so that it was synchronized with any other atomic barriers written by the user

@maleadt
Copy link
Member Author

maleadt commented Nov 16, 2022

We could also move these costly atomic operations to only appear before CallInst or similar such instructions, where the cost is free

I implemented that suggestion, @vtjnash.

@maleadt
Copy link
Member Author

maleadt commented Nov 17, 2022

Is the linux32 GC corruption known? I don't see how it could be related...

As this is a simple performance fix for an issue that could otherwise trip up people benchmarking code in the REPL (which seems like a common thing to do), I think it would be good to back-port this to 1.8
(if we're still going to do a release before 1.9). Feel free to remove the tag if anybody disagrees.

@maleadt maleadt added backport 1.8 Change should be backported to release-1.8 backport 1.9 Change should be backported to release-1.9 labels Nov 17, 2022
@gbaraldi
Copy link
Member

It might be an OOM error showing up as something else. Does retrying fix it?

@maleadt
Copy link
Member Author

maleadt commented Nov 17, 2022

No, it seems to happen every time...

@maleadt
Copy link
Member Author

maleadt commented Nov 17, 2022

After the rebase, CI looks better.

@maleadt maleadt merged commit 526cbf7 into master Nov 17, 2022
@maleadt maleadt deleted the tb/global_monotonic_barrier branch November 17, 2022 22:07
@vchuravy
Copy link
Sponsor Member

The 32bit windows failure is still there, as well as on other PRs rebased ontop of this (as an example https://github.com/JuliaLang/julia/commits/vc/vtune)

vchuravy added a commit that referenced this pull request Nov 19, 2022
maleadt pushed a commit that referenced this pull request Nov 19, 2022
@maleadt maleadt removed backport 1.8 Change should be backported to release-1.8 backport 1.9 Change should be backported to release-1.9 labels Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spurious performance regression in Julia 1.8 vs 1.7 for @time in top-level-scope
4 participants