make mk_attr_id part of ParseSess #101313

Merged
merged 2 commits into from
Sep 14, 2022

Conversation

SparrowLii
Member

Updates #48685

The current mk_attr_id uses an AtomicU32, which is not very efficient and causes a lot of contention in a parallel environment.

Following the task list in #48685, this PR makes mk_attr_id a method of the AttrIdGenerator struct and adds a new field attr_id_generator to ParseSess.

AttrIdGenerator uses WorkerLocal, which has two advantages:

  1. Cell is more efficient than AtomicU32 and does not add any contention.

  2. The index of the worker thread is stored in the highest bits of the generated AttrId, so AttrIds generated in different threads are easily guaranteed to be unique.
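
A minimal sketch of the scheme described above, not the code from this PR. `PerWorkerCounter` is a hypothetical stand-in for a single worker's slot inside `WorkerLocal`, and plain `u32` is used in place of `AttrId`:

```rust
use std::cell::Cell;

/// Hypothetical stand-in for one worker's slot in a `WorkerLocal`.
struct PerWorkerCounter {
    next: Cell<u32>,
}

impl PerWorkerCounter {
    /// `index` is the worker thread index. `reverse_bits` moves it into
    /// the high bits, so each worker starts counting from a distinct value.
    fn new(index: usize) -> Self {
        PerWorkerCounter { next: Cell::new((index as u32).reverse_bits()) }
    }

    /// Returns the next id for this worker. No atomic operation is needed
    /// because only the owning thread ever touches this `Cell`.
    fn mk_attr_id(&self) -> u32 {
        let id = self.next.get();
        self.next.set(id + 1);
        id
    }
}

fn main() {
    let w0 = PerWorkerCounter::new(0);
    let w1 = PerWorkerCounter::new(1);
    // Worker 0 counts up from 0, worker 1 counts up from 0x8000_0000.
    println!("{:#x} {:#x}", w0.mk_attr_id(), w0.mk_attr_id());
    println!("{:#x} {:#x}", w1.mk_attr_id(), w1.mk_attr_id());
}
```

The two workers never share state, so uniqueness comes only from the disjoint starting values.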

cc @cjgillot

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 2, 2022
@rust-highfive
Collaborator

r? @fee1-dead

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 2, 2022
@cjgillot cjgillot self-assigned this Sep 2, 2022
// starting value of AttrId in each worker thread.
// The `index` is the index of the worker thread.
// This ensures that the AttrId generated in each thread is unique.
AttrIdGenerator(WorkerLocal::new(|index| Cell::new((index as u32).reverse_bits())))
Member

I am not familiar with parallel compiler code, but is there a cap as to how many threads can be used for parallel rustc? If there are too many then the actual usable bits would be decreased.

Member Author

@SparrowLii SparrowLii Sep 2, 2022

Yes, there is a cap. AFAIK, the number of threads is usually no more than 64. Besides, as the number of threads increases, the number of AttrIds that each thread needs to assign decreases proportionally, so I don't think this is a problem.
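
To make the trade-off concrete, here is a rough sketch of the arithmetic. It assumes a full 32-bit id space with the bit-reversed worker index in the top bits; the real AttrId type may reserve part of that range, which this sketch ignores. `ids_per_worker` is a hypothetical helper, not a rustc function:

```rust
/// Number of ids each worker can hand out before reaching another worker's
/// starting value, assuming a 32-bit id whose top bits hold the bit-reversed
/// worker index.
fn ids_per_worker(num_workers: u32) -> u64 {
    assert!(num_workers >= 1);
    // Bits needed to represent the largest index, `num_workers - 1`.
    let index_bits = 32 - (num_workers - 1).leading_zeros();
    // The remaining low bits are free for the per-worker counter.
    1u64 << (32 - index_bits)
}

fn main() {
    for n in [1u32, 2, 8, 64] {
        println!("{n} workers: {} ids per worker", ids_per_worker(n));
    }
}
```

With 64 workers the index takes 6 bits, which leaves 26 bits (about 67 million ids) per worker.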

@cjgillot
Contributor

cjgillot commented Sep 5, 2022

I understand that the main motivation is performance in parallel environment. Do you have a measurement of the perf improvement?

@SparrowLii
Member Author

> I understand that the main motivation is performance in parallel environment. Do you have a measurement of the perf improvement?

I don't know of a good benchmark for measuring the efficiency of parallel compilation; finding one is part of my follow-up plan. For now, I think we just need to guarantee that there is no negative impact on efficiency in non-parallel mode, so we can run rustc perf directly.

@SparrowLii
Member Author

SparrowLii commented Sep 5, 2022

There are already some issues about benchmarks for parallel compilation, such as #59667 and #92596. I think we will address them in subsequent work.

@cjgillot
Contributor

cjgillot commented Sep 5, 2022

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 5, 2022
@bors
Contributor

bors commented Sep 5, 2022

⌛ Trying commit f9234777e44bde1b27786e9f4300b0ee3323c5f9 with merge 7e04d9038009c47e6b24a62aab7fa9d31c71706a...

@bors
Contributor

bors commented Sep 5, 2022

☀️ Try build successful - checks-actions
Build commit: 7e04d9038009c47e6b24a62aab7fa9d31c71706a (7e04d9038009c47e6b24a62aab7fa9d31c71706a)

@rust-timer
Collaborator

Queued 7e04d9038009c47e6b24a62aab7fa9d31c71706a with parent 5b4bd15, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (7e04d9038009c47e6b24a62aab7fa9d31c71706a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-1.2%, -0.3%] | 13 |
| Improvements ✅ (secondary) | -1.2% | [-1.7%, -0.8%] | 8 |
| All ❌✅ (primary) | -0.7% | [-1.2%, -0.3%] | 13 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [0.8%, 2.3%] | 2 |
| Regressions ❌ (secondary) | 2.3% | [1.4%, 3.7%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | 1.6% | [0.8%, 2.3%] | 2 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.8% | [-2.8%, -2.8%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 5, 2022
@cjgillot
Contributor

From the perf report, this PR looks like a great idea to improve perf.
However, I'm a bit afraid of Heisenbugs in parallel-compiler due to silent collisions of AttrId because they overflow the allocated 27 bits or so.
Could you add a debug-assertion that this does not happen?
Then r=me
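
One way such a check could look, as a sketch only and not the assertion that was actually added in this PR. `INDEX_BITS`, `CheckedCounter` and `owns` are hypothetical names, and the fixed budget of 6 index bits is an assumption for illustration:

```rust
use std::cell::Cell;

/// Assumed budget for the worker index (at most 64 workers).
const INDEX_BITS: u32 = 6;
/// Number of ids available to each worker under that budget.
const RANGE: u32 = 1 << (32 - INDEX_BITS);

/// Hypothetical per-worker generator that checks it stays in its own range.
struct CheckedCounter {
    start: u32,
    next: Cell<u32>,
}

impl CheckedCounter {
    fn new(index: u32) -> Self {
        debug_assert!(index < (1 << INDEX_BITS), "worker index does not fit in INDEX_BITS");
        let start = index.reverse_bits();
        CheckedCounter { start, next: Cell::new(start) }
    }

    /// True if `id` lies inside this worker's range `[start, start + RANGE)`.
    fn owns(&self, id: u32) -> bool {
        id.wrapping_sub(self.start) < RANGE
    }

    fn mk_attr_id(&self) -> u32 {
        let id = self.next.get();
        // Catches the silent collision case: the counter has run past its
        // own range and would start reusing another worker's ids.
        debug_assert!(self.owns(id), "AttrId counter left its worker's range");
        self.next.set(id.wrapping_add(1));
        id
    }
}

fn main() {
    let w1 = CheckedCounter::new(1);
    println!("{:#x} {:#x}", w1.mk_attr_id(), w1.mk_attr_id());
}
```

Because it uses `debug_assert!`, the check costs nothing in release builds.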

@SparrowLii
Member Author

Sure, I added the corresponding modifications.

@SparrowLii
Member Author

@bors r=cjgillot

@bors
Contributor

bors commented Sep 13, 2022

@SparrowLii: 🔑 Insufficient privileges: Not in reviewers

@SparrowLii
Member Author

@cjgillot It looks like I don't have the privilege to r=

@bors
Contributor

bors commented Sep 13, 2022

☔ The latest upstream changes (presumably #101757) made this pull request unmergeable. Please resolve the merge conflicts.

@cjgillot
Contributor

@bors r+

@bors
Contributor

bors commented Sep 14, 2022

📌 Commit bfc4f2e has been approved by cjgillot

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 14, 2022
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 14, 2022
@bors
Contributor

bors commented Sep 14, 2022

⌛ Testing commit bfc4f2e with merge 750bd1a...

@bors
Contributor

bors commented Sep 14, 2022

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing 750bd1a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 14, 2022
@bors bors merged commit 750bd1a into rust-lang:master Sep 14, 2022
@rustbot rustbot added this to the 1.65.0 milestone Sep 14, 2022
@bors bors mentioned this pull request Sep 15, 2022
@rust-timer
Collaborator

Finished benchmarking commit (750bd1a): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean [1] | range | count [2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.1% | [2.1%, 2.1%] | 1 |
| Improvements ✅ (primary) | -4.5% | [-4.9%, -4.1%] | 2 |
| Improvements ✅ (secondary) | -3.2% | [-3.3%, -3.1%] | 2 |
| All ❌✅ (primary) | -4.5% | [-4.9%, -4.1%] | 2 |

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@Zoxc
Contributor

Zoxc commented Jan 27, 2023

It's possible that WorkerLocal is slower than fetch_add since the latter is quite fast already. I don't think mk_attr_id is hot enough for it to matter, but to actually measure the overhead I'd recommend measuring check builds using a single thread on a CPU with its frequency locked using my benchmark tool.

@SparrowLii
Member Author

SparrowLii commented Jan 28, 2023

Thanks, your suggestion is very valuable! In fact, we don't currently have good tools for measuring the performance of compilers in parallel environments. I will try this tool!

@Zoxc
Contributor

Zoxc commented Feb 7, 2023

This happened to break the parallel compiler due to WorkerLocal being created outside the Rayon thread pool.

@SparrowLii
Member Author

SparrowLii commented Feb 8, 2023

You mean WorkerLocal cannot get the index correctly? If so, then WorkerLocal really shouldn't be used. It might be reasonable to use thread_local.

Or we should add this WorkerLocal to TyCtxt so that it is not created outside the Rayon thread pool.
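
For illustration, a thread_local-based variant could look roughly like the sketch below. It uses only the standard library and plain `u32` ids; `NEXT_THREAD_INDEX` and `NEXT_ID` are hypothetical names, and handing out thread indices from a global atomic is an assumption of this sketch, not something proposed in the thread:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hands out one small index per thread, the first time that thread asks for an id.
static NEXT_THREAD_INDEX: AtomicU32 = AtomicU32::new(0);

thread_local! {
    // Per-thread counter, lazily seeded with the bit-reversed thread index.
    static NEXT_ID: Cell<u32> =
        Cell::new(NEXT_THREAD_INDEX.fetch_add(1, Ordering::Relaxed).reverse_bits());
}

/// Returns a fresh id. The atomic is touched once per thread, not once per id.
fn mk_attr_id() -> u32 {
    NEXT_ID.with(|next| {
        let id = next.get();
        next.set(id + 1);
        id
    })
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| (0..3).map(|_| mk_attr_id()).collect::<Vec<u32>>()))
        .collect();
    let mut all: Vec<u32> = handles.into_iter().flat_map(|h| h.join().unwrap()).collect();
    all.sort_unstable();
    all.dedup();
    // Four threads with three ids each, and no duplicates among them.
    assert_eq!(all.len(), 12);
}
```

This does not depend on being inside a particular thread pool, but thread indices are never recycled here, so a process that keeps spawning threads would eventually use up the index bits. Which thread gets which index also depends on scheduling order.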

@Zoxc
Contributor

Zoxc commented Feb 8, 2023

I think it just ends up spawning the global Rayon thread pool. I kind of have a workaround in #107782, but it's not particularly clean.

Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

8 participants