x.py doesn't limit job concurrency effectively #81957

Closed
tgnottingham opened this issue Feb 10, 2021 · 7 comments
Labels
C-bug (Category: This is a bug.), T-bootstrap (Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap))

Comments

tgnottingham (Contributor) commented Feb 10, 2021

tl;dr: I believe Cargo -jN limits you to N concurrent codegen workers, while x.py -jN limits you to N * (number of CPUs) concurrent codegen workers, which is rough on memory usage during bootstrap. E.g. on an 8-CPU system, -j4 would limit you not to 4 but to 32 concurrent codegen workers.
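The two models in the tl;dr reduce to simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the second model is the behaviour suspected here, which the end of the thread traces to the bootstrap binary's own build rather than to x.py's handling of -j.

```python
def max_concurrent_codegen_workers(jobs: int, cpus: int, shared_jobserver: bool) -> int:
    """Worst-case number of codegen workers running at once (hypothetical helper).

    shared_jobserver=True: every rustc draws from one pool of `jobs` tokens,
    the Cargo behaviour described in this issue.
    shared_jobserver=False: up to `jobs` rustc instances run at once and each
    caps itself at `cpus` workers, the suspected x.py behaviour.
    """
    if shared_jobserver:
        return jobs
    return jobs * cpus


print(max_concurrent_codegen_workers(4, 8, shared_jobserver=True))   # 4
print(max_concurrent_codegen_workers(4, 8, shared_jobserver=False))  # 32
```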


With Cargo, my understanding is that -j4 limits you not only to compiling at most four crates at once, but also to running at most four of certain types of jobs within rustc at once, across all rustc instances under that Cargo invocation (maybe across multiple concurrent Cargo invocations?). This applies to jobs whose concurrency is governed by the jobserver mechanism, like codegen.

(Actually, I'm not sure that's 100% correct. It's easy to see that -j limits the max number of codegen workers active within an individual rustc instance, but I don't know how to verify if it limits them across all instances. The number of rustc threads in general isn't limited by this from what I've seen.)
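The shared-limit idea can be sketched with an OS pipe used as a token pool, in the spirit of the GNU make jobserver protocol: the pipe is preloaded with one byte per token, a worker reads a byte before doing work and writes it back afterwards, so everyone holding the pipe shares a single limit. This is a simplified sketch under assumptions: threads stand in for the codegen workers of several rustc processes, the class and function names are hypothetical, and protocol details such as each process's implicit token are left out.

```python
import os
import threading
import time


class PipeJobserver:
    """Token pool backed by an OS pipe: one byte per token.

    acquire() reads a byte (blocking while the pipe is empty) and release()
    writes it back, so every holder of the file descriptors shares one limit.
    """

    def __init__(self, tokens: int) -> None:
        self.read_fd, self.write_fd = os.pipe()
        os.write(self.write_fd, b"+" * tokens)

    def acquire(self) -> bytes:
        return os.read(self.read_fd, 1)

    def release(self, token: bytes) -> None:
        os.write(self.write_fd, token)

    def close(self) -> None:
        os.close(self.read_fd)
        os.close(self.write_fd)


def peak_shared_pool(jobs: int, instances: int, workers_per_instance: int) -> int:
    """Simulate `instances` compilers, each wanting `workers_per_instance`
    codegen workers, all drawing from ONE jobserver holding `jobs` tokens.
    Returns the peak number of workers holding a token at the same time."""
    js = PipeJobserver(jobs)
    lock = threading.Lock()
    active = peak = 0

    def codegen_worker() -> None:
        nonlocal active, peak
        token = js.acquire()
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for optimizing one LLVM module
        with lock:
            active -= 1
        js.release(token)

    threads = [
        threading.Thread(target=codegen_worker)
        for _ in range(instances * workers_per_instance)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    js.close()
    return peak


print(peak_shared_pool(jobs=4, instances=4, workers_per_instance=8))  # at most 4
```

Because the limit lives in the shared pipe rather than in each process, 32 would-be workers across four instances still never exceed four concurrently.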

But with x.py, -j4 only seems to limit you to compiling at most four crates at once. I can see that each rustc instance is still able to have up to 8 codegen workers simultaneously, including the main thread (probably 8 because my system has 8 CPUs). But I'm not sure if all of the rustc instances are globally limited to 8 concurrent jobs, or if they're individually limited to 8, so that the global limit is effectively 32 concurrent jobs.

I expect that changing this to limit concurrency the way Cargo does would significantly reduce memory usage during bootstrap. It would also reduce the amount of concurrency, unless you increase the -j setting, but it's possible that the extra concurrency was overkill.

jonas-schievink added the T-bootstrap and C-bug labels Feb 10, 2021
Mark-Simulacrum (Member)

I'm not sure there's anything we can do: to my knowledge, x.py isn't doing anything interesting here; we just pass -j down to Cargo.

If you can provide instructions to observe this or detect it somehow, then there may be more that we can do. It's worth noting that I expect each rustc to start roughly the same number of threads as codegen units, it's just that they should all immediately stop, waiting on a job server token. I might not be remembering this piece right though.
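The behaviour described above (one thread per codegen unit, most of them parked on a token) can be sketched as follows. This is a minimal illustration, not rustc's code: a `threading.BoundedSemaphore` stands in for the jobserver, and the function name and timings are hypothetical.

```python
import threading
import time


def eager_codegen(cgus: int, tokens: int):
    """Spawn one thread per codegen unit up front; each thread blocks until it
    obtains a token from a shared pool (a stand-in for the jobserver).

    Returns (threads_spawned, peak_threads_doing_work).
    """
    pool = threading.BoundedSemaphore(tokens)
    lock = threading.Lock()
    running = peak = 0

    def worker() -> None:
        nonlocal running, peak
        with pool:  # blocks here while every token is taken
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # stand-in for optimizing one LLVM module
            with lock:
                running -= 1

    threads = [threading.Thread(target=worker) for _ in range(cgus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads), peak


spawned, peak = eager_codegen(cgus=16, tokens=2)
print(spawned, peak)  # 16 threads were created; peak never exceeds 2
```

So a high thread count on its own does not show that the token limit is being ignored; what matters is how many threads hold a token at once.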

the8472 (Member) commented Feb 10, 2021

> If you can provide instructions to observe this or detect it somehow, then there may be more that we can do. It's worth noting that I expect each rustc to start roughly the same number of threads as codegen units, it's just that they should all immediately stop waiting on a job server token. I might not be remembering this piece right though.

According to this related Zulip discussion, rustc could perhaps use a thread pool that only ramps up on demand. Then, if compilers are starved for jobserver tokens, they won't need to start additional threads.
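A pool that ramps up on demand can be sketched by moving the blocking acquire onto a coordinator: a worker thread is spawned only once a token is already in hand. This is a hypothetical design for illustration, not how rustc is implemented; the semaphore again stands in for the jobserver and all names are made up.

```python
import threading
import time


def on_demand_codegen(cgus: int, tokens: int) -> int:
    """Spawn a worker only after a token has been acquired, so no thread is
    ever created just to sit blocked on the pool.

    Returns the peak number of workers holding a token at the same time.
    """
    pool = threading.BoundedSemaphore(tokens)  # stand-in for the jobserver
    lock = threading.Lock()
    live = peak = 0

    def worker() -> None:
        nonlocal live
        try:
            time.sleep(0.01)  # stand-in for one unit of codegen work
        finally:
            with lock:
                live -= 1
            pool.release()  # return the token only once the work is done

    threads = []
    for _ in range(cgus):
        pool.acquire()  # the coordinator blocks here, not a freshly spawned thread
        with lock:
            live += 1
            peak = max(peak, live)
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return peak


print(on_demand_codegen(cgus=16, tokens=2))  # never more than 2
```

The trade-off is the one raised in the next reply: the coordinator now sits in a blocking acquire, which is easy in a toy like this but harder to fit into a compiler whose main thread also has other work to do.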

> I expect that changing this to limit concurrency the way Cargo does would significantly reduce memory usage during bootstrap.

Is it the threads themselves that are consuming memory? Have you measured the actual task utilization or only the number of threads spawned?

Tangentially, I have found that if lld is used as the linker, it also runs multi-threaded by default, even if rustc itself is told to use only 1 thread. See #81942. That's probably not a problem during bootstrap, though, since it won't do as much linking as the UI tests.

Mark-Simulacrum (Member)

> According to this related zulip discussion rustc could perhaps use a threadpool that only ramps up on demand. Then if compilers are starved for job tokens they won't need to start additional threads.

Yeah, in theory this is what we want but seems pretty clearly out of scope for this issue IMO; it's a hard challenge to get right, particularly given the blocking nature of the token acquisition we currently have to deal with.

the8472 (Member) commented Feb 10, 2021

Results from profiling ./x.py -j1 test library/core/ and looking at the thread lanes during stage0 std and compiler artifact building:

rustc itself runs sequentially, as expected.

[Screenshot: hotspot thread view, rustc]

Only linking with lld utilizes all cores, but that doesn't happen for most dependencies. This doesn't apply if you're using a different linker.

[Screenshot: hotspot thread view, lld]

tgnottingham (Contributor, Author) commented Feb 10, 2021

> Is it the threads themselves that are consuming memory?

It's that there are more LLVM modules in memory at once. The main thread tries to codegen CGUs to unoptimized LLVM modules a bit ahead of time to anticipate the needs of the workers (who run optimization passes on the modules). The way everything works out, the more workers that exist at once, the more LLVM modules can exist at once, between those being worked on, and those that are queued up ahead of time by the main thread. (This area could use a lot of improvement, by the way, but that's a different issue.)
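The mechanism described above can be modelled with a bounded queue: the main thread builds unoptimized modules and queues a fixed number ahead, and each worker frees a module once it has optimized it. This is a toy model under stated assumptions: the fixed `lookahead`, the names, and the timings are hypothetical, and rustc's actual scheduling heuristic is more involved. It only shows why the number of modules alive grows with the number of workers.

```python
import queue
import threading
import time


def peak_modules_in_memory(workers: int, lookahead: int, cgus: int = 40) -> int:
    """Main thread turns CGUs into unoptimized 'modules' and queues up to
    `lookahead` of them ahead of `workers` optimizer threads, which free each
    module after optimizing it. Returns the peak number of modules alive."""
    if lookahead < 1:
        raise ValueError("lookahead must be >= 1 (maxsize=0 would be unbounded)")
    q = queue.Queue(maxsize=lookahead)
    lock = threading.Lock()
    alive = peak = 0

    def worker() -> None:
        nonlocal alive
        while True:
            module = q.get()
            if module is None:  # shutdown sentinel
                return
            time.sleep(0.005)  # stand-in for LLVM optimization passes
            with lock:
                alive -= 1  # module is freed once optimized

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for i in range(cgus):
        with lock:
            alive += 1  # main thread builds one unoptimized module
            peak = max(peak, alive)
        q.put(i)  # blocks once `lookahead` modules are already waiting
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return peak


for w in (1, 4, 8):
    print(w, peak_modules_in_memory(workers=w, lookahead=2))
```

In this model, modules alive never exceed `workers + lookahead + 1`: those queued, those being optimized, and the one the main thread holds while `put` blocks. Raising the worker count raises that ceiling directly.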

> If you can provide instructions to observe this or detect it somehow, then there may be more that we can do.

I'll try to come up with a way to demonstrate it, or convince myself that I was out of my mind when I first tested it.

@the8472 Can you tell me how you profiled that? Or is that just using -Z self-profile + crox?

the8472 (Member) commented Feb 10, 2021

That's perf record + https://github.com/KDAB/hotspot

tgnottingham (Contributor, Author)

Okay, nothing to see here. :)

The build of the bootstrap binary itself doesn't use the -j flag, and I was getting my information from that stage of the build. The stages after that do respect the -j flag.

Thanks folks. Closing.


4 participants