x.py doesn't limit job concurrency effectively #81957
Comments
I'm not sure if there's anything we can do; to my knowledge, x.py isn't doing anything interesting here, we just pass `-j` down to Cargo. If you can provide instructions to observe this or detect it somehow, then there may be more that we can do. It's worth noting that I expect each rustc to start roughly the same number of threads as codegen units; it's just that they should all immediately stop, waiting on a jobserver token. I might not be remembering this piece right, though.
According to this related Zulip discussion, rustc could perhaps use a threadpool that only ramps up on demand. Then, if compilers are starved for jobserver tokens, they won't need to start additional threads.
Is it the threads themselves that are consuming memory? Have you measured the actual task utilization, or only the number of threads spawned? Tangentially, I have found that if lld is being used as the linker, then it also runs multi-threaded by default, even if rustc itself is told to use only 1 thread. See #81942. That's probably not a problem during bootstrap, though, since it won't do as much linking as the UI tests.
Yeah, in theory this is what we want, but it seems pretty clearly out of scope for this issue IMO; it's a hard challenge to get right, particularly given the blocking nature of the token acquisition we currently have to deal with.
Results from profiling
Only linking with
It's that there are more LLVM modules in memory at once. The main thread tries to codegen CGUs to unoptimized LLVM modules a bit ahead of time to anticipate the needs of the workers (who run optimization passes on the modules). The way everything works out, the more workers that exist at once, the more LLVM modules can exist at once, between those being worked on, and those that are queued up ahead of time by the main thread. (This area could use a lot of improvement, by the way, but that's a different issue.)
I'll try to come up with a way to demonstrate it, or convince myself that I was out of my mind when I first tested it. @the8472 Can you tell me how you profiled that? Or is that just using
That's
Okay, nothing to see here. :) The build of the

Thanks folks. Closing.
tl;dr -- I believe Cargo `-jN` limits you to `N` concurrent codegen workers, while x.py `-jN` limits you to `N * number_of_CPUs` concurrent codegen workers, which is rough for memory usage during bootstrap. E.g. on an 8-CPU system, `-j4` would not limit you to 4, but to 32 concurrent codegen workers.

With Cargo, my understanding is that `-j4` limits you not only to compiling at most four crates at once, but also to running at most four of certain types of jobs within rustc at once, across all rustc instances under that Cargo invocation (maybe across multiple concurrent Cargo invocations?). This applies to jobs whose concurrency is dictated by the jobserver mechanism, like codegen.

(Actually, I'm not sure that's 100% correct. It's easy to see that `-j` limits the max number of codegen workers active within an individual rustc instance, but I don't know how to verify whether it limits them across all instances. The number of rustc threads in general isn't limited by this, from what I've seen.)

But with x.py, `-j4` only seems to limit you to compiling at most four crates at once. I can see that each rustc instance is still able to have up to 8 codegen workers simultaneously, including the main thread (probably 8 because my system has 8 CPUs). But I'm not sure whether all of the rustc instances are globally limited to 8 concurrent jobs, or whether they're individually limited to 8, so that the global limit is effectively 32 concurrent jobs.

I expect that changing this to limit concurrency the way Cargo does would significantly reduce memory usage during bootstrap. It would also reduce the amount of concurrency, unless you increase the `-j` setting, but it's possible that the extra concurrency was overkill.