x.py doesn't limit job concurrency effectively #81957
Comments
I'm not sure if there's anything we can do; to my knowledge, x.py isn't doing anything interesting here, we just pass `-j` down to Cargo. If you can provide instructions to observe this or detect it somehow, then there may be more that we can do. It's worth noting that I expect each rustc to start roughly the same number of threads as codegen units; it's just that they should all immediately stop, waiting on a jobserver token. I might not be remembering this piece right, though.
According to this related Zulip discussion, rustc could perhaps use a threadpool that only ramps up on demand. Then, if compilers are starved for jobserver tokens, they won't need to start additional threads.
Is it the threads themselves that are consuming memory? Have you measured the actual task utilization, or only the number of threads spawned? Tangentially, I have found that if lld is being used as the linker, then it also runs multi-threaded by default, even if rustc itself is told to use only 1 thread. See #81942. That's probably not a problem during bootstrap, though, since it won't do as much linking as the UI tests.
Yeah, in theory this is what we want, but it seems pretty clearly out of scope for this issue IMO; it's a hard challenge to get right, particularly given the blocking nature of the token acquisition we currently have to deal with.
Results from profiling
Only linking with
It's that there are more LLVM modules in memory at once. The main thread tries to codegen CGUs to unoptimized LLVM modules a bit ahead of time to anticipate the needs of the workers (who run optimization passes on the modules). The way everything works out, the more workers that exist at once, the more LLVM modules can exist at once, between those being worked on, and those that are queued up ahead of time by the main thread. (This area could use a lot of improvement, by the way, but that's a different issue.)
I'll try to come up with a way to demonstrate it, or convince myself that I was out of my mind when I first tested it. @the8472 Can you tell me how you profiled that? Or is that just using
That's
Okay, nothing to see here. :) The build of the

Thanks folks. Closing.
tl;dr -- I believe Cargo `-jN` limits you to `N` concurrent codegen workers, while x.py `-jN` limits you to `N * number_of_CPUs` concurrent codegen workers, which is rough for memory usage during bootstrap. E.g. on an 8-CPU system, `-j4` would not limit you to 4, but to 32 concurrent codegen workers.

With Cargo, my understanding is that `-j4` limits you not only to compiling at most four crates at once, but also to running at most four of certain types of jobs within rustc at once, across all rustc instances under that Cargo invocation (maybe across multiple concurrent Cargo invocations?). This applies to jobs whose concurrency is dictated by the jobserver mechanism, like codegen.

(Actually, I'm not sure that's 100% correct. It's easy to see that `-j` limits the max number of codegen workers active within an individual rustc instance, but I don't know how to verify whether it limits them across all instances. The number of rustc threads in general isn't limited by this, from what I've seen.)

But with x.py, `-j4` only seems to limit you to compiling at most four crates at once. I can see that each rustc instance is still able to have up to 8 codegen workers simultaneously, including the main thread (probably 8 because my system has 8 CPUs). But I'm not sure whether all of the rustc instances are globally limited to 8 concurrent jobs, or whether they're individually limited to 8, so that the global limit is effectively 32 concurrent jobs.

I expect that changing this to limit concurrency the way Cargo does would significantly reduce memory usage during bootstrap. It would also reduce the amount of concurrency, unless you increase the `-j` setting, but it's possible that the extra concurrency was overkill.