-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include terminators in instance size estimate #86777
Conversation
For example, drop glue generated for struct below, doesn't have any statements, only terminators. Previously it received an estimate of 0, the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return). struct S { a: String, b: String, c: String, d: String, e: String, f: String, } Originally reported in rust-lang#69382 (comment)
r? @estebank (rust-highfive has picked a reviewer for you, use r? to override) |
@bors r+ rollup |
📌 Commit 5621987 has been approved by |
…ebank Include terminators in instance size estimate For example, drop glue generated for struct below, doesn't have any statements, only terminators. Previously it received an estimate of 0, the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return). ```rust struct S { a: String, b: String, c: String, d: String, e: String, f: String, } ``` Originally reported in rust-lang#69382 (comment)
We should probably not roll this up as this can affect CGU partitioning and show up on perf.rlo. @bors rollup=never |
☀️ Test successful - checks-actions |
This change led to moderate performance regressions along with some performance gains. For changes that impact CGU partitioning, we should probably be in the habit of running performance tests. This seemed to impact the As part of the performance triage process, I'm marking this as a performance regression. If you believe this performance regression to be justifiable or once you have an issue or PR that addresses this regression, please mark this PR with the label @rustbot label +perf-regression |
The biggest impact is the regex-debug, full, wall-time measurement regressing by 20%. That has been persistent for a few weeks now, it's not spurious. |
Agreed that the regression seems genuine. That said, it's a really weird regression, as instruction counts and cycles:u went down. Not really sure how to investigate it further, given that... it also seems like the only benchmark that did that after this PR. Going to think a little more on this. |
Maybe -j1 isn't making things entirely sequential? So if there was some parallel work going on that now becomes more sequential due to CGUs being shuffled around then this would show up as trading a decrease in cycles (less concurrency overhead) for increased wall time. Maybe perf should be changed to run the benchmarks confined to a single core and unwanted threading would show up as more context switches. |
There's no setting of -j1 for benchmarks on perf today, LLVM is definitely parallel on most of them. |
I assumed it would use -j1 to reduce noise. Well, that makes it even easier then, CGU partitioning likely got worse in that case. |
You're correct. It took me way too long to get to this point -- we should figure out a way to draw this directly on perf.r-l.o and with a nicer comparison view -- but, see the optimize + codegen threads below. These aren't entirely to scale -- the bottom one takes 2x as long, roughly -- but the important point is that the long tail CGU there is scheduled way later than before, so even though the module sizes themselves haven't changed that much from run to run (in fact, on average they're more evenly distributed), that ends up really hurting overall end-to-end compilation time. Regardless, this is just a scheduling issue, and is not directly actionable at this time. Better estimates -- like this PR -- are probably still good, even if they hurt us temporarily, particularly when it's limited to just one benchmark regressing in practice (with new significance algorithm we're much better at hiding insignificant noise - https://perf.rust-lang.org/compare.html?calcNewSig=true&start=56dee7c49ecdec4c2c9eccc6ff966cf58847bda6&end=7a9ff746fe20a38a3adc0ac65e1789f6e4b099ad&stat=wall-time) @rustbot label +perf-regression-triaged Edit: for context, these graphs are taken from the crox downloads of self-profile data and then screenshotted for combining in gimp. |
For example, drop glue generated for struct below, doesn't have any
statements, only terminators. Previously it received an estimate of 0,
the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return).
Originally reported in #69382 (comment)