1500-arm match statement causing a major build time and memory use spike, especially for wasm target #81757
Comments
I wonder if LLVM is the source of the WASM slowdown. |
I tried to profile this, following https://rustc-dev-guide.rust-lang.org/profiling/with_perf.html
Results:
Compiling to Wasm takes 61s longer, but the time spent outside of LLVM-related code is only 1s longer. So LLVM-related code takes 60s longer while the rest takes 1s longer. It's possible, though, that some of the overhead attributed to LLVM is in rustc's own LLVM-related code rather than in the LLVM library itself. |
Most of the time is spent in two LLVM optimization passes on the largest CGU. I don't think the CGU is particularly large, since it compiles to initial LLVM-IR pretty quickly. Seems like there's an LLVM performance bug. Excerpt of
(FYI, In contrast, when compiling for an x86 target, @rustbot label A-LLVM |
Could you provide the ll/bc file for the problematic CGU? |
@nikic Sure. Generated with quick_xml-7fe5361a8734566e.ll.gz Edit: oh, I think I did misunderstand. Give me a minute. Okay, here's just the bc for the troublesome CGU. Generated with |
Running llc, it looks like the majority of the time is spent inside The header lists a complexity of |
For the test case, the largest reachability graph computed is on 26122 blocks, requiring 487893113 worklist iterations. |
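To put those two figures in proportion, here is a small sketch that does plain arithmetic on the numbers quoted above. The constant and function names are made up for illustration, and this says nothing about the pass's documented complexity, only about this one test case.

```rust
// Figures quoted in the comment above.
const BLOCKS: u64 = 26_122;
const ITERATIONS: u64 = 487_893_113;

/// Average worklist iterations per basic block (integer division).
fn iterations_per_block() -> u64 {
    ITERATIONS / BLOCKS
}

/// Worklist iterations as a fraction of BLOCKS squared.
fn ratio_to_square() -> f64 {
    ITERATIONS as f64 / (BLOCKS * BLOCKS) as f64
}
```

For this input, `iterations_per_block()` comes out at roughly 18.7 thousand iterations per block, and `ratio_to_square()` at roughly 0.7. So on this test case the work is on the order of the square of the block count, which is why cutting the number of blocks the pass sees helps so much.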
So we'd probably want to reduce the number of basic blocks generated for the match, and/or reduce the complexity of the optimization, if either is possible. @adrian17, I don't know if you're looking for a workaround, but if so, moving the match into a separate function avoids the problem. The LLVM optimization only applies to code within a loop, so replacing the giant match with a function call reduces the problem size for LLVM. Doing this also cut the runtime for the x86 build in half, FWIW, making both the wasm and x86 builds take 10s on my machine. |
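A minimal sketch of the shape of that workaround, not the actual quick-xml code: `named_entity` and `unescape` are hypothetical names, and the five-arm match stands in for the real 1500-arm one. The `#[inline(never)]` attribute is my own assumption, added so the optimizer cannot fold the match back into the loop; the suggestion above only says to move the match into a separate function.

```rust
// Hypothetical stand-in for the giant match; the real one has ~1500 arms.
// Keeping it in its own function means the loop body below contains only
// a call, not thousands of basic blocks.
// Assumption: #[inline(never)] is not part of the original suggestion.
#[inline(never)]
fn named_entity(name: &[u8]) -> Option<&'static str> {
    match name {
        b"lt" => Some("<"),
        b"gt" => Some(">"),
        b"amp" => Some("&"),
        b"apos" => Some("'"),
        b"quot" => Some("\""),
        _ => None,
    }
}

// Hypothetical caller: scans for `&name;` sequences in a loop and
// delegates the lookup to `named_entity` instead of matching inline.
fn unescape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find('&') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        match after.find(';') {
            Some(end) => {
                let name = &after[..end];
                match named_entity(name.as_bytes()) {
                    Some(replacement) => out.push_str(replacement),
                    // Unknown entity: copy it through unchanged.
                    None => {
                        out.push('&');
                        out.push_str(name);
                        out.push(';');
                    }
                }
                rest = &after[end + 1..];
            }
            // No terminating ';': keep the '&' literally and move on.
            None => {
                out.push('&');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With this layout, `unescape("a &lt; b")` yields `a < b`, and unknown entities are passed through untouched. Whether the attribute is needed in practice would have to be checked against the actual build times.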
Tested on both stable 1.49 and current nightly.
The change in question: https://github.com/tafia/quick-xml/pull/239/files#diff-0acc5298e8580ddde57f322c2f6b70406f8d2b13acd0bebc4125428a66afc585
My reproduction:
Before this commit,
cargo build --release
took up to 5s regardless of target.
With this commit:
cargo build --release
takes 30-40s and 200MB more memory than before.
cargo build --release --target wasm32-unknown-unknown
takes several minutes, and memory use exceeds 4GB.