
1500-arm match statement causing a major build time and memory use spike, especially for wasm target #81757

Open
adrian17 opened this issue Feb 4, 2021 · 9 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-patterns Relating to patterns and pattern matching C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@adrian17

adrian17 commented Feb 4, 2021

Tested on both stable 1.49 and current nightly.

The change in question: https://github.com/tafia/quick-xml/pull/239/files#diff-0acc5298e8580ddde57f322c2f6b70406f8d2b13acd0bebc4125428a66afc585

My reproduction:

git clone https://github.com/tafia/quick-xml.git
cd quick-xml
git checkout 1961d8e

Before this commit, cargo build --release took up to 5s regardless of target.
With this commit:

  • cargo build --release takes 30-40s and uses ~200MB more memory than before.
  • cargo build --release --target wasm32-unknown-unknown takes several minutes, and memory use exceeds 4GB.

@adrian17 adrian17 added the C-bug Category: This is a bug. label Feb 4, 2021
@camelid camelid added A-patterns Relating to patterns and pattern matching O-wasm Target: WASM (WebAssembly), http://webassembly.org/ I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. labels Feb 4, 2021
@camelid
Member

camelid commented Feb 4, 2021

I wonder if LLVM is the source of the WASM slowdown.

@osa1
Contributor

osa1 commented Feb 5, 2021

I tried to profile this, following https://rustc-dev-guide.rust-lang.org/profiling/with_perf.html

  • I first built the package in release mode for native and wasm32-unknown-unknown targets
  • Then generated perf data using touch src/lib.rs; perf record -F99 --call-graph dwarf cargo +stage1 rustc --release --target=...
  • Generated how much time was spent in LLVM-related stuff using perf focus '{llvm}'
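The steps above can be sketched as a single command sequence (the stage1 toolchain name follows the rustc-dev-guide workflow; perf-focus is the helper tool that guide describes):

```shell
# Force a rebuild of only the crate under test, recording a call-graph
# profile of the compilation. -F99 samples at 99 Hz; --call-graph dwarf
# gives usable Rust backtraces.
touch src/lib.rs
perf record -F99 --call-graph dwarf \
    cargo +stage1 rustc --release --target=wasm32-unknown-unknown

# Summarize the fraction of samples spent in LLVM-related frames.
perf focus '{llvm}'
```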

Results:

Target | time for touch src/lib.rs; cargo build --release | perf focus '{llvm}' | time outside LLVM
------ | ------------------------------------------------- | --------------------- | -----------------
native | 14.26s | 83% (~11s) | ~3s
wasm | 1m 15s (+61s) | 95% (~71s) | ~4s

Compiling to Wasm takes 61s longer, but the time spent outside of LLVM-related stuff is only 1s longer. So LLVM-related code takes 60s longer while the rest takes 1s longer.

It's possible that some of the overhead attributed to LLVM is actually rustc code rather than the LLVM library itself, though.

@tgnottingham
Contributor

tgnottingham commented Feb 6, 2021

Most of the time is spent in two LLVM optimization passes on the largest CGU. I don't think the CGU is particularly large, since it compiles to initial LLVM-IR pretty quickly. Seems like there's an LLVM performance bug. Excerpt of -Z time-llvm-passes:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 97.2059 seconds (97.2046 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  48.8551 ( 51.2%)   1.1879 ( 63.4%)  50.0430 ( 51.5%)  50.0445 ( 51.5%)  Dominator Tree Construction #134
  34.6618 ( 36.4%)   0.2081 ( 11.1%)  34.8698 ( 35.9%)  34.8715 ( 35.9%)  WebAssembly Register Stackify
   1.9735 (  2.1%)   0.0006 (  0.0%)   1.9741 (  2.0%)   1.9742 (  2.0%)  Global Value Numbering #9
   0.8529 (  0.9%)   0.1702 (  9.1%)   1.0231 (  1.1%)   1.0231 (  1.1%)  Dead Global Elimination #4
...

(FYI, -Z time-llvm-passes requires -C codegen-units=1 normally, but be warned--it exhausted my system memory and thrashed it to death. I had to hack around it with some local changes to rustc.)

In contrast, when compiling for an x86 target, Dominator Tree Construction passes are all sub-second, and of course WebAssembly Register Stackify isn't present.
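For reference, a timing report like the one above can be produced with an invocation along these lines (nightly-only -Z flag; as noted, expect very high memory use on this test case):

```shell
# -Z time-llvm-passes normally requires a single codegen unit so the
# per-pass numbers aren't split across parallel LLVM contexts.
cargo +nightly rustc --release --target wasm32-unknown-unknown -- \
    -C codegen-units=1 -Z time-llvm-passes
```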

@rustbot label A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Feb 6, 2021
@camelid camelid added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 6, 2021
@nikic
Contributor

nikic commented Feb 6, 2021

Could you provide the ll/bc file for the problematic CGU?

@tgnottingham
Contributor

tgnottingham commented Feb 6, 2021

@nikic Sure. Generated with cargo +nightly rustc --release --target wasm32-unknown-unknown -- -C no-prepopulate-passes -C passes=name-anon-globals --emit=llvm-ir,llvm-bc. And nightly is rustc 1.51.0-nightly (a2f8f6281 2021-01-27). Let me know if you had something different in mind.

quick_xml-7fe5361a8734566e.ll.gz
quick_xml-7fe5361a8734566e.bc.gz

Edit: oh, I think I did misunderstand. Give me a minute.

Okay, here's just the bc for the troublesome CGU. Generated with cargo +nightly rustc --release --target wasm32-unknown-unknown -- -C no-prepopulate-passes -C passes=name-anon-globals -C save-temps.

quick-xml-bc.tar.gz

@nikic
Contributor

nikic commented Feb 7, 2021

Running llc, it looks like the majority of the time is spent inside ReachabilityGraph::calculate() as part of https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp.

The header lists a complexity of O(NumBlocks * NumNestedLoops * NumIrreducibleLoops + NumLoops * NumLoops). In this case there's one huge function with a lot of control flow...

@nikic
Contributor

nikic commented Feb 7, 2021

For the test case, the largest reachability graph computed is on 26122 blocks, requiring 487893113 worklist iterations.

@tgnottingham
Contributor

tgnottingham commented Feb 16, 2021

So we'd probably want to reduce the number of basic blocks generated for the match, and/or reduce the complexity of the optimization, if either is possible.

@adrian17, I don't know if you're looking for a workaround, but if so, moving the match into a separate function avoids the problem. The LLVM optimization only applies to code within a loop, so replacing the giant match with a function call reduces the problem size for LLVM. Doing this also cut the runtime for the x86 build in half, FWIW, making both the wasm and x86 builds take 10s on my machine.
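The workaround can be sketched roughly as follows. The function and arm names here are illustrative, not quick-xml's actual API; the point is that the giant match body lives in its own function instead of inline in the caller's loop:

```rust
// Hypothetical sketch of the workaround: moving a huge `match` out of a
// loop body into a separate function, so the loop LLVM analyzes contains
// a single call instead of thousands of basic blocks.

// #[inline(never)] keeps the compiler from inlining the match back into
// the caller, which would reintroduce the original problem size.
#[inline(never)]
fn decode_entity(name: &str) -> Option<char> {
    // In quick-xml this match has ~1500 arms; three are shown here.
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        _ => None,
    }
}

fn unescape(names: &[&str]) -> String {
    // The hot loop now only sees a function call, shrinking the
    // reachability graph that WebAssemblyFixIrreducibleControlFlow builds.
    names.iter().filter_map(|&name| decode_entity(name)).collect()
}

fn main() {
    assert_eq!(unescape(&["lt", "amp", "gt"]), "<&>");
}
```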

@apiraino
Contributor

Removing the I-prioritize label (zulip discussion here)

(Feel free to follow-up, though)

@rustbot label -I-prioritize

@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 25, 2021
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023