
1500-arm match statement causing a major build time and memory use spike, especially for wasm target #81757

Open
adrian17 opened this issue Feb 4, 2021 · 9 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-patterns Relating to patterns and pattern matching C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@adrian17

adrian17 commented Feb 4, 2021

Tested on both stable 1.49 and current nightly.

The change in question: https://github.com/tafia/quick-xml/pull/239/files#diff-0acc5298e8580ddde57f322c2f6b70406f8d2b13acd0bebc4125428a66afc585

My reproduction:

git clone https://github.com/tafia/quick-xml.git
cd quick-xml
git checkout 1961d8e

Before this commit, cargo build --release took up to 5s regardless of target.
With this commit:

  • cargo build --release takes 30-40s and uses ~200MB more memory than before.
  • cargo build --release --target wasm32-unknown-unknown takes several minutes, and memory use exceeds 4GB.

@adrian17 adrian17 added the C-bug Category: This is a bug. label Feb 4, 2021
@camelid camelid added A-patterns Relating to patterns and pattern matching O-wasm Target: WASM (WebAssembly), http://webassembly.org/ I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-compiletime Issue: Problems and improvements with respect to compile times. labels Feb 4, 2021
@camelid
Member

camelid commented Feb 4, 2021

I wonder if LLVM is the source of the WASM slowdown.

@osa1
Contributor

osa1 commented Feb 5, 2021

I tried to profile this, following https://rustc-dev-guide.rust-lang.org/profiling/with_perf.html

  • I first built the package in release mode for native and wasm32-unknown-unknown targets
  • Then generated perf data using touch src/lib.rs; perf record -F99 --call-graph dwarf cargo +stage1 rustc --release --target=...
  • Generated how much time was spent in LLVM-related stuff using perf focus '{llvm}'
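The steps above can be sketched as a single command sequence (the stage1 toolchain name follows the rustc-dev-guide workflow; perf-focus is the helper tool that guide describes):

```shell
# Force a rebuild of only the crate under test, recording a call-graph
# profile of the compilation. -F99 samples at 99 Hz; --call-graph dwarf
# gives usable Rust backtraces.
touch src/lib.rs
perf record -F99 --call-graph dwarf \
    cargo +stage1 rustc --release --target=wasm32-unknown-unknown

# Summarize the fraction of samples spent in LLVM-related frames.
perf focus '{llvm}'
```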

Results:

Target | time for touch src/lib.rs; cargo build --release | perf focus '{llvm}' | time outside LLVM
------ | ------------------------------------------------- | --------------------- | -----------------
native | 14.26s | 83% (~11s) | ~3s
wasm | 1m 15s (+61s) | 95% (~71s) | ~4s

Compiling to Wasm takes 61s longer, but the time spent outside of LLVM-related stuff is only 1s longer. So LLVM-related code takes 60s longer while the rest takes 1s longer.

It's possible that some of the overhead attributed to LLVM is actually rustc code rather than the LLVM library itself, though.

@tgnottingham
Contributor

tgnottingham commented Feb 6, 2021

Most of the time is spent in two LLVM optimization passes on the largest CGU. I don't think the CGU is particularly large, since it compiles to initial LLVM-IR pretty quickly. Seems like there's an LLVM performance bug. Excerpt of -Z time-llvm-passes:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 97.2059 seconds (97.2046 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  48.8551 ( 51.2%)   1.1879 ( 63.4%)  50.0430 ( 51.5%)  50.0445 ( 51.5%)  Dominator Tree Construction #134
  34.6618 ( 36.4%)   0.2081 ( 11.1%)  34.8698 ( 35.9%)  34.8715 ( 35.9%)  WebAssembly Register Stackify
   1.9735 (  2.1%)   0.0006 (  0.0%)   1.9741 (  2.0%)   1.9742 (  2.0%)  Global Value Numbering #9
   0.8529 (  0.9%)   0.1702 (  9.1%)   1.0231 (  1.1%)   1.0231 (  1.1%)  Dead Global Elimination #4
...

(FYI, -Z time-llvm-passes requires -C codegen-units=1 normally, but be warned--it exhausted my system memory and thrashed it to death. I had to hack around it with some local changes to rustc.)

In contrast, when compiling for an x86 target, Dominator Tree Construction passes are all sub-second, and of course WebAssembly Register Stackify isn't present.
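For reference, a timing report like the one above can be produced with an invocation along these lines (nightly-only -Z flag; as noted, expect very high memory use on this test case):

```shell
# -Z time-llvm-passes normally requires a single codegen unit so the
# per-pass numbers aren't split across parallel LLVM contexts.
cargo +nightly rustc --release --target wasm32-unknown-unknown -- \
    -C codegen-units=1 -Z time-llvm-passes
```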

@rustbot label A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Feb 6, 2021
@camelid camelid added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 6, 2021
@nikic
Contributor

nikic commented Feb 6, 2021

Could you provide the ll/bc file for the problematic CGU?

@tgnottingham
Contributor

tgnottingham commented Feb 6, 2021

@nikic Sure. Generated with cargo +nightly rustc --release --target wasm32-unknown-unknown -- -C no-prepopulate-passes -C passes=name-anon-globals --emit=llvm-ir,llvm-bc. And nightly is rustc 1.51.0-nightly (a2f8f6281 2021-01-27). Let me know if you had something different in mind.

quick_xml-7fe5361a8734566e.ll.gz
quick_xml-7fe5361a8734566e.bc.gz

Edit: oh, I think I did misunderstand. Give me a minute.

Okay, here's just the bc for the troublesome CGU. Generated with cargo +nightly rustc --release --target wasm32-unknown-unknown -- -C no-prepopulate-passes -C passes=name-anon-globals -C save-temps.

quick-xml-bc.tar.gz

@nikic
Contributor

nikic commented Feb 7, 2021

Running llc, it looks like the majority of the time is spent inside ReachabilityGraph::calculate() as part of https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp.

The header lists a complexity of O(NumBlocks * NumNestedLoops * NumIrreducibleLoops + NumLoops * NumLoops). In this case there's one huge function with a lot of control flow...

@nikic
Contributor

nikic commented Feb 7, 2021

For the test case, the largest reachability graph computed is on 26122 blocks, requiring 487893113 worklist iterations.

@tgnottingham
Contributor

tgnottingham commented Feb 16, 2021

So we'd probably want to reduce the number of basic blocks generated for the match, and/or reduce the complexity of the optimization, if either is possible.

@adrian17, I don't know if you're looking for a workaround, but if so, moving the match into a separate function avoids the problem. The LLVM optimization only applies to code within a loop, so replacing the giant match with a function call reduces the problem size for LLVM. Doing this also cut the runtime for the x86 build in half, FWIW, making both the wasm and x86 builds take 10s on my machine.
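The workaround can be sketched roughly as follows. The function and arm names here are illustrative, not quick-xml's actual API; the point is that the giant match body lives in its own function instead of inline in the caller's loop:

```rust
// Hypothetical sketch of the workaround: moving a huge `match` out of a
// loop body into a separate function, so the loop LLVM analyzes contains
// a single call instead of thousands of basic blocks.

// #[inline(never)] keeps the compiler from inlining the match back into
// the caller, which would reintroduce the original problem size.
#[inline(never)]
fn decode_entity(name: &str) -> Option<char> {
    // In quick-xml this match has ~1500 arms; three are shown here.
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        _ => None,
    }
}

fn unescape(names: &[&str]) -> String {
    // The hot loop now only sees a function call, shrinking the
    // reachability graph that WebAssemblyFixIrreducibleControlFlow builds.
    names.iter().filter_map(|&name| decode_entity(name)).collect()
}

fn main() {
    assert_eq!(unescape(&["lt", "amp", "gt"]), "<&>");
}
```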

@apiraino
Contributor

Removing the I-prioritize label (zulip discussion here)

(Feel free to follow-up, though)

@rustbot label -I-prioritize

@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 25, 2021
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023