Large files containing many tokens of `const` data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Manishearth · 2024-12-17T02:16:48Z

ICU4X has a concept of "baked data", a way of "baking" locale data into the source of a program in the form of consts. This has a bunch of performance benefits: loading data from the binary is essentially free and doesn't involve any sort of deserialization.

However, we have been facing issues with cases where a single crate contains a lot of data.

I have a minimal testcase here: https://github.com/Manishearth/icu4x_compile_sample. It removes most of the cruft whilst still having an interesting-enough AST in the const data. cargo build in the demo folder takes 51s, using almost a gigabyte of RAM. Removing the macro does improve things slightly, but not overly slow.

Some interesting snippets of time-passes:

...
time:   1.194; rss:   52MB ->  595MB ( +543MB)	expand_crate
time:   1.194; rss:   52MB ->  595MB ( +543MB)	macro_expand_crate
...
time:   3.720; rss:  682MB ->  837MB ( +155MB)	type_check_crate
...
time:  55.505; rss:  837MB -> 1058MB ( +221MB)	MIR_borrow_checking
...
time:   0.124; rss: 1080MB ->  624MB ( -456MB)	free_global_ctxt

Full time-passes

time:   0.001; rss:   47MB ->   49MB (   +1MB)	parse_crate
time:   0.001; rss:   50MB ->   50MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.000; rss:   50MB ->   51MB (   +1MB)	setup_global_ctxt
time:   0.000; rss:   52MB ->   52MB (   +0MB)	crate_injection
time:   1.194; rss:   52MB ->  595MB ( +543MB)	expand_crate
time:   1.194; rss:   52MB ->  595MB ( +543MB)	macro_expand_crate
time:   0.013; rss:  595MB ->  595MB (   +0MB)	AST_validation
time:   0.008; rss:  595MB ->  597MB (   +1MB)	finalize_macro_resolutions
time:   0.285; rss:  597MB ->  642MB (  +45MB)	late_resolve_crate
time:   0.012; rss:  642MB ->  642MB (   +0MB)	resolve_check_unused
time:   0.020; rss:  642MB ->  642MB (   +0MB)	resolve_postprocess
time:   0.326; rss:  595MB ->  642MB (  +46MB)	resolve_crate
time:   0.011; rss:  610MB ->  610MB (   +0MB)	write_dep_info
time:   0.011; rss:  610MB ->  611MB (   +0MB)	complete_gated_feature_checking
time:   0.058; rss:  765MB ->  729MB (  -35MB)	drop_ast
time:   1.213; rss:  610MB ->  681MB (  +71MB)	looking_for_derive_registrar
time:   1.421; rss:  610MB ->  682MB (  +72MB)	misc_checking_1
time:   0.086; rss:  682MB ->  690MB (   +8MB)	coherence_checking
time:   3.720; rss:  682MB ->  837MB ( +155MB)	type_check_crate
time:   0.000; rss:  837MB ->  837MB (   +0MB)	MIR_coroutine_by_move_body
time:  55.505; rss:  837MB -> 1058MB ( +221MB)	MIR_borrow_checking
time:   1.571; rss: 1058MB -> 1068MB (  +10MB)	MIR_effect_checking
time:   0.217; rss: 1068MB -> 1067MB (   -1MB)	module_lints
time:   0.217; rss: 1068MB -> 1067MB (   -1MB)	lint_checking
time:   0.311; rss: 1067MB -> 1068MB (   +0MB)	privacy_checking_modules
time:   0.607; rss: 1068MB -> 1068MB (   +0MB)	misc_checking_3
time:   0.000; rss: 1136MB -> 1137MB (   +1MB)	monomorphization_collector_graph_walk
time:   0.778; rss: 1068MB -> 1064MB (   -4MB)	generate_crate_metadata
time:   0.005; rss: 1064MB -> 1085MB (  +22MB)	codegen_to_LLVM_IR
time:   0.007; rss: 1076MB -> 1085MB (  +10MB)	LLVM_passes
time:   0.014; rss: 1064MB -> 1085MB (  +22MB)	codegen_crate
time:   0.257; rss: 1084MB -> 1080MB (   -4MB)	encode_query_results
time:   0.270; rss: 1084MB -> 1080MB (   -4MB)	incr_comp_serialize_result_cache
time:   0.270; rss: 1084MB -> 1080MB (   -4MB)	incr_comp_persist_result_cache
time:   0.271; rss: 1084MB -> 1080MB (   -4MB)	serialize_dep_graph
time:   0.124; rss: 1080MB ->  624MB ( -456MB)	free_global_ctxt
time:   0.000; rss:  624MB ->  624MB (   +0MB)	finish_ongoing_codegen
time:   0.127; rss:  624MB ->  653MB (  +29MB)	link_rlib
time:   0.135; rss:  624MB ->  653MB (  +29MB)	link_binary
time:   0.138; rss:  624MB ->  618MB (   -6MB)	link_crate
time:   0.139; rss:  624MB ->  618MB (   -6MB)	link
time:  65.803; rss:   32MB ->  187MB ( +155MB)	total

Even without the intermediate macro, expand_crate still increases RAM significantly, though the increase is halved:

time:   0.715; rss:   52MB ->  254MB ( +201MB)	expand_crate
time:   0.715; rss:   52MB ->  254MB ( +201MB)	macro_expand_crate

I understand that to some extent, we are simply feeding Rust a file that is megabytes in size and we cannot expect it to be too fast. It's interesting that MIR borrow checking is slowed down so much by this (there's relatively little to borrow check. I suspect there is MIR construction happening here too). The fact that the RAM usage is almost in the gigabytes is also somewhat concerning; the problematic source file is 7MB, but compilation takes a gigabyte of RAM, which is quite significant. Pair this with the fact that we have many such data files per crate (some of which are large) we end up hitting CI limits.

With the actual problem we were facing (unicode-org/icu4x#5230 (comment)), our time-passes numbers were:

...
time:   1.013; rss:   51MB -> 1182MB (+1130MB)	expand_crate
time:   1.013; rss:   51MB -> 1182MB (+1131MB)	macro_expand_crate
...
time:   6.609; rss: 1308MB -> 1437MB ( +128MB)	type_check_crate
time:  36.802; rss: 1437MB -> 2248MB ( +811MB)	MIR_borrow_checking
time:   2.214; rss: 2248MB -> 2270MB (  +22MB)	MIR_effect_checking
...

I'm hoping there is at least some low hanging fruit that can be improved here, or advice on how to avoid this problem. So far we've managed to stay within CI limits by reducing the number of tokens, converting stuff like icu::experimental::dimension::provider::units::UnitsDisplayNameV1 { patterns: icu::experimental::relativetime::provider::PluralPatterns { strings: icu::plurals::provider::PluralElementsPackedCow { elements: alloc::borrow::Cow::Borrowed(unsafe { icu::plurals::provider::PluralElementsPackedULE::from_byte_slice_unchecked(b"\0\x01 acre") }) }, _phantom: core::marker::PhantomData } }, into icu::experimental::dimension::provider::units::UnitsDisplayNameV1::new_baked(b"\0\x01 acre"). This works to some extent but the problems remain in the same order of magnitude and can recur as we add more data.

The text was updated successfully, but these errors were encountered:

oli-obk · 2024-12-17T07:39:49Z

It's interesting that MIR borrow checking is slowed down so much by this (there's relatively little to borrow check.

Due to the query system this can also be const eval being invoked and generating and interning lots of allocations. So #93215 may be related

I don't remember how to get the diff or single output that's used to generate the table in https://perf.rust-lang.org/detailed-query.html?commit=52f4785f80c1516ebece019ae4b69763ffb9a618&benchmark=ripgrep-13.0.0-opt&scenario=incr-unchanged&base_commit=5afd5ad29c014de69bea61d028a1ce832ed75a75 but that gives per query timings. Just in case you want to dig some more

lqd · 2024-12-17T08:27:38Z

You can get the raw data with -Zself-profile and use measureme tools on the .mm_profdata raw data with summarize summarize $file for the query summary, and iirc summarize diff with 2 files to get the diff oli linked.

lqd · 2024-12-17T14:42:54Z

+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                                    | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_const_qualif                                                        | 16.22s    | 45.785          | 16.22s   | 3          | 4.56µs                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_built                                                               | 9.98s     | 28.160          | 10.09s   | 4          | 96.73ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_borrowck                                                            | 2.49s     | 7.030           | 19.02s   | 4          | 5.70µs                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                                                  | 2.27s     | 6.420           | 2.36s    | 4          | 23.85ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| expand_crate                                                            | 942.46ms  | 2.660           | 952.89ms | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ 
...
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
Total cpu time: 35.430183553s
...

And without the macro:

+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                                    | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_const_qualif                                                        | 14.88s    | 45.361          | 14.88s   | 3          | 5.76µs                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_built                                                               | 9.86s     | 30.074          | 9.97s    | 4          | 61.53ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_borrowck                                                            | 2.50s     | 7.612           | 17.67s   | 4          | 7.64µs                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                                                  | 2.27s     | 6.922           | 2.35s    | 4          | 23.99ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| expand_crate                                                            | 454.17ms  | 1.385           | 477.20ms | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
...
Total cpu time: 32.792896135s
...

So most of the time is in mir building and const qualif, not in borrowck per se.

oli-obk · 2024-12-17T14:48:56Z

Hmm. There's def opportunity to improve const qualification

cc @RalfJung

Manishearth · 2024-12-17T16:46:12Z

Sweet, thanks! If it's something straightforward enough I'm happy to help out too.

Would any of these fixes help with the RAM? The performance is an issue but it's not the blocker, the RAM usage is one that actively limits us at times.

oli-obk · 2024-12-17T17:43:10Z

Hmm.. that's harder to debug. The mir const qualif query returns an always-tiny result, so that's not it. We could probably dump size changes of queries without nested queries, similar to the self time. But that's harder

We may be able to trim the borrowck result if that's the issue. Rustc may not need all the output anymore

lqd · 2024-12-17T17:44:25Z

Would any of these fixes help with the RAM?

You may be in luck as that RAM usage seems to come from const qualif as well 😅. It's understandable, really, it's trying to do analyses on MIR that has 250K locals and 730K statements.

oli-obk · 2024-12-17T17:45:31Z

Ah, transient peak memory? It shouldn't generate much persistent data

lqd · 2024-12-17T17:52:01Z

Seems like it, yeah, since we're talking about max-rss.

And I wonder, if you remember the recent change in dataflow removing the acyclic MIR fast path (which seems a likely shape for a constant), if the xfer function is now cloning the state (I hope it still moves it if there's a single successor; but at the same time there may be unwinding paths all around) because there are 200K blocks. I'd need to check.

Also we should try the mixed bitsets (but I haven't checked the distribution of values here, I'd believe they should be very compressible tho).

RalfJung · 2024-12-17T18:05:33Z

There's def opportunity to improve const qualification

There probably is, but it's hard to say where without knowing which part is the bottleneck. Is there any way to get a profile of where const-qualif is spending its time?

We do have to iterate over that array at least once. But I doubt that alone would be so slow -- how many elements does this array have?

Maybe there's something accidentally quadratic in promotion? Not sure if that is also part of mir_const_qualif.

Manishearth · 2024-12-17T18:11:10Z

We do have to iterate over that array at least once. But I doubt that alone would be so slow -- how many elements does this array have?

Approximately 25,000. Each element looks the same, a nested struct constructor expression with a string at the center. The string can vary in length.

lqd · 2024-12-17T21:40:17Z

Also we should try the mixed bitsets

Manishearth/icu4x_compile_sample> hyperfine -w2 -r5 --prepare "cargo clean" -L rustc 1d35638dc38dbfbf1cc2a9823135dfcf3c650169,8280a60cd16db5b22b55e7c7d6b9f2d8a3960a20 "cargo +{rustc} check"
Benchmark 1: cargo +1d35638dc38dbfbf1cc2a9823135dfcf3c650169 check
  Time (mean ± σ):     34.761 s ±  0.464 s    [User: 21.684 s, System: 13.076 s]
  Range (min … max):   34.010 s … 35.210 s    5 runs

Benchmark 2: cargo +8280a60cd16db5b22b55e7c7d6b9f2d8a3960a20 check
  Time (mean ± σ):     20.605 s ±  0.049 s    [User: 18.423 s, System: 2.183 s]
  Range (min … max):   20.533 s … 20.657 s    5 runs

Summary
  cargo +8280a60cd16db5b22b55e7c7d6b9f2d8a3960a20 check ran
    1.69 ± 0.02 times faster than cargo +1d35638dc38dbfbf1cc2a9823135dfcf3c650169 check

> summarize summarize profiles/demo2-2187676.mm_profdata | grep mir_const_qualif
| mir_const_qualif    | 14.79s    | 43.514          | 14.79s   | 3          | 4.61µs
> summarize summarize profiles/demo2-2187708.mm_profdata | grep mir_const_qualif
| mir_const_qualif    | 1.00s     | 4.931           | 1.00s    | 3          | 3.94µs

(BTW, nightly uses a bit more than 1GB)

> cargo clean -q && /usr/bin/time -v cargo +1d35638dc38dbfbf1cc2a9823135dfcf3c650169 check -q 2>&1 | grep "Maximum"
        Maximum resident set size (kbytes): 14821512
> cargo clean -q && /usr/bin/time -v cargo +8280a60cd16db5b22b55e7c7d6b9f2d8a3960a20 check -q 2>&1 | grep "Maximum"
        Maximum resident set size (kbytes): 1871272

We'll see how much it impacts regular CFGs when the perf run concludes. Worst case we tune the cutoff between dense and sparse bitsets...

lqd · 2024-12-17T22:49:24Z

@Manishearth as I'm not sure whether the CI limits you're hitting are only on max-rss or another metric, let us know how #134438 works for you when it lands in a nightly.

Manishearth · 2024-12-17T23:16:42Z

I don't know either, but I will observe with that nightly! We're currently not hitting limits due to some optimizations we made (reducing tokens in the const code). When we were hitting limits it was intermittent CI stuff. So for me to measure this we'd need to

move our CI to nightly
undo the optimizations
wait a few days

which I may not actually do. But I think max-rss is probably correct.

nnethercote · 2024-12-18T01:40:58Z

And I wonder, if you remember the recent change in dataflow removing the acyclic MIR fast path (which seems a likely shape for a constant), if the xfer function is now cloning the state (I hope it still moves it if there's a single successor; but at the same time there may be unwinding paths all around) because there are 200K blocks. I'd need to check.

I assume this is referring to #131481? I think the same cloning would happen before/after that change, though I could be wrong. That PR merged in 1.84.0 (currently beta) so it would be easy to check, assuming ICU4X can be built with stable.

Manishearth · 2024-12-18T01:50:35Z

ICU4X does build on stable, as does the reduced testcase. What versions do you want me to compare?

Manishearth · 2024-12-18T01:55:43Z

Let me just compare stable and beta with RUSTC_BOOTSTRAP=1 to use time-passes

Manishearth · 2024-12-18T02:01:47Z

Tiny bit faster, takes up a bit more memory during expansion, a bunch more memory during typechecking but a bunch less memory during MIR. Tested with the reduced testcase, not ICU4X itself.

Stable:
time:   1.173; rss:   53MB ->  601MB ( +548MB)	expand_crate
time:   1.173; rss:   53MB ->  601MB ( +548MB)	macro_expand_crate
time:   3.917; rss:  659MB ->  776MB ( +117MB)	type_check_crate
time:  49.468; rss:  776MB -> 1098MB ( +322MB)	MIR_borrow_checking

Beta:
time:   1.243; rss:   51MB ->  601MB ( +550MB)	expand_crate
time:   1.244; rss:   51MB ->  601MB ( +551MB)	macro_expand_crate
time:   3.831; rss:  637MB ->  815MB ( +178MB)	type_check_crate
time:  47.344; rss:  815MB -> 1089MB ( +274MB)	MIR_borrow_checking

Stable (1.83)

[17:54:56] मanishearth@manishearth-glaptop2 ~/dev/Git/icu4x_compile_sample ^_^ 
$ cargo clean; RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Ztime-passes" /usr/bin/time -v cargo +stable build -j1 --all-features
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
     Removed 18 files, 104.6MiB total
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
   Compiling demo2 v0.1.0 (/home/manishearth/dev/Git/icu4x_compile_sample/demo)
time:   0.001; rss:   49MB ->   50MB (   +1MB)	parse_crate
time:   0.001; rss:   51MB ->   51MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.000; rss:   51MB ->   52MB (   +1MB)	setup_global_ctxt
time:   0.000; rss:   53MB ->   53MB (   +0MB)	crate_injection
time:   1.173; rss:   53MB ->  601MB ( +548MB)	expand_crate
time:   1.173; rss:   53MB ->  601MB ( +548MB)	macro_expand_crate
time:   0.014; rss:  601MB ->  601MB (   +0MB)	AST_validation
time:   0.001; rss:  601MB ->  603MB (   +2MB)	finalize_macro_resolutions
time:   0.287; rss:  603MB ->  650MB (  +47MB)	late_resolve_crate
time:   0.012; rss:  650MB ->  650MB (   +0MB)	resolve_check_unused
time:   0.023; rss:  650MB ->  651MB (   +0MB)	resolve_postprocess
time:   0.323; rss:  601MB ->  651MB (  +49MB)	resolve_crate
time:   0.015; rss:  621MB ->  621MB (   +0MB)	write_dep_info
time:   0.014; rss:  622MB ->  622MB (   +0MB)	complete_gated_feature_checking
time:   0.058; rss:  757MB ->  693MB (  -64MB)	drop_ast
time:   1.299; rss:  621MB ->  657MB (  +36MB)	looking_for_derive_registrar
time:   1.495; rss:  621MB ->  659MB (  +37MB)	misc_checking_1
time:   0.075; rss:  659MB ->  666MB (   +7MB)	coherence_checking
time:   3.917; rss:  659MB ->  776MB ( +117MB)	type_check_crate
time:  49.468; rss:  776MB -> 1098MB ( +322MB)	MIR_borrow_checking
time:   0.955; rss: 1098MB -> 1070MB (  -29MB)	MIR_effect_checking
time:   0.070; rss: 1070MB -> 1070MB (   +0MB)	module_lints
time:   0.070; rss: 1070MB -> 1070MB (   +0MB)	lint_checking
time:   0.197; rss: 1070MB -> 1070MB (   +0MB)	privacy_checking_modules
time:   0.331; rss: 1070MB -> 1070MB (   +0MB)	misc_checking_3
time:   0.005; rss: 1144MB -> 1145MB (   +1MB)	monomorphization_collector_graph_walk
time:   0.001; rss: 1145MB -> 1145MB (   +0MB)	partition_and_assert_distinct_symbols
time:   0.507; rss: 1070MB -> 1071MB (   +1MB)	generate_crate_metadata
time:   0.016; rss: 1071MB -> 1091MB (  +20MB)	codegen_to_LLVM_IR
time:   0.023; rss: 1083MB -> 1090MB (   +7MB)	LLVM_passes
time:   0.042; rss: 1071MB -> 1090MB (  +19MB)	codegen_crate
time:   0.001; rss: 1089MB -> 1083MB (   -6MB)	incr_comp_persist_dep_graph
time:   0.139; rss: 1080MB -> 1079MB (   -1MB)	encode_query_results
time:   0.147; rss: 1081MB -> 1079MB (   -2MB)	incr_comp_serialize_result_cache
time:   0.147; rss: 1083MB -> 1079MB (   -3MB)	incr_comp_persist_result_cache
time:   0.148; rss: 1090MB -> 1079MB (  -11MB)	serialize_dep_graph
time:   0.104; rss: 1079MB ->  646MB ( -434MB)	free_global_ctxt
time:   0.002; rss:  646MB ->  646MB (   +0MB)	finish_ongoing_codegen
time:   0.122; rss:  646MB ->  681MB (  +35MB)	link_rlib
time:   0.127; rss:  646MB ->  681MB (  +35MB)	link_binary
time:   0.131; rss:  646MB ->  646MB (   +0MB)	link_crate
time:   0.134; rss:  646MB ->  646MB (   +0MB)	link
time:  58.742; rss:   33MB ->  189MB ( +156MB)	total
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 58.88s
	Command being timed: "cargo +stable build -j1 --all-features"
	User time (seconds): 44.88
	System time (seconds): 13.91
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:58.90
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 14853340
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 201
	Minor (reclaiming a frame) page faults: 3835408
	Voluntary context switches: 519
	Involuntary context switches: 1050
	Swaps: 0
	File system inputs: 41128
	File system outputs: 211744
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Beta (1.84)

[17:56:13] मanishearth@manishearth-glaptop2 ~/dev/Git/icu4x_compile_sample ^_^ 
$ cargo clean; RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Ztime-passes" /usr/bin/time -v cargo +beta build -j1 --all-features
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
     Removed 18 files, 136.5MiB total
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
   Compiling demo2 v0.1.0 (/home/manishearth/dev/Git/icu4x_compile_sample/demo)
time:   0.003; rss:   47MB ->   48MB (   +1MB)	parse_crate
time:   0.001; rss:   49MB ->   49MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.003; rss:   49MB ->   50MB (   +1MB)	setup_global_ctxt
time:   0.001; rss:   51MB ->   51MB (   +0MB)	crate_injection
time:   1.243; rss:   51MB ->  601MB ( +550MB)	expand_crate
time:   1.244; rss:   51MB ->  601MB ( +551MB)	macro_expand_crate
time:   0.013; rss:  601MB ->  602MB (   +0MB)	AST_validation
time:   0.001; rss:  602MB ->  602MB (   +0MB)	finalize_imports
time:   0.000; rss:  602MB ->  602MB (   +0MB)	compute_effective_visibilities
time:   0.007; rss:  602MB ->  603MB (   +2MB)	finalize_macro_resolutions
time:   0.278; rss:  603MB ->  650MB (  +47MB)	late_resolve_crate
time:   0.012; rss:  650MB ->  650MB (   +0MB)	resolve_check_unused
time:   0.027; rss:  650MB ->  650MB (   +0MB)	resolve_postprocess
time:   0.325; rss:  602MB ->  650MB (  +49MB)	resolve_crate
time:   0.015; rss:  621MB ->  621MB (   +0MB)	write_dep_info
time:   0.012; rss:  621MB ->  621MB (   +0MB)	complete_gated_feature_checking
time:   0.064; rss:  734MB ->  672MB (  -61MB)	drop_ast
time:   1.231; rss:  621MB ->  637MB (  +16MB)	looking_for_derive_registrar
time:   0.000; rss:  637MB ->  637MB (   +0MB)	unused_lib_feature_checking
time:   1.424; rss:  621MB ->  637MB (  +16MB)	misc_checking_1
time:   0.094; rss:  637MB ->  643MB (   +5MB)	coherence_checking
time:   3.831; rss:  637MB ->  815MB ( +178MB)	type_check_crate
time:  47.344; rss:  815MB -> 1089MB ( +274MB)	MIR_borrow_checking
time:   0.952; rss: 1089MB -> 1066MB (  -22MB)	MIR_effect_checking
time:   0.130; rss: 1066MB -> 1067MB (   +0MB)	module_lints
time:   0.130; rss: 1066MB -> 1067MB (   +0MB)	lint_checking
time:   0.204; rss: 1067MB -> 1067MB (   +0MB)	privacy_checking_modules
time:   0.394; rss: 1066MB -> 1067MB (   +0MB)	misc_checking_3
time:   0.002; rss: 1140MB -> 1140MB (   +0MB)	monomorphization_collector_root_collections
time:   0.004; rss: 1140MB -> 1141MB (   +1MB)	monomorphization_collector_graph_walk
time:   0.000; rss: 1141MB -> 1141MB (   +0MB)	partition_and_assert_distinct_symbols
time:   0.497; rss: 1067MB -> 1053MB (  -13MB)	generate_crate_metadata
time:   0.018; rss: 1067MB -> 1076MB (   +9MB)	LLVM_passes
time:   0.012; rss: 1053MB -> 1076MB (  +23MB)	codegen_to_LLVM_IR
time:   0.034; rss: 1053MB -> 1076MB (  +23MB)	codegen_crate
time:   0.001; rss: 1076MB -> 1076MB (   +0MB)	incr_comp_persist_dep_graph
time:   0.159; rss: 1070MB -> 1069MB (   -1MB)	encode_query_results
time:   0.169; rss: 1072MB -> 1069MB (   -3MB)	incr_comp_serialize_result_cache
time:   0.169; rss: 1075MB -> 1069MB (   -6MB)	incr_comp_persist_result_cache
time:   0.170; rss: 1076MB -> 1069MB (   -7MB)	serialize_dep_graph
time:   0.085; rss: 1069MB ->  651MB ( -418MB)	free_global_ctxt
time:   0.089; rss:  651MB ->  686MB (  +35MB)	link_rlib
time:   0.095; rss:  651MB ->  686MB (  +35MB)	link_binary
time:   0.097; rss:  651MB ->  652MB (   +0MB)	link_crate
time:   0.099; rss:  651MB ->  652MB (   +0MB)	link
time:  56.551; rss:   31MB ->  189MB ( +158MB)	total
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 56.75s
	Command being timed: "cargo +beta build -j1 --all-features"
	User time (seconds): 44.59
	System time (seconds): 11.99
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.86
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 14866168
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 770
	Minor (reclaiming a frame) page faults: 3674975
	Voluntary context switches: 1129
	Involuntary context switches: 1047
	Swaps: 0
	File system inputs: 300440
	File system outputs: 211624
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

lqd · 2024-12-18T06:38:45Z

It looks unrelated then, and using a mixed bitset is enough to fix the issue for me.

I haven’t looked into it either. It was a possibility because of the 200K memory allocations for dataflow in const qualif: iirc on acyclic cfgs, some analyses can be run in a single pass on RPO (again I haven’t checked that this case is acyclic or that this analysis could do this optimization )

…-errors Use `MixedBitSet`s in const qualif These analyses' domains should be very homogeneous, having compressed bitmaps on huge cfgs should make a difference (and doesn’t have an impact on the smaller / regular cfgs in our benchmarks). This is a >40% walltime reduction on [this stress test](https://github.com/Manishearth/icu4x_compile_sample) extracted from a real world ICU case, and a 10x or so max-rss reduction. cc `@oli-obk` `@RalfJung` Should help with (or fix) issue rust-lang#134404.

Manishearth added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 17, 2024

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 17, 2024

Manishearth mentioned this issue Dec 17, 2024

Baked data is big, and compiles slowly, for finely sliced data markers unicode-org/icu4x#5230

Open

jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 17, 2024

jieyouxu added the I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. label Dec 17, 2024

lqd mentioned this issue Dec 17, 2024

Use MixedBitSets in const qualif #134438

Merged

saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large files containing many tokens of `const` data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Large files containing many tokens of `const` data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Manishearth commented Dec 17, 2024 •

edited

Loading

oli-obk commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

oli-obk commented Dec 17, 2024

Manishearth commented Dec 17, 2024

oli-obk commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024

oli-obk commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

RalfJung commented Dec 17, 2024

Manishearth commented Dec 17, 2024

lqd commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024

Manishearth commented Dec 17, 2024

nnethercote commented Dec 18, 2024

Manishearth commented Dec 18, 2024

Manishearth commented Dec 18, 2024

Manishearth commented Dec 18, 2024 •

edited

Loading

lqd commented Dec 18, 2024

Large files containing many tokens of const data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Large files containing many tokens of const data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Comments

Manishearth commented Dec 17, 2024 • edited Loading

oli-obk commented Dec 17, 2024 • edited Loading

lqd commented Dec 17, 2024 • edited Loading

lqd commented Dec 17, 2024 • edited Loading

oli-obk commented Dec 17, 2024

Manishearth commented Dec 17, 2024

oli-obk commented Dec 17, 2024 • edited Loading

lqd commented Dec 17, 2024

oli-obk commented Dec 17, 2024 • edited Loading

lqd commented Dec 17, 2024 • edited Loading

RalfJung commented Dec 17, 2024

Manishearth commented Dec 17, 2024

lqd commented Dec 17, 2024 • edited Loading

lqd commented Dec 17, 2024

Manishearth commented Dec 17, 2024

nnethercote commented Dec 18, 2024

Manishearth commented Dec 18, 2024

Manishearth commented Dec 18, 2024

Manishearth commented Dec 18, 2024 • edited Loading

lqd commented Dec 18, 2024

Large files containing many tokens of `const` data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Large files containing many tokens of `const` data compile very slowly and use a lot of memory (in MIR_borrow_checking and expand_crate) #134404

Manishearth commented Dec 17, 2024 •

edited

Loading

oli-obk commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

oli-obk commented Dec 17, 2024 •

edited

Loading

oli-obk commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

lqd commented Dec 17, 2024 •

edited

Loading

Manishearth commented Dec 18, 2024 •

edited

Loading