-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use MixedBitSet
s in const qualif
#134438
Use MixedBitSet
s in const qualif
#134438
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Try using compressible bitsets in const qualif These analyses domains should be very homogeneous, having compressed bitmaps on huge cfgs should make a difference (and hopefully have little impact on the smaller ones). r? ghost
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
r=me if it ends up working out well |
It is working well on manish's real world stress test, hopefully benchmarks are not negatively impacted. I think they shouldn't, but we'll see ^^. |
Finished benchmarking commit (8280a60): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResults (primary 1.5%, secondary 2.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 770.505s -> 772.124s (0.21%) |
MixedBitSet
s in const qualif
Some changes occurred to the CTFE machinery cc @rust-lang/wg-const-eval |
r? compiler-errors @bors r=compiler-errors |
Given that this perf diff isn't reflected by our benchmarks at all, it seems likely to regress again in the future. I'd recommend adding the stress test to the benchmark suite. |
@@ -248,10 +248,10 @@ where | |||
#[derive(Debug, PartialEq, Eq)] | |||
pub(super) struct State { | |||
/// Describes whether a local contains qualif. | |||
pub qualif: BitSet<Local>, | |||
pub qualif: MixedBitSet<Local>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth a comment explaining the choice of representation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. In general using a BitSet
on large domain sizes can create issues like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way a contributor could know this. BitSet
is the most canonically named type so it's the one developers will reach for by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree. Thinking about the domain representation and whether it's dense or sparse is super common in dataflow so "no way to know this" is probably too broad of a brush stroke though.
We could add a comment to BitSet
, or the bit_set
module, describing and linking to the other bitsets, cc @nnethercote on that.
In this case I've added the requested comment to the analysis state, if you're happy with it, feel free to r you and michael.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worth renaming BitSet to DenseBitSet perhaps? That way there's not an obvious one to reach for.
But that should be follow-up for sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add it to my todo list :)
Sorry I don't really have the time to do that, but also I'm not sure this stress test is that important to track over time? It was easy to analyze and fix, and probably no one else has seen that issue. 250K locals on a 750K-node CFG is bound to cause slowness elsewhere (and is still slow with this PR) so it's not uninteresting as a benchmark. I think when people encounter such issues of slowness triggered by the function size (and we have some of these in our stress tests), they just split their code into more functions or constants. |
It doesn't have to be you adding the benchmark, maybe @Manishearth can look into this. :) But if we don't yet have a benchmark covering such extreme MIR bodies, it seems like we should? (Might also be interesting to have this as a @rust-lang/wg-compiler-performance probably know best whether it'd be a good idea to add such a benchmark or not. |
f2a1b09
to
db7d6a9
Compare
I am in wg-compiler-performance 😅 . We likely don't have a benchmark one for such consts per se, but I believe we have others with big functions. |
Comment looks good, thanks. :)
So maybe we just need to make one of those existing big functions a |
Maybe yeah, but we'd need to profile to make sure. To trigger the most const checks, some of these may need to be called in const contexts as well, I assume. In that case, maybe a dedicated benchmark could be better suited (esp since some of the big functions in rustc-perf may actually be closures) if we wanted to add one. Though now that I think about it we have similar-sounding consts and arrays in real world benchmarks from unicode crates. I don't know if they have a similar profile as this stress test, but on smaller cfgs, and it'd probably be the first thing to check. (It's also possible that some other real world crates will exercise these code paths more, and we'll update the benchmarks in a few months, after the edition.) |
To trigger the most const checks, some of these may need to be called in const contexts as well, I assume.
We already have ctfe-stress to benchmark *running* const code. This here is about the static analysis determining whether the code is legal to place in const context. (It's basically a type system, or more accurately a very special-case effect system.) So no, we don't have to actually call them.
|
@RalfJung if you point me to where the benchmark should go I can contribute it eventually. One note is that that generated code uses Unicode-3 licensed data, I could replace it with some random nonsense instead if needed. Also do let me know if you'd prefer the benchmark to use macros or not. I can simplify further so it benchmarks the things you care about. The benchmark uses a macro because that's what we do (because our data crates are pure macro crates and they get invoked by our code crates). |
We are due to do a rustc-perf benchmark upgrade in early 2025, it would make sense to add any new const benchmark as part of that. I have started an update doc here, and linked to this PR as one to possibly add. |
☀️ Test successful - checks-actions |
Finished benchmarking commit (5dfe648): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResults (primary 1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 768.017s -> 769.305s (0.17%) |
Rename `BitSet` to `DenseBitSet` r? `@Mark-Simulacrum` as you requested this in rust-lang#134438 (comment) after such a confusion. This PR renames `BitSet` to `DenseBitSet` to make it less obvious as the go-to solution for bitmap needs, as well as make its representation (and positives/negatives) clearer. It also expands the comments there to hopefully make it clearer when it's not a good fit, with some alternative bitsets types. (This migrates the subtrees cg_gcc and clippy to use the new name in separate commits, for easier review by their respective owners, but they can obvs be squashed)
Rollup merge of rust-lang#135205 - lqd:bitsets, r=Mark-Simulacrum Rename `BitSet` to `DenseBitSet` r? `@Mark-Simulacrum` as you requested this in rust-lang#134438 (comment) after such a confusion. This PR renames `BitSet` to `DenseBitSet` to make it less obvious as the go-to solution for bitmap needs, as well as make its representation (and positives/negatives) clearer. It also expands the comments there to hopefully make it clearer when it's not a good fit, with some alternative bitsets types. (This migrates the subtrees cg_gcc and clippy to use the new name in separate commits, for easier review by their respective owners, but they can obvs be squashed)
These analyses' domains should be very homogeneous, having compressed bitmaps on huge cfgs should make a difference (and doesn’t have an impact on the smaller / regular cfgs in our benchmarks).
This is a >40% walltime reduction on this stress test extracted from a real world ICU case, and a 10x or so max-rss reduction.
cc @oli-obk @RalfJung
Should help with (or fix) issue #134404.