-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up NLL with HybridIdxSetBuf. #53383
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Instruction count results for NLL-check builds:
I don't know why Some notable max-rss results:
I'd love to hear suggestions on how to improve the code under the |
@bors try |
⌛ Trying commit 00e7f624f47a2b5e196a24d6fb32779c856535a7 with merge 496278c16018428a506261e2ed5396168b5c17b9... |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Test successful - status-travis |
HybridIdxSetBuf::Sparse(sparse, _) => self.subtract_sparse(sparse), | ||
HybridIdxSetBuf::Dense(dense, _) => self.subtract(dense), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it's a good idea to have a trait instead of different methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the trait look like? I'm having trouble imagining it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A trait would read nicer, for sure, but we could leave it to follow-up. I imagine it would be something like:
trait SubtractFrom<T> {
fn subtract_from(&self, target: &mut IdxSetBuf<T>) -> bool;
}
and then you would have
impl<T> IdxSetBuf<T> {
fn subtract(&mut self, other: &impl SubtractFrom<T>) -> bool {
other.subtract_from(self)
}
}
and then
impl<T> SubtractFrom<T> for IdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for HybridIdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for SparseIdxSetBuf<T> { .. }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave as-is for now and maybe clean it up a little later. I'm not actually sure that the trait would be a win for readability anyway.
} | ||
} | ||
|
||
pub struct SparseIter<'a, T: Idx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably instead use impl Trait
and pass through the various iterator traits; DoubleEnded
, TrustedLen
, etc. Otherwise we're losing performance here...
We should at least be able to implement size_hint, count, and nth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point at some existing code that does similar things? I'm not sure what this would look like. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, FWIW, I just copied the code in the existing Iter
for IdxSetBuf
. So if there is a genuine win to be had here -- and profiling doesn't indicate so, at least for the moment -- it should be changed as well. Perhaps that could also be a follow-up.
mem::replace(self, HybridIdxSetBuf::Dense(dense, universe_size)); | ||
changed | ||
} | ||
_ => panic!("impossible"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an unreachable! I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but ideally someone would point out a way to rewrite the code so this isn't necessary :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let HybridIdxSetBuf::Sparse(sparse, universe_size) = mem::replace(self, dummy)
? It doesn't require an else
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I find that misleading -- it's strange to have an if let
that cannot fail. The match
makes the cannot-fail-ness more obvious.
} | ||
|
||
/// Removes `elem` from the set `self`. | ||
pub fn remove(&mut self, elem: &T) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to take by reference here since I'd is Copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied the existing method in IdxSetBuf
, so I will leave this alone. Converting all such methods in this file could be done as a follow-up.
@rust-timer build 496278c16018428a506261e2ed5396168b5c17b9 |
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
@rust-timer build 496278c16018428a506261e2ed5396168b5c17b9 |
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
1 similar comment
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
Here is the full Cachegrind diff showing the difference between the two versions:
That is an amazingly small diff. The slowdown is entirely due to more hash table operations underneath @RalfJung, do you have any idea what might be going on? |
Perf results are in, and they're even better than what I saw locally. Notable instruction count results (look at the
Things to note:
Notable max-rss results:
These are similar to my local measurements. There are also these max-rss results:
These are not |
The specific ICE is this:
That doesn't mean much to me, alas. |
https://github.com/rust-lang/rust/blob/master/src/libcore/iter/mod.rs#L513 is a sample of the forwarding needed. I wish there was a better way, but I'm unaware of it. |
@nnethercote I can take a look at the ICE, though prob not for a day or two |
@nnethercote also, those perf stats look great! |
I worked out the ICE by incrementally reverting parts of the patch. The problem is in |
`HybridIdxSetBuf` is a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large `universe_size` values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces the execution time of the five slowest NLL benchmarks by 55%, 21%, 16%, 10% and 9%. It also reduces the max-rss of three benchmarks by 53%, 33%, and 9%.
00e7f62
to
5745597
Compare
New version fixes the test failures. I also did another perf run locally, and |
@bors try |
⌛ Trying commit 5745597 with merge 10d5f0b359098fbbe68196b82c89cb4d02c93ce0... |
☀️ Test successful - status-travis |
⌛ Testing commit 5745597 with merge 8c91bddf4a977f33f8f6490fca480f9df68e9a33... |
💔 Test failed - status-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors retry |
⌛ Testing commit 5745597 with merge d81755a0327f37d33dd7d01251956a818b12767a... |
💔 Test failed - status-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors retry |
Speed up NLL with HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
☀️ Test successful - status-appveyor, status-travis |
The top 5 on the NLL dashboard now looks like this:
And |
@Mark-Simulacrum: hmm, any idea why ripgrep is showing up above sentry-cli even though its ratio is lower? A few of the others lower down the list are out of order too. |
No, that shouldn't be happening. Could you file an issue? |
…Simulacrum Remove `HybridBitSet` `HybridBitSet` was introduced under the name `HybridIdxSetBuf` way back in rust-lang#53383 where it was a big win for NLL borrow checker performance. In rust-lang#93984 the more flexible `ChunkedBitSet` was added. Uses of `HybridBitSet` have gradually disappeared (e.g. rust-lang#116152) and there are now few enough that they can be replaced with `BitSet` or `ChunkedBitSet`, and `HybridBitSet` can be removed, cutting more than 700 lines of code. r? `@Mark-Simulacrum`
It's a sparse-when-small but dense-when-large index set that is very
efficient for sets that (a) have few elements, (b) have large
universe_size values, and (c) are cleared frequently. Which makes it
perfect for the
gen_set
andkill_set
sets used by the new borrowchecker.
This patch reduces
tuple-stress
's NLL-check time by 40%, and up to 12%for several other benchmarks. And it halves the max-rss for
keccak
,and has smaller wins for
inflate
andclap-rs
.