-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove some redundant checks from BufReader #98748
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
(rust-highfive has picked a reviewer for you, use r? to override) |
@rustbot author |
45e3cac
to
0cff23d
Compare
@rustbot ready |
The implementation of BufReader contains a lot of redundant checks. While any one of these checks is not particularly expensive to execute, especially when taken together they dramatically inhibit LLVM's ability to make subsequent optimizations.
ee0e126
to
5e5ce43
Compare
@bors try @rust-timer queue r=me, though would be great to update PR description as well. |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 5fa1926 with merge 6973e88d0ee38239c9a7dd0268fc3a0e2e2f4bf8... |
Updated. Let me know if you think there's any other adjustment that should be made. |
Looks great! I want to wait for the perf run to come back -- though I expect it to be mostly neutral, the compiler isn't really I/O heavy for most things, hopefully :) |
☀️ Try build successful - checks-actions |
Queued 6973e88d0ee38239c9a7dd0268fc3a0e2e2f4bf8 with parent 4d6d601, future comparison URL. |
Finished benchmarking commit (6973e88d0ee38239c9a7dd0268fc3a0e2e2f4bf8): comparison url. Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Footnotes |
@bors r+ Looks like mostly noise. |
☀️ Test successful - checks-actions |
Finished benchmarking commit (50166d5): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesThis benchmark run did not return any relevant results for this metric. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes |
…k-Simulacrum Avoid repeated re-initialization of the BufReader buffer Fixes rust-lang#102727 We accidentally removed this in rust-lang#98748. It looks so redundant. But it isn't. The default `Read::read_buf` will defensively initialize the whole buffer, if any of it is indicated to be uninitialized. In uses where reads from the wrapped `Read` impl completely fill the `BufReader`, `initialized` and `filled` are the same, and this extra member isn't required. But in the reported issue, the `BufReader` wraps a `Read` impl which will _never_ fill the whole buffer. So the default `Read::read_buf` implementation repeatedly re-initializes the extra space in the buffer. This adds back the extra `initialized` member, which ensures that the default `Read::read_buf` only zero-initialized the buffer once, and I've tried to add a comment which explains this whole situation.
Avoid repeated re-initialization of the BufReader buffer Fixes rust-lang/rust#102727 We accidentally removed this in rust-lang/rust#98748. It looks so redundant. But it isn't. The default `Read::read_buf` will defensively initialize the whole buffer, if any of it is indicated to be uninitialized. In uses where reads from the wrapped `Read` impl completely fill the `BufReader`, `initialized` and `filled` are the same, and this extra member isn't required. But in the reported issue, the `BufReader` wraps a `Read` impl which will _never_ fill the whole buffer. So the default `Read::read_buf` implementation repeatedly re-initializes the extra space in the buffer. This adds back the extra `initialized` member, which ensures that the default `Read::read_buf` only zero-initialized the buffer once, and I've tried to add a comment which explains this whole situation.
The implementation of BufReader contains a lot of redundant checks. While any one of these checks is not particularly expensive to execute, especially when taken together they dramatically inhibit LLVM's ability to make subsequent optimizations by confusing data flow increasing the code size of anything that uses BufReader.
In particular, these changes have a ~2x increase on the benchmark that this adds a
black_box
to. I'm adding thatblack_box
here just in case LLVM gets clever enough to remove the reads entirely. Right now it can't, but these optimizations are really setting it up to do so.We get this optimization by factoring all the actual buffer management and bounds-checking logic into a new module inside
bufreader
with a newBuffer
type. This makes it much easier to ensure that we have correctly encapsulated the management of the region of the buffer that we have read bytes into, and it lets us provide a new faster way to do small reads.Buffer::consume_with
lets a caller do a read from the buffer with a single bounds check, instead of the double-check that's required to usebuffer
+consume
.Unfortunately I'm not aware of a lot of open-source usage of
BufReader
in perf-critical environments. Some time ago I tweaked this code because I sawBufReader
in a profile at work, and I contributed some benchmarks to thebincode
crate which exerciseBufReader::buffer
. These changes appear to help those benchmarks at little, but all these sorts of benchmarks are kind of fragile so I'm wary of quoting anything specific.