Performance regression in 1.64+ when BufReader inner reader doesn't fill the buffer
#102727
Comments
Thanks for reporting this! This is an issue with a debug build, so it's unlikely to be prioritized highly, but I'm interested in helping out because this is very goofy. The slow code here is the Iterator impl on slices. I think in the last stable release we switched the implementation of BufReader to eventually use ReadBuf, and the slow code is ReadBuf initializing the remaining uninitialized capacity. It does this by iterating over a slice, and that iteration is the slow bit. What I don't get here is why we need to initialize at all. That doesn't seem right.
WG-prioritization assigning highest priority since this looks like a pretty visible runtime perf. regression (Zulip discussion). If the assessment should be revised, please feel free to comment.
@rustbot label -I-prioritize +P-critical i-slow
I've tried bisecting. The regression seems to be in 50166d5, which confirms #98748.
Bisected with cargo-bisect-rustc v0.6.0
Host triple: x86_64-unknown-linux-gnu
cargo bisect-rustc ./script.sh --start 2022-07-01 --preserve
😂 I was so sure this wasn't my fault. More fool me for not bisecting it myself.
This probably shows an opportunity to add a benchmark to the newly-established "runtime benchmarks" (rather than the compilation-time benchmarks that perf.rlo has classically focused on).
After sleeping on this, I realized that there's no real reason to expect the world's-worst-memset code which is the source of the slowdown to be totally optimized out with

    use std::cmp::min;
    use std::io;

    const TOTAL_BYTES: usize = 16 * 1024 * 1024;

    // runtime increases as the difference between these values increases
    const BYTES_PER_READ: usize = 8 * 1024;
    const BUFREADER_CAPACITY: usize = 256 * 1024;

    fn main() {
        for _ in 0..1_000 {
            let mut reader = io::BufReader::with_capacity(BUFREADER_CAPACITY, Reader(TOTAL_BYTES));
            let count = io::copy(&mut reader, &mut io::sink()).unwrap();
            assert_eq!(count, 16777216)
        }
    }

    // Yields `self.0` zero bytes in total, at most BYTES_PER_READ per call,
    // so it never fills the BufReader's buffer in one read.
    struct Reader(usize);

    impl io::Read for Reader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            let len = min(min(buf.len(), self.0), BYTES_PER_READ);
            if len > 0 {
                buf[..len].copy_from_slice(&vec![0; len]);
                self.0 -= len;
            }
            Ok(len)
        }
    }

On 1.63 I see runtime ~0.43 s, and on 1.64 ~4.8 s. So we have a slowdown of around 10x, though I suspect that number could be ratcheted up higher.
@rustbot claim
Awesome, thanks for the really quick fix!
Use memset to initialize readbuf

The write loop was found to be slow in rust-lang#102727. The proper fix is in rust-lang#102760, but this might still help debug builds and code running under miri by using the write_bytes intrinsic instead of writing one byte at a time.
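
To illustrate the difference that commit message describes, here is a minimal sketch of the two ways to zero a possibly-uninitialized byte buffer. It is not the standard library's code: `zero_bytewise`, `zero_bulk`, and `zeroed_with` are hypothetical names made up for this example, and only `MaybeUninit::write` and `std::ptr::write_bytes` are real std APIs.

```rust
use std::mem::MaybeUninit;

/// Zero the buffer one element at a time (hypothetical helper).
/// In an unoptimized build this runs the slice iterator for every byte.
fn zero_bytewise(buf: &mut [MaybeUninit<u8>]) {
    for byte in buf.iter_mut() {
        byte.write(0);
    }
}

/// Zero the buffer with one bulk write (hypothetical helper).
/// `write_bytes` typically lowers to a memset call.
fn zero_bulk(buf: &mut [MaybeUninit<u8>]) {
    // SAFETY: the pointer and length come from a valid mutable slice, and
    // any byte pattern may be written into `MaybeUninit<u8>` storage.
    unsafe { std::ptr::write_bytes(buf.as_mut_ptr(), 0, buf.len()) };
}

/// Run one of the initializers over fresh uninitialized storage and
/// return the resulting bytes (hypothetical helper for the example).
fn zeroed_with(init: fn(&mut [MaybeUninit<u8>]), len: usize) -> Vec<u8> {
    let mut storage = vec![MaybeUninit::<u8>::uninit(); len];
    init(&mut storage);
    // SAFETY: both initializers write every element of the slice.
    storage.iter().map(|b| unsafe { b.assume_init() }).collect()
}
```

Both helpers leave the buffer in the same state, so `zeroed_with(zero_bytewise, n)` and `zeroed_with(zero_bulk, n)` return identical vectors. The only difference is how much work a debug build or an interpreter such as miri does per byte.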
Avoid repeated re-initialization of the BufReader buffer

Fixes rust-lang/rust#102727

We accidentally removed this in rust-lang/rust#98748. It looks so redundant. But it isn't.

The default `Read::read_buf` will defensively initialize the whole buffer if any of it is indicated to be uninitialized. In uses where reads from the wrapped `Read` impl completely fill the `BufReader`, `initialized` and `filled` are the same, and this extra member isn't required. But in the reported issue, the `BufReader` wraps a `Read` impl which will _never_ fill the whole buffer. So the default `Read::read_buf` implementation repeatedly re-initializes the extra space in the buffer.

This adds back the extra `initialized` member, which ensures that the default `Read::read_buf` only zero-initializes the buffer once, and I've tried to add a comment which explains this whole situation.
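
The effect of remembering `initialized` can be estimated with a small model. This is a rough sketch, not the real `BufReader` internals: `bytes_zeroed` and its parameters are hypothetical, it assumes every refill zeroes all bytes not known to be initialized, and it ignores the final end-of-file read.

```rust
/// Estimate how many bytes get zeroed while reading `total` bytes through a
/// buffer of `capacity` bytes, when the inner reader returns at most
/// `per_read` bytes per call. Hypothetical model, not std's implementation.
///
/// `remember == false` models the regression: the buffer forgets between
/// refills that its storage was already initialized.
/// `remember == true` models the fix: the `initialized` count is kept.
fn bytes_zeroed(total: usize, per_read: usize, capacity: usize, remember: bool) -> usize {
    assert!(per_read > 0 && capacity > 0, "sizes must be non-zero");

    let mut initialized = 0; // bytes known to be initialized
    let mut zeroed = 0; // running total of bytes zeroed
    let mut remaining = total;

    while remaining > 0 {
        if !remember {
            // Without the extra member, every refill starts from scratch.
            initialized = 0;
        }
        // The defensive default zeroes everything not known to be initialized.
        zeroed += capacity - initialized;
        initialized = capacity;

        // The inner reader then supplies at most `per_read` bytes.
        remaining -= remaining.min(per_read).min(capacity);
    }
    zeroed
}
```

With the constants from the reproducer above (16 MiB total, 8 KiB per read, 256 KiB capacity), the model performs 2048 refills. Without the `initialized` member it zeroes 2048 × 256 KiB = 512 MiB per copy, and with it only the first 256 KiB. That fits the reproducer's comment that runtime grows as the gap between the two sizes grows.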
Code
I tried this code:
I expected to see this happen:
Instead, this happened:
The problem doesn't occur with optimization enabled.
Version it worked on
It most recently worked on:
Version with regression
Also fails on:
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged