-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate 280-byte memset from ReadDir iterator #103137
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
(rust-highfive has picked a reviewer for you, use r? to override) |
} else { | ||
#[allow(deref_nullptr)] | ||
{ | ||
ptr::addr_of!((*ptr::null::<dirent64>()).$field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr::addr_of!((*ptr::null::<dirent64>()).$field) | |
None::<* const dirent64>.map(|p| ptr::addr_of!((*p).$field)).unwrap() |
This is less scary-looking than the ub-in-dead-code version IMO. Edit: actually nevermind, the map's closure is also UB due to the special situation we are in. When given the choice between obvious on first sight UB-in-dead-code, and UB-in-dead-code where you don't see the UB-iness immediately, I think it's better to prefer former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I also considered:
match $entry_ptr {
entry_ptr => {
if true {
entry_ptr.byte_offset(OFFSET).cast::<_>()
} else {
ptr::addr_of!((*entry_ptr).$field)
}
}
}
Not sure if that seems better to you. It avoids the allow(deref_nullptr)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the UB causing version that used to be there earlier in the history, right? Yeah it might actually be good to have that as it makes it clear that this version was specifically not chosen, and gives the opportunity to put a 1-2 line comment about the UB-iness of the code. Even if it repeats the comment above, it helps bring the point across.
Also FTR, I reconsidered after my edit. None::<* const dirent64>.map(|p| ptr::addr_of!((*p).$field)).unwrap()
isn't actually UB, unless my understanding of UB is wrong. All that you need is the type anyways.
|
||
let entry = dirent64_min { | ||
d_ino: copy.d_ino as u64, | ||
d_ino: *offset_ptr!(entry_ptr, d_ino) as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we could avoid the awkward else branch in offset_ptr! if this used ptr::addr_of!, which I think is fine for these fields that are properly typed (and in-bounds)? Or am I missing something?
(Then offset_ptr can be char_offset_ptr and assume that we're casting to *const c_char
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote it this way because the way you suggest is (shockingly) not allowed.
use std::ptr;
#[repr(C)]
struct Small { x: u16, y: u16 }
#[repr(C)]
struct Big { x: u16, y: u16, z: u16 }
fn main() {
unsafe {
let small = Box::into_raw(Box::new(Small { x: 0, y: 0 }));
let big = small as *const Big;
let y = *ptr::addr_of!((*big).y);
let _ = Box::from_raw(small);
println!("{}", y);
}
}
error: Undefined Behavior: dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
--> src/main.rs:13:18
|
13 | let y = *ptr::addr_of!((*big).y);
| ^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that Small
/Big
code, all of the following possible behaviors for ptr::addr_of!((*big).y)
are sensible and justifiable, with different tradeoffs:
(a) DGAF, it just adds 2 bytes to the value of big
(b) Requires 2 bytes starting at big
to be in bounds of the same allocation
(c) Requires 4 bytes starting at big
to be in bounds of the same allocation
(d) Requires 6 bytes starting at big
to be in bounds of the same allocation
Currently in Rust, (d) is the only one that is definitively not UB. Personally I think (b) should be the requirement (that's the case in a C &big->y
, for example) but we need to follow (d) for now until whoever makes these decisions relaxes that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Yeah, that makes sense - I believe this is an area where the exact rules aren't yet nailed down and Miri is just being conservative here.
I'll go ahead and r+ this implementation, but I agree that we need a better solution here, since in practice the current requirements are overly strict. (It may also be worth having a macro or function accessors like this in libc since presumably any user of readdir ends up needing this macro).
@bors r+ rollup=never |
☀️ Test successful - checks-actions |
Finished benchmarking commit (7fcf850): comparison URL. Overall result: ❌ regressions - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Footnotes |
Eliminate 280-byte memset from ReadDir iterator This guy: https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589 It turns out `libc::dirent64` is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory. Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead. ## History This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750. Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr. <table><tr><td>copy 280 bytes</td></tr></table> This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.) The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table> After my PR we just grab the 9 needed bytes directly from entry_ptr. <table><tr><td>copy 9 bytes</td></tr></table> The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better. ## References - https://man7.org/linux/man-pages/man3/readdir.3.html
This guy:
rust/library/std/src/sys/unix/fs.rs
Line 589 in 1536ab1
It turns out
libc::dirent64
is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In #103135 this memset accounted for 0.9% of the runtime of iterating a big directory.Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (
d_ino
andd_type
) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.History
This code got added in #93459 and tweaked in #94272 and #94750.
Prior to #93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.
This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.
However this was still buggy because it used
addr_of!((*entry_ptr).d_name)
, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)The UB got fixed by #94272 by replacing
addr_of
with some pointer manipulation based onoffset_from
, but still fundamentally the same operation.Then #94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.
After my PR we just grab the 9 needed bytes directly from entry_ptr.
The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just
entry_ptr->d_name
. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.References