-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bounds checked get_unchecked, use it everywhere. #231
Conversation
Love the idea! Brillant :) thank you! |
Hmm... So i fixed the main index out of bounds which brings the number of failed tests down to 27, and the remaining ones seem to be mostly/entirely Which seem to be doing index out of bounds that can pretty clearly fail, and do. They seem to have the ability to signal error conditions, but also is the input slice to them meant to be larger (to ensure that you can index a few bytes past the value)? I'm not too familiar with the design of this library, and I'm assuming this is just C-isms ported to rust where a slice does mean you can't index past the bounds of the slice. |
Heya, yes you've the right hunch there, the thought is that as long as we don't read beyond a page border we can still do a read of a full simd register even if it is beyond the slice we have. The original library has logic to relocate the input in cases where it's too close to a page border. For simplicities sake we just always relocate to a buffer at least two registers larger then the input string (there is probably an option for optimistation there to only relocate when needed). To ensure borrowing is correct, we still borrow strings out of the original buffer, however. But checks like the true/false atom should be handled on the resized buffer. One thing to check the sanity for this I could see is in the case of debug runs fill the rest of the input buffer with |
Yeah, that's not true, you can only read within the bounds of a slice, even if you use raw pointers derived from that slice. Reading uninit bytes is UB in rust, even if you don't "use" them (The more precise definition is a move of some bytes at a type for which they're invalid, and a I tried just doing a set_len to the capacity after zero filling the rest of the AlignedBuf, but that seemed to cause a lot more tests to fail. Will keep looking. |
Okay, I think I found the issue, |
And also fixed some misaligned reads. I think this PR is good to be reviewed, it's not all the changes I think this crate should get, but it's a good point to stop for now. |
let to_fill = input_buffer.capacity() - len; | ||
std::ptr::write_bytes(input_buffer.as_mut_ptr().add(len), 0, to_fill); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to guard this behind the same if cfg!(debug_assertions)
that is in get_kinda_unchecked
? It doesn't really seem like it's needed during runtime, just during tests to validate the get
s. OTOH, it's probably worth benchmarking if it makes a measurable difference and is worth guarding or fine to just leave in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because if you just write one byte, you get a use-of-uninitialized-value from memory sanitizer, and I'm sure we're reading uninit bytes into a u64
(say, read_true_atom run on [t r u e \x00 <uninit> <uninit> <uninit>]
Simply reading uninit bytes into a u64
is UB (see: rust-lang/unsafe-code-guidelines#71), and memory sanitizer complaining means there's definitely UB here (memory sanitizer is a very conservative check, only checking for branching on uninit bytes).
Upstream seems to... just ignore the problem (they run memory sanitizer, but ignore this).
We presumably only need to ensure that input.len() + SIMDJSON_PADDING
bytes are zero filled, so the user could re-use buffers manually (which would remove both the allocation and the zeroing). As in, the write_bytes could only be done if we're using a buffer that isn't long enough (in len
, not capacity
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They ignore it because it's not a real problem, it's a "UB by decree" not because any behavior is actually undefined, the discussion you linked says this quite clearly.
There are no possible invalid combinations of bits in a u64. Rust has the tendency to be overly careful with safety, which is great, but sometimes it's worth understanding what happens under the hood - that's why unsafe exists, after all, the compiler can't always make the right call as it misses context, like that' we're not reading a u64 but a SIMD register.
But it's not worth arguing over this, criterion says the change has no significant impact on performance, some tests are marginally faster others marginally slower - it's with run on run varriance. So there is no reason to keep potentially UB behaviour just for the sake of it, that'd be silly, if we can be compiler-agreed-safe and fast that's definitely the route to take :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not worth arguing over this
I'm not commenting here to argue with you, but to note that you should be significantly more careful with uninitialized integers since rust-lang/rust#106294
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya,
this looks really great! Thank you :)! I'll try to find some time to run a benchmark with / without the fill and see if it makes sense to even worry about guarding it.
This is HEAVY wip because these bounds check panic in
every single test95 tests. Indexing out of bounds is UB, which means this crate has no test that isn't currently UB :)There's standard library checks for this too (if you run with
-Zbuild-std
), but those will just SIGILL your program, and you have to use gdb to get a backtrace.So I copied this from where I added this in https://github.com/iximeow/yaxpeax-x86/blob/no-gods-no-/src/safer_unchecked.rs.
I'll leave this as draft while I investigate what's up and then mark it as ready once I get stuck on some of these.