-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add writeable::write_or_ref and use it in icu_locale_core #5738
Conversation
Our utility functions are nicely organized in the docs: |
utils/writeable/src/write_or_ref.rs
Outdated
/// Cow::Borrowed("Hello, Alice!") | ||
/// )); | ||
/// | ||
/// // Junk at the end is ignored: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// // Junk at the end is ignored: | |
/// // Borrowing can use a prefix: |
utils/writeable/src/write_or_ref.rs
Outdated
/// Cow::Owned(s) if s == "this is uppercase" | ||
/// )); | ||
/// ``` | ||
pub fn write_or_ref<'a>(writeable: &impl Writeable, reference_bytes: &'a [u8]) -> Cow<'a, str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not happy with this name. Usually methods called write
write to one of the arguments, and don't do any allocation themselves, like this does. I think write_to_string_or_borrow
would be a more accurate name, and would align more with write_to_string
, which is meant to use it.
utils/writeable/src/write_or_ref.rs
Outdated
let owned: &mut String = match self { | ||
SliceOrString::Slice(slice) => { | ||
if slice.try_push(other) { | ||
return Ok(()); | ||
} | ||
// We failed to match. Convert to owned, put it in the field, | ||
// and get it out again. | ||
let valid_str = slice.validated_as_str(); | ||
let owned = String::from(valid_str); | ||
*self = SliceOrString::String(owned); | ||
let SliceOrString::String(owned) = self else { | ||
unreachable!() | ||
}; | ||
owned | ||
} | ||
SliceOrString::String(owned) => owned, | ||
}; | ||
owned.write_str(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is jumping through a lot of hoops to not duplicate line 75. I think the code would be a lot easier if you did that:
let owned: &mut String = match self { | |
SliceOrString::Slice(slice) => { | |
if slice.try_push(other) { | |
return Ok(()); | |
} | |
// We failed to match. Convert to owned, put it in the field, | |
// and get it out again. | |
let valid_str = slice.validated_as_str(); | |
let owned = String::from(valid_str); | |
*self = SliceOrString::String(owned); | |
let SliceOrString::String(owned) = self else { | |
unreachable!() | |
}; | |
owned | |
} | |
SliceOrString::String(owned) => owned, | |
}; | |
owned.write_str(other) | |
match self { | |
SliceOrString::Slice(slice) => { | |
if !slice.try_push(other) { | |
// We failed to match. Convert to owned. | |
let valid_str = slice.validated_as_str(); | |
let owned = format!("{valid_str}{other}); | |
*self = SliceOrString::String(owned); | |
Ok(()) | |
} | |
SliceOrString::String(owned) => owned.write_str(other), | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a lot of hoops: it is a standard "base case" coding style. That said, I think your code is more readable, except I don't like using format!
. I'll push up something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this Rust program:
#[inline(never)]
fn string_from_pieces_format(a: &str, b: &str) -> String {
std::format!("{a}{b}")
}
#[inline(never)]
fn string_from_pieces_manual(a: &str, b: &str) -> String {
let mut s = String::with_capacity(a.len() + b.len());
s.push_str(a);
s.push_str(b);
s
}
fn main() {
let s1 = string_from_pieces_format("hello", "world");
let s2 = string_from_pieces_manual("hello", "world");
println!("{s1} {s2}");
}
Here is the ASM (https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=dff4d0ae823abc03d88fad4b84b56a7d):
std::rt::lang_start:
pushq %rax
movl %ecx, %r8d
movq %rdx, %rcx
movq %rsi, %rdx
movq %rdi, (%rsp)
leaq .L__unnamed_1(%rip), %rsi
movq %rsp, %rdi
callq *std::rt::lang_start_internal@GOTPCREL(%rip)
popq %rcx
retq
std::rt::lang_start::{{closure}}:
pushq %rax
movq (%rdi), %rdi
callq std::sys::backtrace::__rust_begin_short_backtrace
xorl %eax, %eax
popq %rcx
retq
std::sys::backtrace::__rust_begin_short_backtrace:
pushq %rax
callq *%rdi
#APP
#NO_APP
popq %rax
retq
<&T as core::fmt::Display>::fmt:
movq %rsi, %rdx
movq (%rdi), %rax
movq 8(%rdi), %rsi
movq %rax, %rdi
jmpq *<str as core::fmt::Display>::fmt@GOTPCREL(%rip)
core::ops::function::FnOnce::call_once{{vtable.shim}}:
pushq %rax
movq (%rdi), %rdi
callq std::sys::backtrace::__rust_begin_short_backtrace
xorl %eax, %eax
popq %rcx
retq
<alloc::string::String as core::fmt::Display>::fmt:
movq %rsi, %rdx
movq 8(%rdi), %rax
movq 16(%rdi), %rsi
movq %rax, %rdi
jmpq *<str as core::fmt::Display>::fmt@GOTPCREL(%rip)
playground::string_from_pieces_format:
subq $120, %rsp
leaq .L__unnamed_2(%rip), %rax
movq %rax, 8(%rsp)
movq $5, 16(%rsp)
leaq .L__unnamed_3(%rip), %rax
movq %rax, 24(%rsp)
movq $5, 32(%rsp)
leaq 8(%rsp), %rax
movq %rax, 40(%rsp)
leaq <&T as core::fmt::Display>::fmt(%rip), %rax
movq %rax, 48(%rsp)
leaq 24(%rsp), %rcx
movq %rcx, 56(%rsp)
movq %rax, 64(%rsp)
leaq .L__unnamed_4(%rip), %rax
movq %rax, 72(%rsp)
movq $2, 80(%rsp)
movq $0, 104(%rsp)
leaq 40(%rsp), %rax
movq %rax, 88(%rsp)
movq $2, 96(%rsp)
leaq 72(%rsp), %rsi
callq *alloc::fmt::format::format_inner@GOTPCREL(%rip)
addq $120, %rsp
retq
playground::string_from_pieces_manual:
pushq %rbx
movq %rdi, %rbx
movq __rust_no_alloc_shim_is_unstable@GOTPCREL(%rip), %rax
movzbl (%rax), %eax
movl $10, %edi
movl $1, %esi
callq *__rust_alloc@GOTPCREL(%rip)
testq %rax, %rax
je .LBB7_2
movb $111, 4(%rax)
movl $1819043176, (%rax)
movl $1819438967, 5(%rax)
movb $100, 9(%rax)
movq $10, (%rbx)
movq %rax, 8(%rbx)
movq $10, 16(%rbx)
popq %rbx
retq
.LBB7_2:
movl $1, %edi
movl $10, %esi
callq *alloc::raw_vec::handle_error@GOTPCREL(%rip)
playground::main:
pushq %r14
pushq %rbx
subq $136, %rsp
leaq 8(%rsp), %rbx
movq %rbx, %rdi
callq playground::string_from_pieces_format
leaq 32(%rsp), %r14
movq %r14, %rdi
callq playground::string_from_pieces_manual
movq %rbx, 56(%rsp)
leaq <alloc::string::String as core::fmt::Display>::fmt(%rip), %rax
movq %rax, 64(%rsp)
movq %r14, 72(%rsp)
movq %rax, 80(%rsp)
leaq .L__unnamed_5(%rip), %rax
movq %rax, 88(%rsp)
movq $3, 96(%rsp)
movq $0, 120(%rsp)
leaq 56(%rsp), %rax
movq %rax, 104(%rsp)
movq $2, 112(%rsp)
leaq 88(%rsp), %rdi
callq *std::io::stdio::_print@GOTPCREL(%rip)
movq 32(%rsp), %rsi
testq %rsi, %rsi
je .LBB8_4
movq 40(%rsp), %rdi
movl $1, %edx
callq *__rust_dealloc@GOTPCREL(%rip)
.LBB8_4:
movq 8(%rsp), %rsi
testq %rsi, %rsi
je .LBB8_6
movq 16(%rsp), %rdi
movl $1, %edx
callq *__rust_dealloc@GOTPCREL(%rip)
.LBB8_6:
addq $136, %rsp
popq %rbx
popq %r14
retq
movq %rax, %rbx
movq 32(%rsp), %rsi
testq %rsi, %rsi
je .LBB8_8
movq 40(%rsp), %rdi
movl $1, %edx
callq *__rust_dealloc@GOTPCREL(%rip)
jmp .LBB8_8
movq %rax, %rbx
.LBB8_8:
movq 8(%rsp), %rsi
testq %rsi, %rsi
je .LBB8_10
movq 16(%rsp), %rdi
movl $1, %edx
callq *__rust_dealloc@GOTPCREL(%rip)
.LBB8_10:
movq %rbx, %rdi
callq _Unwind_Resume@PLT
main:
pushq %rax
movq %rsi, %rcx
movslq %edi, %rdx
leaq playground::main(%rip), %rax
movq %rax, (%rsp)
leaq .L__unnamed_1(%rip), %rsi
movq %rsp, %rdi
xorl %r8d, %r8d
callq *std::rt::lang_start_internal@GOTPCREL(%rip)
popq %rcx
retq
.L__unnamed_1:
.asciz "\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000\b\000\000\000\000\000\000"
.quad core::ops::function::FnOnce::call_once{{vtable.shim}}
.quad std::rt::lang_start::{{closure}}
.quad std::rt::lang_start::{{closure}}
.L__unnamed_4:
.quad 1
.zero 8
.quad 1
.zero 8
.L__unnamed_2:
.ascii "hello"
.L__unnamed_3:
.ascii "world"
.L__unnamed_6:
.byte 32
.L__unnamed_7:
.byte 10
.L__unnamed_5:
.quad 1
.zero 8
.quad .L__unnamed_6
.asciz "\001\000\000\000\000\000\000"
.quad .L__unnamed_7
.asciz "\001\000\000\000\000\000\000"
So it appears avoiding format!
is worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot parse that atm. Just make sure to reserve capacity once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
No strong opinion on naming, weakly agree that write_or_ref isn't ideal, but don't have antyhing better.
Needs more structured safety comments but I tried to help with that.
It's unfortunate this needs unsafe at all, the str
version of this would be purely safe. But we do want Potentially versions.
utils/writeable/src/write_or_ref.rs
Outdated
|
||
impl<'a> PartiallyValidatedUtf8<'a> { | ||
fn new(slice: &'a [u8]) -> Self { | ||
Self { slice, offset: 0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: safety comment:
/// Safety: Field invariants maintained here trivially:
/// 1. The offset 0 is ≤ all possible lengths of slice
/// 2. The slice contains nothing up to the offset zero
We typically want similar safety comments for whenever one of the fields is mutated.
utils/writeable/src/write_or_ref.rs
Outdated
// Safety Invariant: `valid_str` is valid UTF-8, and we are | ||
// incrementing the offset to cover bytes equal to `valid_str` | ||
self.offset += valid_str.len(); | ||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say "Safety: Field invariants maintained here by..." and use a list like above
utils/writeable/src/write_or_ref.rs
Outdated
/// Return the validated portion as `&str`. | ||
fn validated_as_str(&self) -> &'a str { | ||
debug_assert!(self.offset <= self.slice.len()); | ||
// Safety: self.offset is a valid end index in a range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd suffix both of these comments with (from field invariant)
, or somethign similar, basically indicating where the invariant comes from
utils/writeable/src/write_or_ref.rs
Outdated
} | ||
|
||
/// Writes the contents of a `Writeable` to a string, returning a reference | ||
/// to a slice if it matches, and allocating a string otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// to a slice if it matches, and allocating a string otherwise. | |
/// to a slice if it matches the provided reference bytes, and allocating a string otherwise. |
I think I addressed all comments other than the function name. Happy to entertain suggestions. Bikeshed:
A problem with |
I would argue that the |
Yeah, |
🎉 All dependencies have been resolved ! |
I'll wait to merge this until we discuss the name. |
Naming choice: LGTM: @Manishearth @robertbastian (@sffc) |
Follow-up to #5727
Depends on #5737
Replaces #5733
See #2748
I am adding an additional utility function to the
writeable
crate with a similar architecture towriteable::cmp_bytes
. This one,writeable::write_or_ref
, makes certain normalization operations more efficient.