-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define non-panicking UTF encoding methods on char
#52580
Conversation
r? @sfackler (rust_highfive has picked a reviewer for you, use r? to override) |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Since one needs to check the return value for You can write if buf.len() >= c.utf8_len() {
let s = c.encode_utf8(buf);
// work with str
} else {
// alternate action
} instead of if let Some(s) = c.try_encode_utf8(buf) {
// work with str
} else {
// alternate action, but cannot access `buf` here until nll are a thing
} Do you have a real world usage example that can be used as motivation for this change? Your PR message should also reference the corresponding issue that this PR is closing |
I'm also interested in the motivation for these methods. |
In general, i think the My use case is passing an encoding function to another function which fills a buffer and writes it out when full: extern crate io;
use core::mem;
pub use self::io::*;
fn fst2<S, T>((x, _): (S, T)) -> S { x }
pub fn writeCode<Codon: Copy, F: FnMut(T, &mut [Codon]) -> Option<usize>, I: Iterator<Item = T>, T, W: Write<Codon>>
(mut encode: F, w: &mut W, xs : I) -> Result<usize, W::Err> where W::Err: From<EndOfFile> {
let mut buf: [Codon; 4096] = unsafe { mem::uninitialized() };
let mut pos = 0;
let mut nBytesWritten = 0;
for x in xs {
match encode(x, &mut buf[pos..]) {
Some(n) => { pos += n; },
None => {
try!(w.write_all(&buf[0..pos]).map_err(fst2));
nBytesWritten += pos;
pos = 0;
}
}
}
try!(w.write_all(&buf[0..pos]).map_err(fst2));
nBytesWritten += pos;
Ok(nBytesWritten)
} It is much more convenient to take a function (It actually wants |
src/libcore/char/methods.rs
Outdated
#[unstable(feature = "try_unicode_encode_char", issue = "52579")] | ||
#[inline] | ||
pub fn try_encode_utf16(self, dst: &mut [u16]) -> Option<&mut [u16]> { | ||
if dst.len() < self.len_utf16() { None } else { Some(self.encode_utf16(dst)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is essentially doing the checks twice in the happy path, maybe move the unsafe code into this function and have the panicky function match on the result and panic on None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/libcore/char/methods.rs
Outdated
#[unstable(feature = "try_unicode_encode_char", issue = "52579")] | ||
#[inline] | ||
pub fn try_encode_utf8(self, dst: &mut [u8]) -> Option<&mut str> { | ||
if dst.len() < self.len_utf8() { None } else { Some(self.encode_utf8(dst)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for this function
Bikeshed: ought this to return |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I agree with @oli-obk that the motivation doesn’t feel enough to add new methods. It seems easy enough to check the buffer length against You can even omit the length check entirely and use a 4-bytes buffer: something.write_str(c.encode_utf8(&mut [0_u8; 4])) So unless new arguments come up I’m inclined to not accept this: @rfcbot fcp close |
Team member @SimonSapin has proposed to close this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
@SimonSapin Unfortunately merely checking the length emits all the panicking code, which i'd prefer to avoid. Edit: but maybe this is an optimizer issue? is it reasonable to expect the optimizer to recognize this and elide the panicking case? |
I’d certainly be open to tweaking the internals of |
It's not a huge deal — i can rather define an extension trait (for example) — but extension traits are cumbersome, and it seems unfortunate to reinvent this functionality. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to close, as per the review above, is now complete. |
…oli-obk Clean up unsafety in char::encode_utf8 This originally started as an attempt to allow LLVM to optimize through encode_utf8 to detect the try_encode_utf8 case (rust-lang#52579, rust-lang#52580), but due to a typo my conclusion that my optimizations were successful was incorrect. Furthermore, as far as I can tell, this optimization is probably just not possible with LLVM today. This [code](https://rust.godbolt.org/z/JggRj4) compiles down to a long series of compares, notably, two identical series of compares. That essentially means that LLVM is today unable to see that these two ifs are identical and as such can be merged and then realize that no value of the if condition can result in a call to `please_delete`. As such, for now, we do not attempt to specifically optimize for that case.
…oli-obk Clean up unsafety in char::encode_utf8 This originally started as an attempt to allow LLVM to optimize through encode_utf8 to detect the try_encode_utf8 case (rust-lang#52579, rust-lang#52580), but due to a typo my conclusion that my optimizations were successful was incorrect. Furthermore, as far as I can tell, this optimization is probably just not possible with LLVM today. This [code](https://rust.godbolt.org/z/JggRj4) compiles down to a long series of compares, notably, two identical series of compares. That essentially means that LLVM is today unable to see that these two ifs are identical and as such can be merged and then realize that no value of the if condition can result in a call to `please_delete`. As such, for now, we do not attempt to specifically optimize for that case.
Resolves #52579.