-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Display
for invalid UTF-8 in OsStr
/Path
#136677
base: master
Are you sure you want to change the base?
Conversation
Display
for OsStr
/Path
Display
for invalid UTF-8 in OsStr
/Path
beff1c1
to
4471500
Compare
cuviper will be busy for a bit longer probably. |
also cc @ChrisDenton? |
I'll review this properly when I have a moment (probably tomorrow) but I would note there are a number of incidental changes, like |
Those changes came out of wanting to synchronize the I added If it would be better, I could split it into two PRs, one for the |
That seems like a fair motivation so I'll say it's fine as-is.
I feel |
4471500
to
d52e2b2
Compare
This comment has been minimized.
This comment has been minimized.
Thanks everyone for bearing with me, since I'm new contributing to Rust. Since this PR tries to do too much at once, I'm splitting it into three PRs for more focused review. They do not depend on each other and can be reviewed in any order. The feedback so far has only been on the refactors, which are now in #137154.
|
84af8a4
to
6f01d7d
Compare
☔ The latest upstream changes (presumably #137036) made this pull request unmergeable. Please resolve the merge conflicts. |
The Display implementation for `OsStr` and `Path` on Windows (the WTF-8 version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an unpaired surrogate. Implement its Display with the same logic as that of `str`. Fixes rust-lang#136617 for Windows.
Benchmark shows that performance is unaffected (`x bench library/alloc --stage 0 --test-args from_utf8_lossy`).
The Display implementation for `OsStr` and `Path` on non-Windows systems (the byte-oriented version) only handles formatter flags when the entire string is valid UTF-8. As most paths are valid UTF-8, the common case is formatted like `str`; however, flags are ignored when they contain an invalid UTF-8 sequence. Implement its Display with the same logic as that of `str`. Additionally, it forwards flags when formatting the last chunk of valid UTF-8, so it erroneously inserts padding within the string when it has invalid UTF-8. Fix this. Fixes rust-lang#136617 for non-Windows systems.
6f01d7d
to
bf5f172
Compare
@joshtriplett I just noticed the new bstr feature (🎉!). Its |
Here's an update on where this PR is at. Its scope has the potential to increase a lot. First of all, my goal is to make the I submitted #136662 (Count char width at most once in This motivates similar types for The addition of If we're going to move the bytes What I propose is necessary for this PR:
Nice to have:
I'll be opening separate PRs for each component, so we can discuss them more focused. However, this PR will cherry pick those commits until they merge. How does this strategy sound? |
Since this will be blocked for a while, I intend to open a tiny PR which only fixes the correctness bug for bytes #![cfg(unix)]
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
let s = OsStr::from_bytes(b"b\xFFd");
assert_eq!(format!("a{:^10}e", s.display()), "ab� d e"); // BUG |
@rustbot author |
That sounds good to me. Maybe you could open a tracking issue for this? It seems like it might be good to nominate it for t-libs discussion if only to raise awareness. |
Fix the
Display
implementations forOsStr
,OsString
,Path
, andPathBuf
, which incorrectly handle formatter flags. This is visible on stable (for paths) or with#![feature(os_str_display)]
(for OS strings).I do the formatting by scanning in two passes, similarly to the the
Display
impl forstr
. The first pass counts the number of Unicode scalars to render. Since there's no escaping, this is relatively inexpensive. Then the left padding is written. Then the second pass writes the string with lossy substitutions. Finally, the right padding is written.I have based it on the logic of
Formatter::pad
, which I optimized and refactored in #136662. That PR should be reviewed first.Fixes #136617. See there for more details.