Add OsStr methods for testing, stripping, and splitting Unicode prefixes. #111059

Closed
jmillikin wants to merge 2 commits

jmillikin
Contributor

@jmillikin jmillikin commented May 1, 2023

ACP: rust-lang/libs-team#114

Discussion on that ACP seems to have quieted down without a clear consensus, so I'm going to put this PR up for consideration. The general idea is that it's difficult to write prefix/suffix manipulations for OsStr that accept non-Unicode patterns[0], but the most pressing needs (such as inspecting the contents of args_os) are all based around Unicode patterns, and a big chunk of those operate exclusively on prefixes.

The first commit in this PR adds the OsStr::to_str_split() and OsString::into_string_split() methods, which extract the longest prefix from an OsStr / OsString that is valid Unicode. This prefix can then be parsed like normal, in platform-independent logic.
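
For illustration, the "longest valid Unicode prefix" behavior described for `OsStr::to_str_split()` can be approximated on Unix with `OsStrExt::as_bytes()` and `Utf8Error::valid_up_to()`. This is a minimal sketch, not the PR's implementation: the free function `to_str_split` and its `(&str, &OsStr)` return type are assumptions made for the example, and the code only compiles on Unix targets.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Hypothetical stand-in for the proposed `OsStr::to_str_split()` (Unix only).
/// Returns the longest valid UTF-8 prefix and the remaining bytes.
fn to_str_split(s: &OsStr) -> (&str, &OsStr) {
    let bytes = s.as_bytes();
    match std::str::from_utf8(bytes) {
        // The whole string is valid Unicode, so the remainder is empty.
        Ok(all) => (all, OsStr::new("")),
        Err(err) => {
            // `valid_up_to()` is the length of the longest valid UTF-8 prefix.
            let (valid, rest) = bytes.split_at(err.valid_up_to());
            let prefix = std::str::from_utf8(valid)
                .expect("prefix was already validated by from_utf8");
            (prefix, OsStr::from_bytes(rest))
        }
    }
}
```

Given an argument such as `--flag=` followed by a non-UTF-8 byte, the sketch returns `"--flag="` as the `&str` prefix and the remaining bytes as the `&OsStr` remainder, so the prefix can be parsed with ordinary `str` methods.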

The second commit is based on discussion in the above ACP. I had considered this to be separate functionality, but there was some concern that extracting a Unicode prefix wasn't useful on its own, so I added some helper functions:

  • OsStr::starts_with() tests whether an OsStr has a prefix matching the given Pattern.
    • It can be used as: if arg.starts_with("--") {
  • OsStr::strip_prefix() returns the OsStr after removing a prefix matching the given Pattern.
    • It can be used as: if let Some(arg_value) = arg.strip_prefix("--some-flag=") {
  • OsStr::split_once() splits an OsStr into a (&str, &OsStr) pair, where the delimiter matches a given Pattern.
    • It can be used as: if let Some((flag_name, flag_value)) = arg.split_once("=") {
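
As a rough sketch of the `split_once()` behavior listed above, the following Unix-only function searches for a `&str` delimiter inside the valid UTF-8 prefix and returns a `(&str, &OsStr)` pair. The name `split_once_str`, the restriction to `&str` delimiters (rather than any `Pattern`), and the choice to search only the valid prefix are assumptions for the example; the PR's actual semantics may differ.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Hypothetical approximation of the proposed `OsStr::split_once()` (Unix only),
/// limited to `&str` delimiters.
fn split_once_str<'a>(s: &'a OsStr, delim: &str) -> Option<(&'a str, &'a OsStr)> {
    let bytes = s.as_bytes();
    // Only the longest valid UTF-8 prefix is searched for the delimiter.
    let valid_len = match std::str::from_utf8(bytes) {
        Ok(all) => all.len(),
        Err(err) => err.valid_up_to(),
    };
    let prefix = std::str::from_utf8(&bytes[..valid_len]).ok()?;
    let (head, _) = prefix.split_once(delim)?;
    // Everything after the delimiter, valid Unicode or not, is the remainder.
    let rest_start = head.len() + delim.len();
    Some((head, OsStr::from_bytes(&bytes[rest_start..])))
}
```

This keeps the flag name as a `&str` for platform-independent matching while leaving the flag value as an `&OsStr`, so a non-Unicode path after `=` is preserved byte for byte.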

OsStr::starts_with_str() and OsStr::strip_prefix_str() are specialized variants that are implemented as &[u8] comparisons, which I expect to be more efficient than the Pattern versions since they don't need to validate UTF-8.
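
To illustrate the `&[u8]` comparison idea, here is a minimal Unix-only sketch built on `OsStrExt::as_bytes()` and the slice methods `starts_with` and `strip_prefix`. The free functions below are hypothetical stand-ins named after the proposed methods, not the code from this PR.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Hypothetical stand-in for `OsStr::starts_with_str()` (Unix only):
/// a plain byte comparison, with no UTF-8 validation of `s`.
fn starts_with_str(s: &OsStr, prefix: &str) -> bool {
    s.as_bytes().starts_with(prefix.as_bytes())
}

/// Hypothetical stand-in for `OsStr::strip_prefix_str()` (Unix only).
fn strip_prefix_str<'a>(s: &'a OsStr, prefix: &str) -> Option<&'a OsStr> {
    s.as_bytes()
        .strip_prefix(prefix.as_bytes())
        .map(|rest| OsStr::from_bytes(rest))
}
```

Because the prefix is a `&str` and therefore valid UTF-8, comparing raw bytes gives the same answer as a Unicode-aware match on the prefix, without inspecting the rest of the string.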

[0] See extensive prior discussions, for example rfcs/2295.

@rustbot
Collaborator

rustbot commented May 1, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 1, 2023
@rustbot
Collaborator

rustbot commented May 1, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@jmillikin
Contributor Author

@rustbot label +T-libs-api -T-libs

@rustbot rustbot added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 1, 2023
@jmillikin jmillikin force-pushed the osstr-str-split branch 2 times, most recently from c581eec to 432f15d Compare May 1, 2023 12:55

* `OsStr::starts_with()` tests whether an `OsStr` has a prefix matching
  the given `Pattern`.

* `OsStr::strip_prefix()` returns the `OsStr` after removing a prefix
  matching the given `Pattern`.

* `OsStr::split_once()` splits an `OsStr` into a `(&str, &OsStr)` pair,
  where the delimiter matches a given `Pattern`.

* `OsStr::starts_with_str()` and `OsStr::strip_prefix_str()` are
  specialized variants that are implemented more efficiently than the
  `Pattern` cases.

In all cases, the prefix must be Unicode because the current `Pattern`
trait is built around the `&str` type.
@Mark-Simulacrum
Member

r? @joshtriplett

@bors
Contributor

bors commented Jun 14, 2023

☔ The latest upstream changes (presumably #112624) made this pull request unmergeable. Please resolve the merge conflicts.

@jmillikin jmillikin closed this Jul 4, 2023
@jmillikin jmillikin deleted the osstr-str-split branch July 4, 2023 20:36
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
6 participants