Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Windows Argument Parsing #87580

Merged
merged 2 commits into from
Sep 2, 2021
Merged

Conversation

ChrisDenton
Copy link
Member

@ChrisDenton ChrisDenton commented Jul 29, 2021

Fixes #44650

The Windows command line is passed to applications as a single string which the application then parses to get a list of arguments. The standard rules (as used by C/C++) for parsing the command line have slightly changed over the years, most recently in 2008 which added new escaping rules.

This PR implements the new rules as described on MSDN and further detailed here. It has been tested against the behaviour of C++ by calling a C++ program that outputs its raw command line and the contents of argv. See my repo if anyone wants to reproduce my work.

For an overview of how this PR changes argument parsing behavior and why we feel it is warranted see #87580 (comment).

For some examples see: #87580 (comment)

@rust-highfive
Copy link
Collaborator

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 29, 2021
@dtolnay dtolnay added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 29, 2021
@dtolnay
Copy link
Member

dtolnay commented Jul 29, 2021

r? @rust-lang/libs

@rust-highfive rust-highfive assigned kennytm and unassigned dtolnay Jul 29, 2021
library/std/src/sys/windows/args.rs Outdated Show resolved Hide resolved
///
/// This function was tested for equivalence to the C/C++ parsing rules using an
/// extensive test suite available at
/// <https://github.com/ChrisDenton/winarg/tree/std>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to integrate some or all of this into our testsuite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, more extensive testing is slow because of the overhead of starting a process but perhaps some more limited examples can be tested. However, I wasn't sure about compiling and running a C++ within the testsuite. Is that ok?

library/std/src/sys/windows/args.rs Outdated Show resolved Hide resolved
@yaahc
Copy link
Member

yaahc commented Aug 5, 2021

PR looks great @ChrisDenton. I caught a couple of typos you might want to fix but other than that I'm happy to merge this when you're ready.

@bors delegate+

@bors
Copy link
Contributor

bors commented Aug 5, 2021

✌️ @ChrisDenton can now approve this pull request

@m-ou-se m-ou-se added relnotes Marks issues that should be documented in the release notes of the next release. O-windows Operating system: Windows labels Aug 5, 2021
@yaahc
Copy link
Member

yaahc commented Aug 5, 2021

One follow up, since this changes runtime behavior it would help to have a short overview we can add to the release notes written up on how this changes the runtime behavior and why we feel this breakage is justified.

@ChrisDenton
Copy link
Member Author

ChrisDenton commented Aug 5, 2021

Good point. I'm running local tests now and maybe I can figure out a summary in the meantime. I'm not very good a being concise but this is the gist of it:

Most of the parsing differences are minor and only really affect weird edge-cases. The big change is escaping " when already within a quote. This can be done by using two consecutive double quotes (i.e. "").

The main reason for updating the parsing rules is that it provides more consistent behaviour across modern (since 2008) utilities. This matters both for the user and for shells. E.g. powershell takes a string provided by the user, parses it according to powershell rules (expanding variables, etc), then produces a command line which it passes to the program. It expects the application to parse this command line in a standard way. So both powershell and the application should fully agree on how many arguments there are and what those arguments are.

However, the new double quote escaping rules are mostly for users. This is of particular concern when using the command prompt (aka cmd.exe) which passes the command line directly to the program. Or even when creating a shortcut via the gui which allows directly entering arguments. Microsoft added the new "" escaping rule to make this easier to follow, especially when nesting quotes and what with the fact that the other escape character (\) is also a path separator. This Stack Overflow answer has a few examples of where it can make code more legible. The problem with not applying these new rules is that the application may see different arguments than the user intended.

@yaahc
Copy link
Member

yaahc commented Aug 5, 2021

That looks great, I've gone ahead and added a link to the overview comment in the PR description. Thank you again.

@ChrisDenton
Copy link
Member Author

I'd also add the small note that since Microsoft were willing to make the changes to their C/C++ argument parsing, they were presumably confident that it wouldn't be too disruptive. Of course that doesn't rule out the possibility of this change breaking something in the Rust ecosystem.

@ChrisDenton
Copy link
Member Author

@bors r+

@bors
Copy link
Contributor

bors commented Aug 6, 2021

📌 Commit fbe56912f601c300d9b34b5b4f953742440c5b51 has been approved by ChrisDenton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 6, 2021
@m-ou-se m-ou-se added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 6, 2021
@yaahc
Copy link
Member

yaahc commented Aug 6, 2021

I just realized I should have FCPed this since it's a change to the stable API even though it's not adding anything new. In this case it's probably okay since we already discussed this a bit and the change seems pretty subtle and straightforward but I'm gonna cc @rust-lang/libs-api just to make sure everyone has a chance to see it.

@ChrisDenton
Copy link
Member Author

Ah, right. And I think I should have used r= there? Sorry, I'm a bit fuzzy about Bors commands.

@yaahc
Copy link
Member

yaahc commented Aug 6, 2021

Ah, right. And I think I should have used r= there? Sorry, I'm a bit fuzzy about Bors commands.

Nonono, you didn't do anything wrong. I shouldn't have delegated bors before the FCP. This is all me, not you, and not a big deal either way.

@bors
Copy link
Contributor

bors commented Aug 6, 2021

⌛ Testing commit fbe56912f601c300d9b34b5b4f953742440c5b51 with merge f20948d35517bbc3b7f7c79fe7df1ccf2d4e3d85...

@bors
Copy link
Contributor

bors commented Aug 6, 2021

💥 Test timed out

@rfcbot rfcbot removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Aug 28, 2021
@rfcbot
Copy link

rfcbot commented Aug 28, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Aug 28, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Aug 31, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Aug 31, 2021

📌 Commit e26dda5 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 31, 2021
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Sep 1, 2021
…-ou-se

Update Windows Argument Parsing

Fixes rust-lang#44650

The Windows command line is passed to applications [as a single string](https://docs.microsoft.com/en-us/archive/blogs/larryosterman/the-windows-command-line-is-just-a-string) which the application then parses to get a list of arguments. The standard rules (as used by C/C++) for parsing the command line have slightly changed over the years, most recently in 2008 which added new escaping rules.

This PR implements the new rules as [described on MSDN](https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args?view=msvc-160#parsing-c-command-line-arguments) and [further detailed here](https://daviddeley.com/autohotkey/parameters/parameters.htm#WIN). It has been tested against the behaviour of C++ by calling a C++ program that outputs its raw command line and the contents of `argv`. See [my repo](https://github.com/ChrisDenton/winarg/tree/std) if anyone wants to reproduce my work.

For an overview of how this PR changes argument parsing behavior and why we feel it is warranted see rust-lang#87580 (comment).

For some examples see: rust-lang#87580 (comment)
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Sep 1, 2021
…-ou-se

Update Windows Argument Parsing

Fixes rust-lang#44650

The Windows command line is passed to applications [as a single string](https://docs.microsoft.com/en-us/archive/blogs/larryosterman/the-windows-command-line-is-just-a-string) which the application then parses to get a list of arguments. The standard rules (as used by C/C++) for parsing the command line have slightly changed over the years, most recently in 2008 which added new escaping rules.

This PR implements the new rules as [described on MSDN](https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args?view=msvc-160#parsing-c-command-line-arguments) and [further detailed here](https://daviddeley.com/autohotkey/parameters/parameters.htm#WIN). It has been tested against the behaviour of C++ by calling a C++ program that outputs its raw command line and the contents of `argv`. See [my repo](https://github.com/ChrisDenton/winarg/tree/std) if anyone wants to reproduce my work.

For an overview of how this PR changes argument parsing behavior and why we feel it is warranted see rust-lang#87580 (comment).

For some examples see: rust-lang#87580 (comment)
@bors
Copy link
Contributor

bors commented Sep 2, 2021

⌛ Testing commit e26dda5 with merge 1cf8fdd...

@bors
Copy link
Contributor

bors commented Sep 2, 2021

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 1cf8fdd to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 2, 2021
@bors bors merged commit 1cf8fdd into rust-lang:master Sep 2, 2021
@rustbot rustbot added this to the 1.56.0 milestone Sep 2, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 16, 2021
@ChrisDenton ChrisDenton deleted the win-arg-parse-2008 branch October 5, 2021 11:06
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Oct 22, 2021
Pkgsrc changes:
 * Remove one now-longer-applicable patch, adjust a few others
 * Bump bootstrap requirements to 1.55.0.

Upstream changes:

Version 1.56.0 (2021-10-21)
========================

Language
--------

- [The 2021 Edition is now stable.][rust#88100]
  See [the edition guide][rust-2021-edition-guide] for more details.
- [The pattern in `binding @ pattern` can now also introduce new bindings.]
  [rust#85305]
- [Union field access is permitted in `const fn`.][rust#85769]

[rust-2021-edition-guide]: https://doc.rust-lang.org/nightly/edition-guide/rust-2021/index.html

Compiler
--------

- [Upgrade to LLVM 13.][rust#87570]
- [Support memory, address, and thread sanitizers on
  aarch64-unknown-freebsd.][rust#88023]
- [Allow specifying a deployment target version for all iOS targets][rust#87699]
- [Warnings can be forced on with `--force-warn`.][rust#87472]
  This feature is primarily intended for usage by `cargo fix`,
  rather than end users.
- [Promote `aarch64-apple-ios-sim` to Tier 2\*.][rust#87760]
- [Add `powerpc-unknown-freebsd` at Tier 3\*.][rust#87370]
- [Add `riscv32imc-esp-espidf` at Tier 3\*.][rust#87666]

\* Refer to Rust's [platform support page][platform-support-doc] for more
information on Rust's tiered platform support.

Libraries
---------

- [Allow writing of incomplete UTF-8 sequences via stdout/stderr on Windows.]
  [rust#83342]
  The Windows console still requires valid Unicode, but this change allows
  splitting a UTF-8 character across multiple write calls. This allows, for
  instance, programs that just read and write data buffers (e.g. copying a file
  to stdout) without regard for Unicode or character boundaries.
- [Prefer `AtomicU{64,128}` over Mutex for Instant backsliding protection.]
  [rust#83093]
  For this use case, atomics scale much better under contention.
- [Implement `Extend<(A, B)>` for `(Extend<A>, Extend<B>)`][rust#85835]
- [impl Default, Copy, Clone for std::io::Sink and std::io::Empty][rust#86744]
- [`impl From<[(K, V); N]>` for all collections.][rust#84111]
- [Remove `P: Unpin` bound on impl Future for Pin.][rust#81363]
- [Treat invalid environment variable names as non-existent.][rust#86183]
  Previously, the environment functions would panic if given a
  variable name with an internal null character or equal sign (`=`).
  Now, these functions will just treat such names as non-existent
  variables, since the OS cannot represent the existence of a
  variable with such a name.

Stabilised APIs
---------------

- [`std::os::unix::fs::chroot`]
- [`UnsafeCell::raw_get`]
- [`BufWriter::into_parts`]
- [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]
  These APIs were previously stable in `std`, but are now also available
  in `core`.
- [`Vec::shrink_to`]
- [`String::shrink_to`]
- [`OsString::shrink_to`]
- [`PathBuf::shrink_to`]
- [`BinaryHeap::shrink_to`]
- [`VecDeque::shrink_to`]
- [`HashMap::shrink_to`]
- [`HashSet::shrink_to`]

These APIs are now usable in const contexts:

- [`std::mem::transmute`]
- [`[T]::first`][`slice::first`]
- [`[T]::split_first`][`slice::split_first`]
- [`[T]::last`][`slice::last`]
- [`[T]::split_last`][`slice::split_last`]

Cargo
-----

- [Cargo supports specifying a minimum supported Rust version in Cargo.toml.]
  [`rust-version`] This has no effect at present on dependency
  version selection.  We encourage crates to specify their minimum
  supported Rust version, and we encourage CI systems that support
  Rust code to include a crate's specified minimum version in the
  text matrix for that crate by default.

Compatibility notes
-------------------

- [Update to new argument parsing rules on Windows.][rust#87580]
  This adjusts Rust's standard library to match the behavior of the standard
  libraries for C/C++. The rules have changed slightly over time, and this PR
  brings us to the latest set of rules (changed in 2008).
- [Disallow the aapcs calling convention on aarch64][rust#88399]
  This was already not supported by LLVM; this change surfaces this lack of
  support with a better error message.
- [Make `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` warn by default][rust#87385]
- [Warn when an escaped newline skips multiple lines.][rust#87671]
- [Calls to `libc::getpid` / `std::process::id` from `Command::pre_exec`
  may return different values on glibc <= 2.24.][rust#81825] Rust
  now invokes the `clone3` system call directly, when available,
  to use new functionality available via that system call. Older
  versions of glibc cache the result of `getpid`, and only update
  that cache when calling glibc's clone/fork functions, so a direct
  system call bypasses that cache update. glibc 2.25 and newer no
  longer cache `getpid` for exactly this reason.

Internal changes
----------------
These changes provide no direct user facing benefits, but represent
significant improvements to the internals and overall performance
of rustc and related tools.

- [LLVM is compiled with PGO in published x86_64-unknown-linux-gnu
  artifacts.][rust#88069] This improves the performance of most Rust builds.

- [Unify representation of macros in internal data structures.][rust#88019]
  This change fixes a host of bugs with the handling of macros by the compiler,
  as well as rustdoc.

[`std::os::unix::fs::chroot`]: https://doc.rust-lang.org/stable/std/os/unix/fs/fn.chroot.html
[`Iterator::intersperse`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse
[`Iterator::intersperse_with`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse
[`UnsafeCell::raw_get`]: https://doc.rust-lang.org/stable/std/cell/struct.UnsafeCell.html#method.raw_get
[`BufWriter::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.into_parts
[`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]: rust-lang/rust#84662
[`Vec::shrink_to`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.shrink_to
[`String::shrink_to`]: https://doc.rust-lang.org/stable/std/string/struct.String.html#method.shrink_to
[`OsString::shrink_to`]: https://doc.rust-lang.org/stable/std/ffi/struct.OsString.html#method.shrink_to
[`PathBuf::shrink_to`]: https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#method.shrink_to
[`BinaryHeap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.BinaryHeap.html#method.shrink_to
[`VecDeque::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.shrink_to
[`HashMap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_map/struct.HashMap.html#method.shrink_to
[`HashSet::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_set/struct.HashSet.html#method.shrink_to
[`std::mem::transmute`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html
[`slice::first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first
[`slice::split_first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first
[`slice::last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last
[`slice::split_last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last
[`rust-version`]: https://doc.rust-lang.org/nightly/cargo/reference/manifest.html#the-rust-version-field
[rust#87671]: rust-lang/rust#87671
[rust#86183]: rust-lang/rust#86183
[rust#87385]: rust-lang/rust#87385
[rust#88100]: rust-lang/rust#88100
[rust#86860]: rust-lang/rust#86860
[rust#84039]: rust-lang/rust#84039
[rust#86492]: rust-lang/rust#86492
[rust#88363]: rust-lang/rust#88363
[rust#85305]: rust-lang/rust#85305
[rust#87832]: rust-lang/rust#87832
[rust#88069]: rust-lang/rust#88069
[rust#87472]: rust-lang/rust#87472
[rust#87699]: rust-lang/rust#87699
[rust#87570]: rust-lang/rust#87570
[rust#88023]: rust-lang/rust#88023
[rust#87760]: rust-lang/rust#87760
[rust#87370]: rust-lang/rust#87370
[rust#87580]: rust-lang/rust#87580
[rust#83342]: rust-lang/rust#83342
[rust#83093]: rust-lang/rust#83093
[rust#88177]: rust-lang/rust#88177
[rust#88548]: rust-lang/rust#88548
[rust#88551]: rust-lang/rust#88551
[rust#88299]: rust-lang/rust#88299
[rust#88220]: rust-lang/rust#88220
[rust#85835]: rust-lang/rust#85835
[rust#86879]: rust-lang/rust#86879
[rust#86744]: rust-lang/rust#86744
[rust#84662]: rust-lang/rust#84662
[rust#86593]: rust-lang/rust#86593
[rust#81050]: rust-lang/rust#81050
[rust#81363]: rust-lang/rust#81363
[rust#84111]: rust-lang/rust#84111
[rust#85769]: rust-lang/rust#85769 (comment)
[rust#88490]: rust-lang/rust#88490
[rust#88269]: rust-lang/rust#88269
[rust#84176]: rust-lang/rust#84176
[rust#88399]: rust-lang/rust#88399
[rust#88227]: rust-lang/rust#88227
[rust#88200]: rust-lang/rust#88200
[rust#82776]: rust-lang/rust#82776
[rust#88077]: rust-lang/rust#88077
[rust#87728]: rust-lang/rust#87728
[rust#87050]: rust-lang/rust#87050
[rust#87619]: rust-lang/rust#87619
[rust#81825]: rust-lang/rust#81825 (comment)
[rust#88019]: rust-lang/rust#88019
[rust#87666]: rust-lang/rust#87666
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 20, 2021
Pkgsrc changes:
 * Bump bootstrap kit version to 1.55.0.
 * Adjust patches as needed, some no longer apply (so removed)
 * Update checksum adjustments.
 * Avoid rust-llvm on SunOS
 * Optionally build docs
 * Remove reference to closed/old PR#54621

Upstream changes:

Version 1.56.1 (2021-11-01)
===========================

- New lints to detect the presence of bidirectional-override Unicode
  codepoints in the compiled source code ([CVE-2021-42574])

[CVE-2021-42574]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574

Version 1.56.0 (2021-10-21)
========================

Language
--------

- [The 2021 Edition is now stable.][rust#88100]
  See [the edition guide][rust-2021-edition-guide] for more details.
- [The pattern in `binding @ pattern` can now also introduce new bindings.]
  [rust#85305]
- [Union field access is permitted in `const fn`.][rust#85769]

[rust-2021-edition-guide]:
  https://doc.rust-lang.org/nightly/edition-guide/rust-2021/index.html

Compiler
--------

- [Upgrade to LLVM 13.][rust#87570]
- [Support memory, address, and thread sanitizers on aarch64-unknown-freebsd.]
  [rust#88023]
- [Allow specifying a deployment target version for all iOS targets][rust#87699]
- [Warnings can be forced on with `--force-warn`.][rust#87472]
  This feature is primarily intended for usage by `cargo fix`, rather than
  end users.
- [Promote `aarch64-apple-ios-sim` to Tier 2\*.][rust#87760]
- [Add `powerpc-unknown-freebsd` at Tier 3\*.][rust#87370]
- [Add `riscv32imc-esp-espidf` at Tier 3\*.][rust#87666]

\* Refer to Rust's [platform support page][platform-support-doc] for more
information on Rust's tiered platform support.

Libraries
---------

- [Allow writing of incomplete UTF-8 sequences via stdout/stderr on Windows.]
  [rust#83342]
  The Windows console still requires valid Unicode, but this change allows
  splitting a UTF-8 character across multiple write calls. This allows, for
  instance, programs that just read and write data buffers (e.g. copying a file
  to stdout) without regard for Unicode or character boundaries.
- [Prefer `AtomicU{64,128}` over Mutex for Instant backsliding protection.]
  [rust#83093]
  For this use case, atomics scale much better under contention.
- [Implement `Extend<(A, B)>` for `(Extend<A>, Extend<B>)`][rust#85835]
- [impl Default, Copy, Clone for std::io::Sink and std::io::Empty][rust#86744]
- [`impl From<[(K, V); N]>` for all collections.][rust#84111]
- [Remove `P: Unpin` bound on impl Future for Pin.][rust#81363]
- [Treat invalid environment variable names as non-existent.][rust#86183]
  Previously, the environment functions would panic if given a
  variable name with an internal null character or equal sign (`=`).
  Now, these functions will just treat such names as non-existent
  variables, since the OS cannot represent the existence of a
  variable with such a name.

Stabilised APIs
---------------

- [`std::os::unix::fs::chroot`]
- [`UnsafeCell::raw_get`]
- [`BufWriter::into_parts`]
- [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]
  These APIs were previously stable in `std`, but are now also available
  in `core`.
- [`Vec::shrink_to`]
- [`String::shrink_to`]
- [`OsString::shrink_to`]
- [`PathBuf::shrink_to`]
- [`BinaryHeap::shrink_to`]
- [`VecDeque::shrink_to`]
- [`HashMap::shrink_to`]
- [`HashSet::shrink_to`]

These APIs are now usable in const contexts:

- [`std::mem::transmute`]
- [`[T]::first`][`slice::first`]
- [`[T]::split_first`][`slice::split_first`]
- [`[T]::last`][`slice::last`]
- [`[T]::split_last`][`slice::split_last`]

Cargo
-----

- [Cargo supports specifying a minimum supported Rust version in Cargo.toml.]
  [`rust-version`]
  This has no effect at present on dependency version selection.
  We encourage crates to specify their minimum supported Rust
  version, and we encourage CI systems that support Rust code to
  include a crate's specified minimum version in the text matrix
  for that crate by default.

Compatibility notes
-------------------

- [Update to new argument parsing rules on Windows.][rust#87580]
  This adjusts Rust's standard library to match the behavior of the standard
  libraries for C/C++. The rules have changed slightly over time, and this PR
  brings us to the latest set of rules (changed in 2008).
- [Disallow the aapcs calling convention on aarch64][rust#88399]
  This was already not supported by LLVM; this change surfaces this lack of
  support with a better error message.
- [Make `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` warn by default][rust#87385]
- [Warn when an escaped newline skips multiple lines.][rust#87671]
- [Calls to `libc::getpid` / `std::process::id` from `Command::pre_exec`
  may return different values on glibc <= 2.24.][rust#81825]
  Rust now invokes the `clone3` system call directly, when available,
  to use new functionality available via that system call. Older
  versions of glibc cache the result of `getpid`, and only update
  that cache when calling glibc's clone/fork functions, so a direct
  system call bypasses that cache update. glibc 2.25 and newer no
  longer cache `getpid` for exactly this reason.

Internal changes
----------------
These changes provide no direct user facing benefits, but represent
significant improvements to the internals and overall performance
of rustc and related tools.

- [LLVM is compiled with PGO in published x86_64-unknown-linux-gnu artifacts.]
  [rust#88069]
  This improves the performance of most Rust builds.
- [Unify representation of macros in internal data structures.][rust#88019]
  This change fixes a host of bugs with the handling of macros by the compiler,
  as well as rustdoc.

[`std::os::unix::fs::chroot`]: https://doc.rust-lang.org/stable/std/os/unix/fs/fn.chroot.html
[`Iterator::intersperse`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse
[`Iterator::intersperse_with`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse
[`UnsafeCell::raw_get`]: https://doc.rust-lang.org/stable/std/cell/struct.UnsafeCell.html#method.raw_get
[`BufWriter::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.into_parts
[`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]: rust-lang/rust#84662
[`Vec::shrink_to`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.shrink_to
[`String::shrink_to`]: https://doc.rust-lang.org/stable/std/string/struct.String.html#method.shrink_to
[`OsString::shrink_to`]: https://doc.rust-lang.org/stable/std/ffi/struct.OsString.html#method.shrink_to
[`PathBuf::shrink_to`]: https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#method.shrink_to
[`BinaryHeap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.BinaryHeap.html#method.shrink_to
[`VecDeque::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.shrink_to
[`HashMap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_map/struct.HashMap.html#method.shrink_to
[`HashSet::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_set/struct.HashSet.html#method.shrink_to
[`std::mem::transmute`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html
[`slice::first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first
[`slice::split_first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first
[`slice::last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last
[`slice::split_last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last
[`rust-version`]: https://doc.rust-lang.org/nightly/cargo/reference/manifest.html#the-rust-version-field
[rust#87671]: rust-lang/rust#87671
[rust#86183]: rust-lang/rust#86183
[rust#87385]: rust-lang/rust#87385
[rust#88100]: rust-lang/rust#88100
[rust#86860]: rust-lang/rust#86860
[rust#84039]: rust-lang/rust#84039
[rust#86492]: rust-lang/rust#86492
[rust#88363]: rust-lang/rust#88363
[rust#85305]: rust-lang/rust#85305
[rust#87832]: rust-lang/rust#87832
[rust#88069]: rust-lang/rust#88069
[rust#87472]: rust-lang/rust#87472
[rust#87699]: rust-lang/rust#87699
[rust#87570]: rust-lang/rust#87570
[rust#88023]: rust-lang/rust#88023
[rust#87760]: rust-lang/rust#87760
[rust#87370]: rust-lang/rust#87370
[rust#87580]: rust-lang/rust#87580
[rust#83342]: rust-lang/rust#83342
[rust#83093]: rust-lang/rust#83093
[rust#88177]: rust-lang/rust#88177
[rust#88548]: rust-lang/rust#88548
[rust#88551]: rust-lang/rust#88551
[rust#88299]: rust-lang/rust#88299
[rust#88220]: rust-lang/rust#88220
[rust#85835]: rust-lang/rust#85835
[rust#86879]: rust-lang/rust#86879
[rust#86744]: rust-lang/rust#86744
[rust#84662]: rust-lang/rust#84662
[rust#86593]: rust-lang/rust#86593
[rust#81050]: rust-lang/rust#81050
[rust#81363]: rust-lang/rust#81363
[rust#84111]: rust-lang/rust#84111
[rust#85769]: rust-lang/rust#85769 (comment)
[rust#88490]: rust-lang/rust#88490
[rust#88269]: rust-lang/rust#88269
[rust#84176]: rust-lang/rust#84176
[rust#88399]: rust-lang/rust#88399
[rust#88227]: rust-lang/rust#88227
[rust#88200]: rust-lang/rust#88200
[rust#82776]: rust-lang/rust#82776
[rust#88077]: rust-lang/rust#88077
[rust#87728]: rust-lang/rust#87728
[rust#87050]: rust-lang/rust#87050
[rust#87619]: rust-lang/rust#87619
[rust#81825]: rust-lang/rust#81825 (comment)
[rust#88019]: rust-lang/rust#88019
[rust#87666]: rust-lang/rust#87666

Version 1.55.0 (2021-09-09)
============================

Language
--------
- [You can now write open "from" range patterns (`X..`), which will start
  at `X` and will end at the maximum value of the integer.][83918]
- [You can now explicitly import the prelude of different editions
  through `std::prelude` (e.g. `use std::prelude::rust_2021::*;`).][86294]

Compiler
--------
- [Added tier 3\* support for `powerpc64le-unknown-freebsd`.][83572]

\* Refer to Rust's [platform support page][platform-support-doc] for more
   information on Rust's tiered platform support.

Libraries
---------

- [Updated std's float parsing to use the Eisel-Lemire algorithm.][86761]
  These improvements should in general provide faster string parsing of floats,
  no longer reject certain valid floating point values, and reduce
  the produced code size for non-stripped artifacts.
- [`string::Drain` now implements `AsRef<str>` and `AsRef<[u8]>`.][86858]

Stabilised APIs
---------------

- [`Bound::cloned`]
- [`Drain::as_str`]
- [`IntoInnerError::into_error`]
- [`IntoInnerError::into_parts`]
- [`MaybeUninit::assume_init_mut`]
- [`MaybeUninit::assume_init_ref`]
- [`MaybeUninit::write`]
- [`array::map`]
- [`ops::ControlFlow`]
- [`x86::_bittest`]
- [`x86::_bittestandcomplement`]
- [`x86::_bittestandreset`]
- [`x86::_bittestandset`]
- [`x86_64::_bittest64`]
- [`x86_64::_bittestandcomplement64`]
- [`x86_64::_bittestandreset64`]
- [`x86_64::_bittestandset64`]

The following previously stable functions are now `const`.

- [`str::from_utf8_unchecked`]


Cargo
-----
- [Cargo will now deduplicate compiler diagnostics to the terminal when invoking
  rustc in parallel such as when using `cargo test`.][cargo/9675]
- [The package definition in `cargo metadata` now includes the `"default_run"`
  field from the manifest.][cargo/9550]
- [Added `cargo d` as an alias for `cargo doc`.][cargo/9680]
- [Added `{lib}` as formatting option for `cargo tree` to print the `"lib_name"`
  of packages.][cargo/9663]

Rustdoc
-------
- [Added "Go to item on exact match" search option.][85876]
- [The "Implementors" section on traits no longer shows redundant
  method definitions.][85970]
- [Trait implementations are toggled open by default.][86260] This should
  make the implementations more searchable by tools like `CTRL+F` in
  your browser.
- [Intra-doc links should now correctly resolve associated items (e.g. methods)
  through type aliases.][86334]
- [Traits which are marked with `#[doc(hidden)]` will no longer appear in the
  "Trait Implementations" section.][86513]


Compatibility Notes
-------------------
- [std functions that return an `io::Error` will no longer use the
  `ErrorKind::Other` variant.][85746] This is to better reflect that these
  kinds of errors could be categorised [into newer more specific `ErrorKind`
  variants][79965], and that they do not represent a user error.
- [Using environment variable names with `process::Command` on Windows now
  behaves as expected.][85270] Previously using envionment variables with
  `Command` would cause them to be ASCII-uppercased.
- [Rustdoc will now warn on using rustdoc lints that aren't prefixed
  with `rustdoc::`][86849]

[86849]: rust-lang/rust#86849
[86513]: rust-lang/rust#86513
[86334]: rust-lang/rust#86334
[86260]: rust-lang/rust#86260
[85970]: rust-lang/rust#85970
[85876]: rust-lang/rust#85876
[83572]: rust-lang/rust#83572
[86294]: rust-lang/rust#86294
[86858]: rust-lang/rust#86858
[86761]: rust-lang/rust#86761
[85769]: rust-lang/rust#85769
[85746]: rust-lang/rust#85746
[85305]: rust-lang/rust#85305
[85270]: rust-lang/rust#85270
[84111]: rust-lang/rust#84111
[83918]: rust-lang/rust#83918
[79965]: rust-lang/rust#79965
[87370]: rust-lang/rust#87370
[87298]: rust-lang/rust#87298
[cargo/9663]: rust-lang/cargo#9663
[cargo/9675]: rust-lang/cargo#9675
[cargo/9550]: rust-lang/cargo#9550
[cargo/9680]: rust-lang/cargo#9680
[cargo/9663]: rust-lang/cargo#9663
[`array::map`]: https://doc.rust-lang.org/stable/std/primitive.array.html#method.map
[`Bound::cloned`]: https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.cloned
[`Drain::as_str`]: https://doc.rust-lang.org/stable/std/string/struct.Drain.html#method.as_str
[`IntoInnerError::into_error`]: https://doc.rust-lang.org/stable/std/io/struct.IntoInnerError.html#method.into_error
[`IntoInnerError::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.IntoInnerError.html#method.into_parts
[`MaybeUninit::assume_init_mut`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init_mut
[`MaybeUninit::assume_init_ref`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init_ref
[`MaybeUninit::write`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.write
[`Seek::rewind`]: https://doc.rust-lang.org/stable/std/io/trait.Seek.html#method.rewind
[`ops::ControlFlow`]: https://doc.rust-lang.org/stable/std/ops/enum.ControlFlow.html
[`str::from_utf8_unchecked`]: https://doc.rust-lang.org/stable/std/str/fn.from_utf8_unchecked.html
[`x86::_bittest`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittest.html
[`x86::_bittestandcomplement`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandcomplement.html
[`x86::_bittestandreset`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandreset.html
[`x86::_bittestandset`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandset.html
[`x86_64::_bittest64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittest64.html
[`x86_64::_bittestandcomplement64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandcomplement64.html
[`x86_64::_bittestandreset64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandreset64.html
[`x86_64::_bittestandset64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandset64.html
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 14, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and its up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`).

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows.
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 14, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 14, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 15, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 15, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
squeek502 added a commit to squeek502/zig that referenced this pull request Apr 15, 2024
…ToArgvW

On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.

ziglang#18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.

This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).

The motivation here is roughly the same as when the same change was made in Rust (rust-lang/rust#87580), that is (paraphrased):

- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward

Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. O-windows Operating system: Windows relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

env::args differs from MSVC CRT in some cases on Windows