-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add winsomely that converts args to a command line #33465
Conversation
b0fb3ce
to
cf6fb2f
Compare
base/shell.jl
Outdated
@@ -253,3 +253,84 @@ julia> Base.shell_escape_posixly("echo", "this", "&&", "that") | |||
""" | |||
shell_escape_posixly(args::AbstractString...) = | |||
sprint(print_shell_escaped_posixly, args...) | |||
|
|||
|
|||
function print_shell_escaped_wincmd(io::IO, args::AbstractString...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more study and analysis, I don't think this function is correct. I've gone ahead and taken a shot at writing out the algorithms for each, in the hopes that you or someone else would be able to take them over and get them in a condition to land. I haven't tested these much. But I think these implement the quoting functions you seek.
function print_shell_escaped_winsomely(io::IO, args::AbstractString...)
# semi-port of libuv's quote_cmd_arg function
first = true
for arg in args
first || write(io, ' ')
first = false
if isempty(arg)
print(io, "\"\"")
elseif all(!in(" \t\""), arg)
print(io, arg)
elseif all(!in("\"\\"), arg)
print(io, "\"", arg, "\"")
else
arg = replace(arg, r"(\\*)\"" => s"\1\1\\\\\"") # double backslashes before quote
arg = replace(arg, r"(\\+)$" => s"\1\1") # double backslashes before end
print(io, "\"", arg, "\"")
end
first = false
end
end
The replace
could also use the algorithm from below:
#= Expected input/output:
* input : hello"world
* output: "hello\"world"
* input : hello""world
* output: "hello\"\"world"
* input : hello\world
* output: hello\world
* input : hello\\world
* output: hello\\world
* input : hello\"world
* output: "hello\\\"world"
* input : hello\\"world
* output: "hello\\\\\"world"
* input : hello world\
* output: "hello world\\"
=#
backslashes = 0
for c in arg
if c == '\\'
backslashes += 1
else
c == '"' && (backslashes = backslashes * 2 + 1)
for j = 1:backslashes
write(io, '\\')
end
backslashes = 0
write(io, c)
end
end
for j = 1:(backslashes * 2)
write(io, '\\')
end
function print_shell_escaped_CMDly(io::IO, arg::AbstractString)
any(in("\r\n"), arg) && throw("Encountered unsupported character by CMD")
arg = replace(arg, r"[()%!^\"<>&|]" => s"^\0")
print(io, arg)
end
function print_shell_escaped_BATCHly(io::IO, arg::AbstractString)
any(in("\r\n"), arg) && throw("Encountered unsupported character by CMD")
arg = replace(arg, r"[()!^\"<>&|]" => s"^\0")
arg = replace(arg, "%" => "%%")
print(io, arg)
end
Some doc-strings to go with the code:
"""
shell_escaped_winsomely(args::Union{Cmd,AbstractString...})::String
Convert the collection of strings `args` into single string suitable for passing as the argument
string for a Windows command line. Windows passes the entire command line as a single string to
the application (unlike POSIX systems, where the list of arguments are passed separately).
Many Windows API applications (including julia.exe), use the conventions of the [Microsoft C
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) to
split that command line into a list of strings. This function implements the inverse of such a
C runtime command-line parser. It joins command-line arguments to be passed to a Windows console
application into a command line, escaping or quoting meta characters such as space,
double quotes and backslash where needed. This may be useful in concert with the `windows_verbatim`
flag to [`Cmd`](@ref) when constructing process pipelines.
# Example
```jldoctest
julia> println(shell_escaped_winsomely("A B\\", "C"))
"A B\\" C
"""
shell_escaped_winsomely
"""
shell_escaped_CMDly(arg::AbstractString)::String
The unexported `shell_escaped_CMDly` function
takes a string and escapes any special characters in such a way that
it is safe to pass it as an argument to some `CMD.exe`.
This may be useful in concert with the `windows_verbatim`
flag to [`Cmd`](@ref) when constructing process pipelines.
See also [`shell_escaped_BATCHly`](@ref).
# Example
```jldoctest
julia> println(shell_escaped_CMDly("\"A B\\\" & C"))
^"A B\\^" ^& C
!important
Due to a peculiar behavior of the CMD, each command after a literal `|` character
(indicating a command pipeline) must have `shell_escaped_CMDly` applied twice. For example:
```
to_print = "All for 1 & 1 for all!"
run(Cmd(Cmd(["cmd /c \"break | echo $(shell_escaped_CMDly(shell_escaped_CMDly(to_print)))"]), windows_verbatim=true))
```
"""
shell_escaped_CMDly
"""
shell_escaped_BATCHly(arg::AbstractString)::String
Like [`shell_escaped_CMDly`](@ref), but appropriate for use inside batch (*.bat) files for use with `CMD.exe`.
"""
shell_escaped_BATCHly
To wrap-up, this is just a quick example to show what these are supposed to output:
julia> print_shell_escaped_winsomely(stdout, `'C:\d\"efg\'`...)
"C:\d\\\"ef g\\"
julia> print_shell_escaped_CMDly(stdout, ans) # ans = raw"\"C:\d\\\\\\\"efg\\\\\""
^"C:\d\\\^"ef g\\^"
Note that that the space is intentionally not escaped even though we did escape the "
, since escaping the "
doesn't affect its presence delineating the argument. That may sound confusing and unintuitive, but it's important to realize that CMD processing isn't quite what we're used to thinking of as (un)escaping, so you have to accept an adjustment to your thinking for this to become clear—and simple. We can test that our simple regex is correct on the builtin cmd attrib
from the cmd prompt, showing that it gets the one argument "Saved Games"
and not the two arguments "Saved
and Games"
(like a posix shell would have done, for example):
C:\Users\jameson>attrib ^"Saved Games^"
R C:\Users\jameson\Saved Games
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does your print_shell_escaped_CMDly
take care of the quote flag that CMD uses when parsing its input? This doesn't look right and needs more careful testing with stings that contain all possible permutations of " and ^ and space (after both even and odd numbers of ").
May I also suggest that the shell_escaped_winsomely function returns an array of individually escaped argument strings, and does not yet join them with spaces into a single string? Otherwise, the subsequent shell_escaped_CMDly function (if you really want to split this across two functions) will have a hard time to decide which space in its input is inside a string and which is an argument separator, as it will once more have to implement most of the Microsoft C parsing algorithm to count quotes in order to determine the function of each space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply disable it, so there are no permutations to test. The cmd
parser defines a mapping of a single string to single string, so spaces have no meaning to cmd. Thus it’s unnecessary to quote spaces (although it’s valid to just quote every character). Most resources on this seem to get that fact wrong and confuse cmd
with the utilities that ship with it (and also with special syntax like for
and if
), leading to a confusion of inaccurate claims. To quote correctly, one merely needs to identify the steps in the list that are relevant and perform the inverse list of operations in reverse. That list is actually unexpectedly short—it just needs to prefix a couple characters with ^
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link is also useful
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
Read the conclusions at the end:
Do:
- Always escape all arguments so that they will be decoded properly by CommandLineToArgvW, perhaps using my ArgvQuote function above.
- After step 1, then if and only if the command line produced will be interpreted by cmd, prefix each shell metacharacter (or each character) with a ^ character.
Do not:
- Simply add quotes around command line argument arguments without any further processing.
- Allow cmd to ever see an unescaped " character.
Effectively escape_winsomely
(I also hate this name but w/e) is step 1 of Raymond Chen's "Do" . Jameson's CMD escaping function does step 2. And as Jameson mentioned we could in principle just escape everything (although overkill) .
I fail to see why his implementation is incorrect or why the general idea here is incorrect . We are not trying to duplicate how CMD.exe parses the command line. We are, however, trying to send CMD.exe a valid escaped string from Julia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at an earlier PR #11478 it seems like the approach there is incorrect.
@vtjnash First, thank you for spending time to look at this. For now I decided to add the This function behaves the same as your implementation, just it's a lot faster to do the Do you think it would be worth it to get the PR merged in its current state and then continue onwards with your |
Converts the collection of strings 'args' into a Windows command line. Co-authored-by: Mustafa M. <mus-m@outlook.com> Co-authored-by: Markus Kuhn <mgk25@cl.cam.ac.uk> Co-authored-by: Jameson Nash <vtjnash+github@gmail.com>
win64 failure unrelated. This is ready on my end. |
Great! I suggest we give it a day or two in case anyone else wants to review and/or comment, then merge. |
|
I think we're kinda stuck with the adverb in the name because we already use posixly. Maybe |
Pinging @stevengj since he may be interested in this PR and https://github.com/JuliaLang/julia/pull/33474/files , since has worked on this topic. |
How do we feel with PR now that it has had week to settle? Is this + plus REPL shell mode an interest to include in julia? |
@vtjnash, I've assigned you as you've already done some review and you know this area well. |
I already approved it. It was just waiting in case someone wanted to comment or change the name. |
Let's bike shed: how do we fill about |
I have to confess, the "winsomely" part confused me at first, but it's a cute name, so 🤷♂ |
Ok I'm fine as with it too, so should we merge this then? It's been opened for a week without additional comments. I say let's do it. |
# Quote any arg that contains a whitespace (' ' or '\t') or a double quote mark '"'. | ||
# It's also valid to quote an arg with just a whitespace, | ||
# but the following may be 'safer', and both implementations are valid anyways. | ||
quotes = any(c -> c in (' ', '\t', '"'), arg) || isempty(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having more time to think about this. I think we should remove "
from this list.
Unless we're told otherwise, don't quote unless we actually need to do so - hopefully avoid problems if programs won't parse quotes properly
we should also add a ; force = false
option that will indeed quote everything if needed.
Note that Base.shell_escape_winsomely() also inverts the escaping convention of Julia's raw string literals, or in fact any string literals passed to Julia *_str() macros (as you can see from the fact that the raw_str macro is just the identity function). Julia's raw/macro string literals just happen to use now the exact same quote+backslash escaping convention as Microsoft's CommandLineToArgvW function, which Microsoft's C compiler runtime uses to generate main() parameter argv[]. Therefore, shouldn't the loop content of Base.shell_escape_winsomely() instead be called Base.escape_raw_string(), in analogy to Base.escape_string(), and moved next to that into base/strings/io.jl? In particular, Base.shell_escape_winsomely() has nothing to do with any kind of shell! It is not an analog of Base.shell_escape_posixly(). It does not invert anything implemented in CMD.EXE. So the name still seems a bit wrong to me. |
Good point. Would you care to make a PR to move the code there and just keep the name here as an alias? That'd help fix #29580. (Awkwardly, CommandLineToArgvW doesn't actually implement the algorithm that's documented there in its documentation, nor does the entirely separate function which Microsoft's C compiler runtime uses to parse arguments—but that's a story for another day. The escaping done here is still correct.) |
From #30614 adressing @vtnash 's comments to move the function to a more appropriate location.
However see my comment there https://github.com/JuliaLang/julia/pull/30614/files/f3d3cf36c73b646b2d65a91fa2289e22d64b23f5#diff-06a75b3b926c52ae4594356ce1eb5405Hopefully I can get some inputs on how to test the special meta-char processing by cmd and also my concerns in the above comment.The code implements the inverse of
CommandLineToArgvW
.References
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
https://devblogs.microsoft.com/oldnewthing/?p=12833
https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?redirectedfrom=MSDN&view=vs-2019