Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add winsomely that converts args to a command line #33465

Merged
merged 2 commits into from
Oct 11, 2019

Conversation

musm
Copy link
Contributor

@musm musm commented Oct 3, 2019

From #30614 adressing @vtnash 's comments to move the function to a more appropriate location.

However see my comment there https://github.com/JuliaLang/julia/pull/30614/files/f3d3cf36c73b646b2d65a91fa2289e22d64b23f5#diff-06a75b3b926c52ae4594356ce1eb5405

Hopefully I can get some inputs on how to test the special meta-char processing by cmd and also my concerns in the above comment.

The code implements the inverse of CommandLineToArgvW.

References

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

https://devblogs.microsoft.com/oldnewthing/?p=12833

https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?redirectedfrom=MSDN&view=vs-2019

base/shell.jl Outdated Show resolved Hide resolved
base/shell.jl Outdated Show resolved Hide resolved
base/shell.jl Outdated Show resolved Hide resolved
@musm musm force-pushed the winshellescape branch 2 times, most recently from b0fb3ce to cf6fb2f Compare October 4, 2019 01:54
base/shell.jl Outdated
@@ -253,3 +253,84 @@ julia> Base.shell_escape_posixly("echo", "this", "&&", "that")
"""
shell_escape_posixly(args::AbstractString...) =
sprint(print_shell_escaped_posixly, args...)


function print_shell_escaped_wincmd(io::IO, args::AbstractString...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After more study and analysis, I don't think this function is correct. I've gone ahead and taken a shot at writing out the algorithms for each, in the hopes that you or someone else would be able to take them over and get them in a condition to land. I haven't tested these much. But I think these implement the quoting functions you seek.

function print_shell_escaped_winsomely(io::IO, args::AbstractString...)
    # semi-port of libuv's quote_cmd_arg function
    first = true
    for arg in args
        first || write(io, ' ')
        first = false
        if isempty(arg)
            print(io, "\"\"")
        elseif all(!in(" \t\""), arg)
            print(io, arg)
        elseif all(!in("\"\\"), arg)
            print(io, "\"", arg, "\"")
        else
            arg = replace(arg, r"(\\*)\"" => s"\1\1\\\\\"") # double backslashes before quote
            arg = replace(arg, r"(\\+)$" => s"\1\1") # double backslashes before end
            print(io, "\"", arg, "\"")
        end
        first = false
    end
end

The replace could also use the algorithm from below:

  #= Expected input/output:
   *   input : hello"world
   *   output: "hello\"world"
   *   input : hello""world
   *   output: "hello\"\"world"
   *   input : hello\world
   *   output: hello\world
   *   input : hello\\world
   *   output: hello\\world
   *   input : hello\"world
   *   output: "hello\\\"world"
   *   input : hello\\"world
   *   output: "hello\\\\\"world"
   *   input : hello world\
   *   output: "hello world\\"
   =#
    backslashes = 0
    for c in arg
        if c == '\\'
            backslashes += 1
        else
            c == '"' && (backslashes = backslashes * 2 + 1)
            for j = 1:backslashes
                write(io, '\\')
            end
            backslashes = 0
            write(io, c)
        end
    end
    for j = 1:(backslashes * 2)
        write(io, '\\')
    end
function print_shell_escaped_CMDly(io::IO, arg::AbstractString)
    any(in("\r\n"), arg) && throw("Encountered unsupported character by CMD")
    arg = replace(arg, r"[()%!^\"<>&|]" => s"^\0")
    print(io, arg)
end
function print_shell_escaped_BATCHly(io::IO, arg::AbstractString)
    any(in("\r\n"), arg) && throw("Encountered unsupported character by CMD")
    arg = replace(arg, r"[()!^\"<>&|]" => s"^\0")
    arg = replace(arg, "%" => "%%")
    print(io, arg)
end

Some doc-strings to go with the code:

"""
     shell_escaped_winsomely(args::Union{Cmd,AbstractString...})::String

Convert the collection of strings `args` into single string suitable for passing as the argument
string for a Windows command line. Windows passes the entire command line as a single string to
the application (unlike POSIX systems, where the list of arguments are passed separately).
Many Windows API applications (including julia.exe), use the conventions of the [Microsoft C
runtime](https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments) to
split that command line into a list of strings. This function implements the inverse of such a
C runtime command-line parser. It joins command-line arguments to be passed to a Windows console
application into a command line, escaping or quoting meta characters such as space,
double quotes and backslash where needed. This may be useful in concert with the `windows_verbatim`
flag to [`Cmd`](@ref) when constructing process pipelines.
# Example
```jldoctest
julia> println(shell_escaped_winsomely("A B\\", "C"))
"A B\\" C
"""
shell_escaped_winsomely

"""
     shell_escaped_CMDly(arg::AbstractString)::String

The unexported `shell_escaped_CMDly` function
takes a string and escapes any special characters in such a way that
it is safe to pass it as an argument to some `CMD.exe`.
This may be useful in concert with the `windows_verbatim`
flag to [`Cmd`](@ref) when constructing process pipelines.

See also [`shell_escaped_BATCHly`](@ref).

# Example
```jldoctest
julia> println(shell_escaped_CMDly("\"A B\\\" & C"))
^"A B\\^" ^& C

!important
    Due to a peculiar behavior of the CMD, each command after a literal `|` character
    (indicating a command pipeline) must have `shell_escaped_CMDly` applied twice. For example:
    ```
    to_print = "All for 1 & 1 for all!"
    run(Cmd(Cmd(["cmd /c \"break | echo $(shell_escaped_CMDly(shell_escaped_CMDly(to_print)))"]), windows_verbatim=true))
    ```
"""
shell_escaped_CMDly

"""
     shell_escaped_BATCHly(arg::AbstractString)::String

Like [`shell_escaped_CMDly`](@ref), but appropriate for use inside batch (*.bat) files for use with `CMD.exe`.
"""
shell_escaped_BATCHly

To wrap-up, this is just a quick example to show what these are supposed to output:

julia> print_shell_escaped_winsomely(stdout, `'C:\d\"efg\'`...)
"C:\d\\\"ef g\\"

julia> print_shell_escaped_CMDly(stdout, ans) # ans = raw"\"C:\d\\\\\\\"efg\\\\\""
^"C:\d\\\^"ef g\\^"

Note that that the space is intentionally not escaped even though we did escape the ", since escaping the " doesn't affect its presence delineating the argument. That may sound confusing and unintuitive, but it's important to realize that CMD processing isn't quite what we're used to thinking of as (un)escaping, so you have to accept an adjustment to your thinking for this to become clear—and simple. We can test that our simple regex is correct on the builtin cmd attrib from the cmd prompt, showing that it gets the one argument "Saved Games" and not the two arguments "Saved and Games" (like a posix shell would have done, for example):

C:\Users\jameson>attrib ^"Saved Games^"
     R               C:\Users\jameson\Saved Games

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does your print_shell_escaped_CMDly take care of the quote flag that CMD uses when parsing its input? This doesn't look right and needs more careful testing with stings that contain all possible permutations of " and ^ and space (after both even and odd numbers of ").

May I also suggest that the shell_escaped_winsomely function returns an array of individually escaped argument strings, and does not yet join them with spaces into a single string? Otherwise, the subsequent shell_escaped_CMDly function (if you really want to split this across two functions) will have a hard time to decide which space in its input is inside a string and which is an argument separator, as it will once more have to implement most of the Microsoft C parsing algorithm to count quotes in order to determine the function of each space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply disable it, so there are no permutations to test. The cmd parser defines a mapping of a single string to single string, so spaces have no meaning to cmd. Thus it’s unnecessary to quote spaces (although it’s valid to just quote every character). Most resources on this seem to get that fact wrong and confuse cmd with the utilities that ship with it (and also with special syntax like for and if), leading to a confusion of inaccurate claims. To quote correctly, one merely needs to identify the steps in the list that are relevant and perform the inverse list of operations in reverse. That list is actually unexpectedly short—it just needs to prefix a couple characters with ^.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link is also useful
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

Read the conclusions at the end:

Do:

  1. Always escape all arguments so that they will be decoded properly by CommandLineToArgvW, perhaps using my ArgvQuote function above.
  1. After step 1, then if and only if the command line produced will be interpreted by cmd, prefix each shell metacharacter (or each character) with a ^ character.

Do not:

  1. Simply add quotes around command line argument arguments without any further processing.
  2. Allow cmd to ever see an unescaped " character.

Effectively escape_winsomely (I also hate this name but w/e) is step 1 of Raymond Chen's "Do" . Jameson's CMD escaping function does step 2. And as Jameson mentioned we could in principle just escape everything (although overkill) .

I fail to see why his implementation is incorrect or why the general idea here is incorrect . We are not trying to duplicate how CMD.exe parses the command line. We are, however, trying to send CMD.exe a valid escaped string from Julia.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at an earlier PR #11478 it seems like the approach there is incorrect.

@musm
Copy link
Contributor Author

musm commented Oct 4, 2019

@vtjnash First, thank you for spending time to look at this.

For now I decided to add the print_shell_escaped_winsomely function. (I have yet to review your batchly and cmdly functions).

This function behaves the same as your implementation, just it's a lot faster to do the replace manually (roughly 5x faster).

Do you think it would be worth it to get the PR merged in its current state and then continue onwards with your cmdly and batchly functions?

base/shell.jl Outdated Show resolved Hide resolved
@musm musm changed the title Implement inverse of CommandLineToArgvW plus cmd escaping Add winsomely that converts args to a command line Oct 4, 2019
Converts the collection of strings 'args' into a Windows command line.

Co-authored-by:    Mustafa M. <mus-m@outlook.com>
Co-authored-by:    Markus Kuhn <mgk25@cl.cam.ac.uk>
Co-authored-by:    Jameson Nash <vtjnash+github@gmail.com>
@ViralBShah ViralBShah added the system:windows Affects only Windows label Oct 4, 2019
@musm
Copy link
Contributor Author

musm commented Oct 4, 2019

win64 failure unrelated. This is ready on my end.

@vtjnash
Copy link
Member

vtjnash commented Oct 5, 2019

Great! I suggest we give it a day or two in case anyone else wants to review and/or comment, then merge.

@musm
Copy link
Contributor Author

musm commented Oct 5, 2019

Thanks! I want to point out that this was a team effort, combining the work of @vtjnash and @mgkuhn so huge thanks for the effort.

@mgkuhn
Copy link
Contributor

mgkuhn commented Oct 5, 2019

print_shell_escaped_winsomely ... I feel strangely uncomfortable about adverbs in function names.

@musm
Copy link
Contributor Author

musm commented Oct 5, 2019

I think we're kinda stuck with the adverb in the name because we already use posixly. Maybe _winly? I personally don't care what the name is.

@musm
Copy link
Contributor Author

musm commented Oct 7, 2019

Pinging @stevengj since he may be interested in this PR and https://github.com/JuliaLang/julia/pull/33474/files , since has worked on this topic.

base/shell.jl Show resolved Hide resolved
@musm
Copy link
Contributor Author

musm commented Oct 11, 2019

How do we feel with PR now that it has had week to settle? Is this + plus REPL shell mode an interest to include in julia?

@StefanKarpinski
Copy link
Member

@vtjnash, I've assigned you as you've already done some review and you know this area well.

@vtjnash
Copy link
Member

vtjnash commented Oct 11, 2019

I already approved it. It was just waiting in case someone wanted to comment or change the name.

@musm
Copy link
Contributor Author

musm commented Oct 11, 2019

Let's bike shed: how do we fill about escape_shell_WINly or escape_shell_winly or keep it winsomeley ?

@StefanKarpinski
Copy link
Member

I have to confess, the "winsomely" part confused me at first, but it's a cute name, so 🤷‍♂

@musm
Copy link
Contributor Author

musm commented Oct 11, 2019

Ok I'm fine as with it too, so should we merge this then? It's been opened for a week without additional comments. I say let's do it.

@StefanKarpinski StefanKarpinski merged commit f0ab5bb into JuliaLang:master Oct 11, 2019
@musm musm deleted the winshellescape branch October 25, 2019 03:42
# Quote any arg that contains a whitespace (' ' or '\t') or a double quote mark '"'.
# It's also valid to quote an arg with just a whitespace,
# but the following may be 'safer', and both implementations are valid anyways.
quotes = any(c -> c in (' ', '\t', '"'), arg) || isempty(arg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having more time to think about this. I think we should remove " from this list.

Unless we're told otherwise, don't quote unless we actually need to do so - hopefully avoid problems if programs won't parse quotes properly

we should also add a ; force = false option that will indeed quote everything if needed.

@mgkuhn
Copy link
Contributor

mgkuhn commented Dec 14, 2019

Note that Base.shell_escape_winsomely() also inverts the escaping convention of Julia's raw string literals, or in fact any string literals passed to Julia *_str() macros (as you can see from the fact that the raw_str macro is just the identity function). Julia's raw/macro string literals just happen to use now the exact same quote+backslash escaping convention as Microsoft's CommandLineToArgvW function, which Microsoft's C compiler runtime uses to generate main() parameter argv[].

Therefore, shouldn't the loop content of Base.shell_escape_winsomely() instead be called Base.escape_raw_string(), in analogy to Base.escape_string(), and moved next to that into base/strings/io.jl?

In particular, Base.shell_escape_winsomely() has nothing to do with any kind of shell! It is not an analog of Base.shell_escape_posixly(). It does not invert anything implemented in CMD.EXE. So the name still seems a bit wrong to me.

@vtjnash
Copy link
Member

vtjnash commented Dec 14, 2019

Good point. Would you care to make a PR to move the code there and just keep the name here as an alias? That'd help fix #29580.

(Awkwardly, CommandLineToArgvW doesn't actually implement the algorithm that's documented there in its documentation, nor does the entirely separate function which Microsoft's C compiler runtime uses to parse arguments—but that's a story for another day. The escaping done here is still correct.)

@mgkuhn
Copy link
Contributor

mgkuhn commented Dec 15, 2019

Would you care to make a PR to move the code there and just keep the name here as an alias?
@vtjnash #34111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:windows Affects only Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants