[ER]: capturing the output of a subprocess #101

Closed
pkoppstein opened this issue Jul 19, 2021 · 13 comments

@pkoppstein

The main need is to be able to write a pipeline along the lines of:

"abc" | run("md5")

with the result: "0bee89b07a248e27c83fc3d5951213c1"

Ideally, you could use this in conjunction with try/catch:

"abc" | try system("md5") catch .

This fits in nicely with jq's pipes-and-filters, but of course a
reasonable alternative would be to follow the lead of Go's
exec.Command(), and have the result be a JSON object with various keys
holding the results.
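
The object-valued alternative mentioned above could look roughly like the following. This is only a sketch using Go's standard os/exec package; the `runResult` type, the `runCommand` helper, and the key names `stdout`, `stderr`, and `exit_code` are hypothetical and not part of gojq or jq.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// runResult is a hypothetical result shape; the key names are illustrative only.
type runResult struct {
	Stdout   string `json:"stdout"`
	Stderr   string `json:"stderr"`
	ExitCode int    `json:"exit_code"`
}

// runCommand feeds input to the command's stdin and collects its output.
// A non-zero exit status is reported in ExitCode rather than as an error;
// an error is returned only when the command could not be run at all.
func runCommand(input, name string, args ...string) (runResult, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(input)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()
	res := runResult{Stdout: stdout.String(), Stderr: stderr.String()}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		res.ExitCode = exitErr.ExitCode()
		return res, nil
	}
	return res, err
}

func main() {
	// "cat" simply echoes stdin, which keeps the example portable across Unix systems.
	res, err := runCommand("abc", "cat")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.Marshal(res)
	fmt.Println(string(out))
}
```

With a result like this, a jq program could pick out `.stdout` or branch on `.exit_code`, at the cost of a slightly less pipe-like feel than a plain string result.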

This gojq issue is related to the jq issue at jqlang/jq#147

My impression is that progress on this functionality has stagnated at stedolan/jq
mainly because the issue became entangled with numerous other potential enhancements
(see especially jqlang/jq#1843).

So my suggestion would be to keep gojq's initial support for "shelling out" quite simple.

Thank you again.

@itchyny
Owner

itchyny commented Jul 19, 2021

I enjoy discussing a feature extension like this, and it would be very easy to implement such a function in gojq. Nevertheless, I hesitate to add a new public function that might later be introduced in jq under a different name, or under the same name with different behavior.

@pkoppstein
Author

pkoppstein commented Jul 19, 2021

@itchyny - Understood, but please consider:

(a) stedolan/jq development has stalled very badly;
(b) gojq has significant performance issues compared to jq, so it would be nice to give users additional reasons to overlook them;
(c) introducing a new built-in is not such a big deal, partly because jq allows built-ins to be overridden (*);
(d) you could add a "compatibility mode" flag to specify that additional built-ins should not be included;
(e) you could flag new built-ins as "experimental" so that gojq is not committed to their specifications.

The need addressed by this particular proposal (i.e. for "run" however named) is so fundamental that, in my opinion at least, it should take precedence over other concerns, especially as those concerns can be greatly alleviated.

I'm glad to hear that such a big improvement should not take too much of your time :-)


(*) For example, suppose I have a jq program, run.jq, that uses a UDF (user-defined function) run/1 that differs from gojq's built-in run/1. If I use gojq to run run.jq, there should not be a problem, because the UDF should shadow the built-in. Of course, I might not be able to modify run.jq to take advantage of gojq's own run/1, but then again, it might be possible, because jq and gojq handle redefinitions in the same way:

jq -n 'def builtinLength: length; def length: "hello"; 1 | builtinLength, length'

gojq -n 'def builtinLength: length; def length: "hello"; 1 | builtinLength, length'

@smlx
Contributor

smlx commented Jul 19, 2021

Being able to run arbitrary system commands through a jq script also has significant security implications. I'm just a gojq user, but I don't think this is a good idea.

@pkoppstein
Author

@smlx - Since my main goal is to have this functionality available as required, I'd have no objection to turning the feature off by default, and requiring a command-line flag to enable it. The flag could be named neutrally (e.g. "--go") or to emphasize the possible risk (e.g. "--at-your-own-risk").

On the other hand, I don't really understand why functionality such as this should be made available to Go (and awk and python and ruby and ....) programmers, but not to jq or gojq programmers. Surely the likes of tio.run, replit.com, glot.io, etc make it clear that this kind of functionality can be made available to programmers of various persuasions without preventing suitably "sandboxed" environments from being created.
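
The opt-in flag suggested above could be sketched as follows, using only Go's standard flag package. The flag name `--enable-run`, the built-in name `run`, and the `enabledBuiltins` helper are all hypothetical; gojq has no such flag.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// enabledBuiltins returns the names of the built-ins that would be registered.
// The subprocess built-in is included only when the (hypothetical) flag is given.
func enabledBuiltins(args []string) ([]string, error) {
	fs := flag.NewFlagSet("gojq-sketch", flag.ContinueOnError)
	enableRun := fs.Bool("enable-run", false, "allow jq programs to start subprocesses")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	// Stand-ins for the built-ins that are always available.
	names := []string{"length", "keys"}
	if *enableRun {
		names = append(names, "run")
	}
	return names, nil
}

func main() {
	names, err := enabledBuiltins(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println(names)
}
```

Because the default is off, a jq program that calls the subprocess built-in would fail with an "undefined function" style error unless the user explicitly opted in.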

@wader
Contributor

wader commented Jul 19, 2021

@pkoppstein Could you expand a bit more about how a subprocess API could work, should it be blocking or streaming somehow?

Here is quick stab at a blocking cmd function:

package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"

	"github.com/itchyny/gojq"
)

func main() {
	query, err := gojq.Parse(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	code, err := gojq.Compile(query,
		gojq.WithFunction("cmd", 1, 1, func(c interface{}, a []interface{}) interface{} {
			cStr, cStrOk := c.(string)
			if c != nil && !cStrOk {
				return fmt.Errorf("expected null or string input")
			}

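			// The arity (1, 1) passed to WithFunction guarantees that a[0] exists.
			cmdsAny, cmdsAnyOk := a[0].([]interface{})
			if !cmdsAnyOk {
				return fmt.Errorf("expected array argument")
			}
			// Guard against an empty array, which would make cmds[0] panic below.
			if len(cmdsAny) == 0 {
				return fmt.Errorf("expected non-empty array argument")
			}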
			var cmds []string
			for _, sa := range cmdsAny {
				s, sOk := sa.(string)
				if !sOk {
					return fmt.Errorf("expected string elements, got %v", sa)
				}
				cmds = append(cmds, s)
			}

			stdoutBuf := &bytes.Buffer{}
			cmd := exec.Command(cmds[0], cmds[1:]...)
			cmd.Stdin = strings.NewReader(cStr)
			cmd.Stdout = stdoutBuf
			if err := cmd.Run(); err != nil {
				return err
			}

			return stdoutBuf.String()
		}),
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	iter := code.RunWithContext(context.Background(), nil)
	for {
		var ok bool
		var v interface{}

		if v, ok = iter.Next(); !ok {
			break
		}
		if err, ok = v.(error); ok {
			fmt.Fprintln(os.Stderr, err)
			break
		}

		e := json.NewEncoder(os.Stdout)
		e.SetIndent("", "    ")
		_ = e.Encode(v)
	}
}
$ go run main.go '"hello" | cmd(["md5"])'
"5d41402abc4b2a76b9719d911017c592\n"
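
For comparison with the blocking function above, a streaming variant might hand each line of output to the caller as soon as it arrives. The sketch below uses only the standard library; `streamLines` and its `emit` callback are hypothetical names, and how such a callback would be wired into gojq's iterator model is left open.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// streamLines runs a command, feeds input to its stdin, and calls emit for
// each line of stdout as soon as it has been read.
func streamLines(input string, emit func(line string), name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(input)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}

	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		emit(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		// Stop the child so that Wait cannot block on unread output.
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
		return err
	}
	// Wait is called only after all reads from the pipe have completed.
	return cmd.Wait()
}

func main() {
	err := streamLines("first\nsecond\n", func(line string) {
		fmt.Println("got:", line)
	}, "cat")
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

The blocking form is simpler and maps directly onto a single jq value; the streaming form only pays off for long-running or very chatty commands.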

@pkoppstein Also, I've experimented with various jq extensions using gojq, for example non-base-10 number literals, bitwise operators, some introspection support, a "JQValue" Go interface for implementing your own types, and some IO operations. It's for a private project that I hope will be open source soon, but let me know if you want more information.

@pkoppstein
Author

@wader - Regarding your first question -- as my initial post in this thread indicated, I would recommend the simplest approach that meets the very basic requirements as already outlined. I'm therefore curious about whether your "quick stab" interfaces with gojq's try ... catch mechanism. Since I'm no Go guru, I won't have much to say about implementation details, but thanks for asking.

Regarding your last paragraph - do you expect your extensions will make their way into the "official" gojq? If I can help in some way other than Go coding, please let me know.

@wader
Contributor

wader commented Jul 20, 2021

@wader - Regarding your first question -- as my initial post in this thread indicated, I would recommend the simplest approach that meets the very basic requirements as already outlined. I'm therefore curious about whether your "quick stab" interfaces with gojq's try ... catch mechanism. Since I'm no Go guru, I won't have much to say about implementation details, but thanks for asking.

Yes. With WithFunction, any returned value that satisfies the standard Go error interface is try/catch-able, so it works because fmt.Errorf and cmd.Run return such values. gojq also has a ValueError interface if you want the caught value to be something other than the error string.
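
As a rough illustration of the second point, an error type could carry a structured value for catch. The sketch below does not import gojq: the local `valueError` interface is an assumption about the shape of gojq's ValueError (an error with an extra `Value()` method), and `cmdError` with its fields is a hypothetical type.

```go
package main

import "fmt"

// valueError mirrors the shape assumed here for gojq's ValueError interface:
// an error that also exposes the value to be handed to catch.
type valueError interface {
	error
	Value() interface{}
}

// cmdError is a hypothetical error type describing a failed subprocess.
type cmdError struct {
	ExitCode int
	Stderr   string
}

// Error gives the plain message used when the error is not caught.
func (e *cmdError) Error() string {
	return fmt.Sprintf("command exited with status %d", e.ExitCode)
}

// Value returns the object that a jq-level catch would receive instead of
// the error string.
func (e *cmdError) Value() interface{} {
	return map[string]interface{}{
		"exit_code": e.ExitCode,
		"stderr":    e.Stderr,
	}
}

// Compile-time check that cmdError has the assumed method set.
var _ valueError = (*cmdError)(nil)

func main() {
	var err error = &cmdError{ExitCode: 1, Stderr: "no such file"}
	if ve, ok := err.(valueError); ok {
		fmt.Println(err.Error())
		fmt.Println(ve.Value())
	}
}
```

Returning such a value from the function passed to WithFunction would, under this assumption, let a jq program write `try cmd([...]) catch .exit_code`.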

Regarding your last paragraph - do you expect your extensions will make their way into the "official" gojq? If I can help in some way other than Go coding, please let me know.

No, I'm not expecting that, but I'm very willing to help add them to gojq if @itchyny agrees. I've briefly mentioned some of them before, but as I fully understand, he is reluctant to add them. Your comment that jq development has stagnated made me rethink this, though: maybe gojq would be a good place for optional experimental features? I guess one difficulty would be keeping them from becoming a huge maintenance burden. Also, should the features be enabled one by one, or all or nothing? Should they be optional at compile time or at runtime?

I think you could help a lot. As you seem to know the jq language very well, a big help would be making sure a new feature is consistent with the existing language, standard library, etc. Good tests and documentation are of course a lot of work as well.

@pkoppstein
Author

@wader wrote -

if @itchyny agrees

Hopefully @itchyny can be persuaded that adding new built-ins can be done in a way that is both advantageous and basically risk-free.

good tests and documentation ...

As it happens, I've just finished writing a script that can be used both for regression testing (e.g. of jq or gojq separately) and for consistency testing, e.g. between jq and gojq (the point being that there are certain well-known differences between jq and gojq that the testing framework should know about). The script (run-jq-tests) can take as input a file in the existing "jq.test" format, but it also adds some useful extensions. It should be easy for someone to convert the gojq .yaml testing files to the jq.test format ....
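
To make the "jq.test" layout concrete, here is a minimal reader for it. This is not the run-jq-tests script mentioned above, only a sketch under an assumed layout: each test is a program line, an input line, and zero or more expected-output lines, with tests separated by blank lines and `#` lines treated as comments. Any special directives the real format may contain are not handled, and `testCase` and `parseTests` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// testCase is one entry: a program, one input, and the expected outputs.
type testCase struct {
	Program  string
	Input    string
	Expected []string
}

// parseTests splits the text into blank-line-separated blocks and turns each
// block with at least a program and an input line into a testCase.
func parseTests(text string) []testCase {
	var tests []testCase
	var block []string
	flush := func() {
		if len(block) >= 2 {
			tests = append(tests, testCase{
				Program:  block[0],
				Input:    block[1],
				Expected: block[2:],
			})
		}
		block = nil
	}

	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "#"):
			continue // comment line
		case line == "":
			flush() // a blank line ends the current test
		default:
			block = append(block, line)
		}
	}
	flush() // the last test may not be followed by a blank line
	return tests
}

func main() {
	text := "# two small tests\n.a\n{\"a\":1}\n1\n\n.[]\n[1,2]\n1\n2\n"
	for _, tc := range parseTests(text) {
		fmt.Printf("%q on %s expects %v\n", tc.Program, tc.Input, tc.Expected)
	}
}
```

A consistency check between jq and gojq would then run each program with both tools and compare the outputs against `Expected`, skipping the cases known to differ.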

@wader
Contributor

wader commented Jul 23, 2021

Hopefully @itchyny can be persuaded that adding new built-ins can be done in a way that is both advantageous and basically risk-free.

Do you have any ideas about how a jq script would signal that it uses gojq-specific things? Just a command-line argument, some syntax, or always enabled but namespaced somehow?

I think the trickiest additions would be language changes such as new syntax, for example my non-base-10 literals, as they require changes to the lexer/parser code.

As it happens, I've just finished writing a script that can be used both for regression testing (e.g. of jq or gojq separately) and for consistency testing, e.g. between jq and gojq (the point being that there are certain well-known differences between jq and gojq that the testing framework should know about). The script (run-jq-tests) can take as input a file in the existing "jq.test" format, but it also adds some useful extensions. It should be easy for someone to convert the gojq .yaml testing files to the jq.test format ....

Ah, that is really nice. Do you plan to publish it somewhere eventually?

Sorry for the somewhat short and slow reply; I'm on vacation :)

@pkoppstein
Author

@wader - Since gojq is @itchyny's project, I would be reluctant
to suggest any "non-standard" changes besides new built-ins,
by which I mean to include both jq- and Go-coded filters.

Obviously the module system could be used for all new built-ins, but
my personal preference would be that some (e.g. "run" :-) would be
added without being consigned to a module. Having a "strict-mode"
command-line flag might alleviate @itchyny's concerns about
compatibility, while hopefully being easy to implement.

This does raise an important point, though, especially because jq does
make it easy to use "system-defined" modules (i.e. without any
command-line flag being needed). However, that topic is perhaps left
to a different thread.

Assuming other types of "enhancements" belong in an "experimental"
version of gojq, it might be more trouble than it's worth to maintain
a compatibility mode, but that would obviously be up to the people
developing such a version.

Regarding the testing script -- I'm incorporating some
changes that will hopefully make it easy to use with
gojq's .yaml test files as well as jq's .test files.
If you'd like to have a preview, let me know a convenient
way to make it available to you -- Dropbox? gist? email?

@u0nel

u0nel commented Oct 23, 2021

@itchyny
(b) gojq has significant performance issues compared to jq

I timed the last example query in jq's tutorial, which gave these values:

jq   0,06s user 0,00s system 22% cpu 0,276 total
gojq   0,00s user 0,00s system 3% cpu 0,116 total

@lgfausak

lgfausak commented Nov 5, 2021

Thank you for the great work! Gojq is already a superset of jq. I've been flipping back and forth on the command line with no problems. If I understand @wader's example for cmd correctly, I can compile in a new module that I can then call as a function from jq. Perfect. I think it would be useful to build in some modules like this, perhaps enabled with --experimental. I would cast a vote for this function.

@itchyny
Owner

itchyny commented Apr 23, 2022

Because of the maintenance burden and security considerations, I will not implement this feature in this repository. Thank you all for your valuable comments. Feel free to implement it in your fork at your own risk.
