Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced any and all. #1839

Closed
wants to merge 1 commit into from
Closed

Enhanced any and all. #1839

wants to merge 1 commit into from

Conversation

fadado
Copy link

@fadado fadado commented Feb 23, 2019

After the last changes in builtin.jq my old pull requests cannot merge automatically. I hope
my enhanced definitions for any and all in this new pull request can be merged soon. The new definitions are:

  • faster
  • a lot more jq idiomatic
  • a lot shorter
  • a god example of programming with empty

JJOR

@coveralls
Copy link

Coverage Status

Coverage remained the same at 84.602% when pulling 7137fc4 on fadado:master into 9a0d5be on stedolan:master.

@muhmuhten
Copy link
Contributor

muhmuhten commented Feb 23, 2019

how does this look

diff --git a/src/builtin.jq b/src/builtin.jq
index d3a4bb1..6105582 100644
--- a/src/builtin.jq
+++ b/src/builtin.jq
@@ -32,20 +32,6 @@ def index($i):   indices($i) | .[0];       # TODO: optimize
 def rindex($i):  indices($i) | .[-1:][0];  # TODO: optimize
 def paths: path(recurse(if (type|. == "array" or . == "object") then .[] else empty end))|select(length > 0);
 def paths(node_filter): . as $dot|paths|select(. as $p|$dot|getpath($p)|node_filter);
-def any(generator; condition):
-        [label $out | foreach generator as $i
-                 (false;
-                  if . then break $out elif $i | condition then true else . end;
-                  if . then . else empty end)] | length == 1;
-def any(condition): any(.[]; condition);
-def any: any(.);
-def all(generator; condition):
-        [label $out | foreach generator as $i
-                 (true;
-                  if .|not then break $out elif $i | condition then . else false end;
-                  if .|not then . else empty end)] | length == 0;
-def all(condition): all(.[]; condition);
-def all: all(.);
 def isfinite: type == "number" and (isinfinite | not);
 def arrays: select(type == "array");
 def objects: select(type == "object");
@@ -172,6 +158,12 @@ def limit($n; exp):
     elif $n == 0 then empty
     else exp end;
 def isempty(g): 0 == ((label $go | g | (1, break $go)) // 0);
+def all(generator; condition): isempty(generator|condition and empty);
+def all(condition): all(.[]; condition);
+def all: all(.[]; .);
+def any(generator; condition): all(generator; condition|not)|not;
+def any(condition): all(.[]; condition|not)|not;
+def any: all(.[]; not)|not;
 def first(g): label $out | g | ., break $out;
 def last(g): reduce g as $item (null; $item);
 def nth($n; g): if $n < 0 then error("nth doesn't support negative indices") else last(limit($n + 1; g)) end;

Since all/2 is the most natural to express, I defined that first, and defined all the other variants directly in terms of all/2; even the anys are defined as the logical corollaries. This localizes as much how-jq-works knowledge in all/2 as possible, while the others can be verified as obviously correct by inspection.

edit: clearly not obviously enough.

@muhmuhten
Copy link
Contributor

It should also be correct to define def isempty(g): (first(g|1) // 0) == 0; (moving it below first/1).

@fadado
Copy link
Author

fadado commented Feb 23, 2019

How much efficient is | not in jq?
If this cost is negligible your code is better in terms of maintenance efforts.

@nicowilliams
Copy link
Contributor

nicowilliams commented Feb 23, 2019

More idiomatic isempty/1:

def isempty(g): first((g|false), true);

But slower, because first/1 won't be getting inlined... because there's no inlining in jq :(

@muhmuhten
Copy link
Contributor

... also, the current any/2 and all/2 don't short-circuit correctly, they evaluate one more argument than necessary, as I found trying to add a test

# Check short-circuiting
any(true, error; .)
"badness"
true

all(false, error; .)
"badness"
false

@muhmuhten
Copy link
Contributor

(and should null|error really silently fail to error??)

@nicowilliams
Copy link
Contributor

nicowilliams commented Feb 24, 2019

null | error... That's funny. I believe it's my fault, from 8ea21a5:

    } else if (jv_get_kind(error_message) == JV_KIND_NULL) {
      // Halt with no output

that's in src/main.c, in process().

Since this is not documented or needed, it's clearly a bug, and we should fix it.

EDIT: It'd be good if I could remember why I added that case...

@muhmuhten
Copy link
Contributor

The issue goes deeper than that--jq doesn't halt on null|error, it just drops the value and keeps going:

% ./jq -cn '[1,error,2]'
[1,2]

@nicowilliams
Copy link
Contributor

@muhmuhten Ahh, because I made it so when a C-coded builtin returns a jv_invalid() (i.e., w/ a null "message"), this triggers backtracking in the VM:

     837     case CALL_BUILTIN: {
     838       int nargs = *pc++;
     839       jv top = stack_pop(jq);
     840       jv* in = cfunc_input;
     841       in[0] = top;
     842       for (int i = 1; i < nargs; i++) {
     843         in[i] = stack_pop(jq);
     844       }
     845       struct cfunction* function = &frame_current(jq)->bc->globals->cfunctions[*pc++];
     846       typedef jv (*func_1)(jq_state*,jv);
     847       typedef jv (*func_2)(jq_state*,jv,jv);
     848       typedef jv (*func_3)(jq_state*,jv,jv,jv);
     849       typedef jv (*func_4)(jq_state*,jv,jv,jv,jv);
     850       typedef jv (*func_5)(jq_state*,jv,jv,jv,jv,jv);
     851       switch (function->nargs) {
     852       case 1: top = ((func_1)function->fptr)(jq, in[0]); break;
     853       case 2: top = ((func_2)function->fptr)(jq, in[0], in[1]); break;
     854       case 3: top = ((func_3)function->fptr)(jq, in[0], in[1], in[2]); break;
     855       case 4: top = ((func_4)function->fptr)(jq, in[0], in[1], in[2], in[3]); break;
     856       case 5: top = ((func_5)function->fptr)(jq, in[0], in[1], in[2], in[3], in[4]); break;
     857       // FIXME: a) up to 7 arguments (input + 6), b) should assert
     858       // because the compiler should not generate this error.
     859       default: return jv_invalid_with_msg(jv_string("Function takes too many arguments"));
     860       }
     861
     862       if (jv_is_valid(top)) {
     863         stack_push(jq, top);
     864       } else if (jv_invalid_has_msg(jv_copy(top))) {
     865         set_error(jq, top);
     866         goto do_backtrack;
     867       } else {
---->868         // C-coded function returns invalid w/o msg? -> backtrack, as if
---->869         // it had returned `empty`
---->870         goto do_backtrack;
     871       }
     872       break;
     873     }

That's fine, but f_error() then needs to treat null in some special way. It's currently:

static jv f_error(jq_state *jq, jv input) {
  return jv_invalid_with_msg(input);
}

Now, a null error last actually raised an error in jq 1.4, and the output was the same then as error("null"):

$ jq-1.4 -n 'error(.)'
jq: error: null
$ jq-1.4 -n 'error("null")'
jq: error: null

So we could just map null to "null" and get 1.4 behavior back.

Alternatively we could use a new jv [pseudo-]type to indicate "trigger backtracking".

Which is preferable? The latter would allow null | error to do what it used to before 906d253 (i.e., jq 1.4). The former is simpler, yes, but the latter is more elegant. But the latter is more work too.

I'm inclined to map null to "null" for now.

@muhmuhten
Copy link
Contributor

Culprit looks like 906d253.

There are no C-coded builtins that return a jv_invalid() naturally (other than error), though?

@nicowilliams
Copy link
Contributor

nicowilliams commented Feb 24, 2019

@muhmuhten There are no other C-coded builtins that do that. Of the jq-coded builtins, the only one that needs some thought is inputs:

def inputs: try repeat(input) catch if .=="break" then empty else .|error end;

Byte-coded builtins throw no errors.

EDIT: That is, all other C-coded builtins always pass a non-null to jv_invalid_with_msg(), and all the jq-coded builtins other than inputs always pass a non-null to error.

@nicowilliams
Copy link
Contributor

Re: inputs, there is no bug there that can cause error to be called with null. The C-coded input builtin outputs an invalid jv with "break" as a message on EOF or if there is no input callback associated with the VM. It outputs no other errors, so inputs is OK.

@muhmuhten
Copy link
Contributor

Ah, I see the complexity--an invalid without a message is process()'s signal that there's no more output as well, so we can't just pass it up and handle it as an error, because that would make every program look like it finished with a (null) error?

I think having error/0 silently replace an input null with a string violates the assumption that what we get back when catching an error is what went in, and having try null|error catch . be the string "null" seems surprising and probably pretty frustrating to debug, though.

Maybe as a compromise we can having error/0 error on input null with an error along the lines of "error/0 requires a non-null input" and put it on the list of things to fix in ($far_future = never)...

Then remove the backtrack on C returning invalid since apparently it's not needed.

@nicowilliams
Copy link
Contributor

Let's continue this tomorrow or later this week. I have a bunch of branches lying around with various half-baked attempts at adding I/O-related builtins and co-routines, and who knows what else. I suspect I needed this behavior at some point for C-coded builtins, but if nothing is using it, then indeed, we can just remove this and that's the simplest fix. We can always add a "backtrack" jv pseudo-type for this later.

@nicowilliams
Copy link
Contributor

The one C-coded builtin that could benefit from a convention for "hey, just backtrack now" is input, and it has its own error-based convention, so yes, we can just remove this [mis]feature.

I'll make it so.

BTW, I've done a bit of work just now on adding support for shared objects / DLLs in libraries... Not even remotely tested -- what I have builds, but that's it.

@nicowilliams
Copy link
Contributor

#1845 subsumes this PR. Thanks @fadado and @muhmuhten!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants