Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting objects using walk has side effects #2584

Closed
wfaulk opened this issue May 16, 2023 · 5 comments · Fixed by #2795
Closed

Deleting objects using walk has side effects #2584

wfaulk opened this issue May 16, 2023 · 5 comments · Fixed by #2795
Labels
Milestone

Comments

@wfaulk
Copy link

wfaulk commented May 16, 2023

Describe the bug
Deleting objects using walk seems to cause side effects. I'm not sure of the details, so I'll just have to provide an example.

To Reproduce
walk(select(IN({}, []) | not))

input:

{
  "a": 1,
  "b": []
}

Expected behavior

{
  "a": 1
}

Actual behavior

null

Environment (please complete the following information):

  • OS and Version: MacOS 11.7.4
  • jq version 1.6

This seems to be reproducible in every environment I've tried, though.

Additional context
In addition, if the input is the semantically equivalent:

{
  "b": [],
  "a": 1
}

then the output is as-expected.

@emanuele6
Copy link
Member

emanuele6 commented May 16, 2023

I already told you why that is 3 days ago on IRC.
You obviously have read my messages, since you replaced your walk(if ((type=="object" or type=="array") and length==0) then empty else . end) with <emanuele6 > by the way, note that your code could be written as walk(select(IN({}, []) | not)) but walk is buggy, but you never replied. 👎


The walk/1 implementation is not correct, and breaks if f, called on the value of an object, returns zero or multiple values.

Fixing it is as simple as changing the implementation from:

# Apply f to composite entities recursively, and to atoms
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys_unsorted[] as $key
        ( {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;

To:

# It would be nice to have max/1, min/1, add/1, etc in jq by the way...
def add(f): reduce f as $i (null; . + $i);

# Apply f to composite entities recursively, and to atoms
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys_unsorted[] as $key
        ({}; add(., ($in[$key] | walk(f) | { ($key): . }))) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;

I also told you you can use del/1 instead of walk/1 to delete empty objects and arrays:

del(.. | select(isempty(.. | scalars)))

@pkoppstein
Copy link
Contributor

pkoppstein commented May 16, 2023

A version of walk/1 that resolves this issue may be found at #2611


Whether or not the walk implementation is correct, please note that setting the value of a key to empty has the effect of wiping out the object, as can for example be seen by:

$ jq '.b = empty' <<< '{"a": 1,"b": []}'

Now maybe that's also not "correct" but for now, that's the way it is.

@emanuele6
Copy link
Member

emanuele6 commented May 16, 2023

Whether or not the walk implementation is correct, the point is that setting the value of a key to empty has the effect of wiping out the object, as can for example be seen by:

$ jq '.b = empty' <<< '{"a": 1,"b": []}'

I agree that whether or not my proposed fix for walk/1 is correct is debatable; it should definitely simply remove the key from the input object if f returns no values, but, if f returns multiple values, maybe it should iterate and create multiple outputs for each of the value (thought that is not consistent with the behaviour for arrays), or maybe it should keep the first value instead of the last value as it currently does. add(., walk(f)) preserves the old behaviour for multiple values (see below).

Now maybe that's also not "correct" but for now, that's the way it is.

Anyway, your .b = empty example is wrong. That is definitely correct.

The real reason why that returns nothing is because the rhs of the = operator (a.k.a. _assign/2) is a $ argument, so it is not lazily evaluated, and .b = ??? gets looped for every value returned of the rhs.

You will note that:

def foo($x): "hi";
[foo(1,2,3)], [foo(range(4))], [foo(empty)]

returns

["hi","hi","hi"]
["hi","hi","hi","hi"]
[]

["hi","hi","hi"] because foo(1,2,3) called foo(1) (with 1 as $x), then foo(2) (with 2 as $x), then foo(3) (with 3 as $x).
[] because foo(empty) never called foo because it did not have a value to assign to $x and call foo with.

And that:

{ a: 1, b: [] } |
.b = (6,7)

returns

{"a":1,"b":6}
{"a":1,"b":7}

If you use .b |= empty you will see that |= in which the rhs is lazy and is computed for every path of the lhs, you will see that it is implemented in a way that causes the .b key to be deleted (though note that if you use |= empty on multiple path to elements of the same array, it will not work correctly in the current version of jq).

{ a: 1, b: [] } |
.b |= empty

outputs

{"a":1}

Since the function is not using = anywhere, it does not make much sense to mention it as the cause of the problem.

The reason why the walk/1 breaks if walk(f) returns 0 values is that . + empty, like for a function with two $ arguments implicitly loops every combination of the results of lhs . and rhs empty and returns their sum; since there are no values for the rhs, it cannot return any result (i.e. empty).

That, in combination with the undocumented quirk that reduce abc as $i (.; xyz) will return null if at any iteration xyz returns no values, causes the buggy walk/1 behaviour that we are observing; that is actually wiping all the entries before the one for which the function returned empty, not always returning null:

{ a: 1, b: 2, c: 3 } |
[ walk(select(. != 1)) ],
[ walk(select(. != 2)) ],
[ walk(select(. != 3)) ]
{"b":2,"c":3}
{"c":3}
null

Also note that, in reduce abc as $i (.; xyz), if xyz returns multiple values at one iteration, just like with my add(., walk(f)) fix, only the last value is kept, reduce doesn't iterate the return values and return multiple values as one may expect:

$ # at the $i==5 iteration, xyz returns two arrays instead of one:
$ #  the first is the input array with 5 appended
$ #  the second is the input array with 15 appended
$ # at every other iteration it just appends $i to the input array
$ jq -cn 'reduce range(10) as $i ([]; . + ($i | ., (select(. == 5) + 10) | [.]) | debug)'
["DEBUG:",[0]]
["DEBUG:",[0,1]]
["DEBUG:",[0,1,2]]
["DEBUG:",[0,1,2,3]]
["DEBUG:",[0,1,2,3,4]]
["DEBUG:",[0,1,2,3,4,5]]
["DEBUG:",[0,1,2,3,4,15]]
["DEBUG:",[0,1,2,3,4,15,6]]
["DEBUG:",[0,1,2,3,4,15,6,7]]
["DEBUG:",[0,1,2,3,4,15,6,7,8]]
["DEBUG:",[0,1,2,3,4,15,6,7,8,9]]
[0,1,2,3,4,15,6,7,8,9]
$ # for the $i>=6 iterations it used the last array returned by xyz
$ # instead of maybe continuing to iterate for both values and return
$ # two arrays at the end:
$ #  [0,1,2,3,4,5,6,7,8,9]
$ #  [0,1,2,3,4,15,6,7,8,9]

By the way, the built-in version of walk/1 does suffer from an efficiency defect, that is readily overcome by:

Thanks a lot for pointing that out! I have noticed substantial performance problems in deeply recursive function, and e.g. until/2 that performs many iterations. I figured out that the problem was that jq was creating a copy of all arguments to hold on to at every recursions (even at the end of the function), but I never realised I could partially mitigate that problem by using a local function that doesn't copy constant arguments; very good to know!

@pkoppstein
Copy link
Contributor

@emanuele6 wrote:

Since the function is not using = anywhere, it does not make much sense to mention it as the cause of the problem.

It's true that the implementation of walk doesn't use = but,
as its primary author, I can say that the semantics of P = Q was
on my mind.

In other words, it is important to understand the consistency of jq's
approach to handling expressions with empty, e.g.

    .a = empty
    .[1] = empty
    null + empty
    if empty then 1 else 2 end
    . == empty

and so on.

In fact, all three of the major implementations of jq (the C
implementation, gojq in Go and jaq in Rust) are all agreed about BOTH

    JQ '.a = empty'  <<<  '{"a": 1,"b": 2}'

and

    JQ 'walk(if type == "object" then .a = empty else . end)' <<<  '{"a": 1,"b": 2}'

And similarly for .[$n] = ARRAY

As you observe, '|=' is quite different because it is path-oriented.
Interestingly, early versions of jq were a bit muddled
about |=; for example, in jq 1.3 the result was null, even though
the (false) analogy with .a = (.a | empty) would have suggested the
result should be empty.

@wfaulk
Copy link
Author

wfaulk commented May 18, 2023

I already told you why that is 3 days ago on IRC.

Yeah, you said:

i am looking at the implementation of the walk function, and the way of it is written is wrong, it will break if you delete an entry from an object or return multiple values for the entry of an object

That sounds like you're telling me there's a bug. So I'm reporting it. Sorry I didn't follow up in IRC with you.

pkoppstein added a commit to pkoppstein/jq that referenced this issue Jul 31, 2023
pkoppstein added a commit to pkoppstein/jq that referenced this issue Jul 31, 2023
commit 9afda0a
Merge: 4665e81 eeb08b5
Author: pkoppstein <pkoppstein@gmail.com>
Date:   Mon Jul 31 07:33:58 2023 -0400

    Merge branch 'walk' of https://github.com/pkoppstein/jq into walk

commit 4665e81
Author: pkoppstein <pkoppstein@gmail.com>
Date:   Mon Jul 31 07:26:03 2023 -0400

    revamp walk/1: rebase, add test cases

    Note that according to the new def:

    `{x:0} | walk(.,1)` is equivalent to: {x:0} | walk(.), walk(1)

commit bdec9c0
Author: pkoppstein <pkoppstein@gmail.com>
Date:   Wed Jul 5 02:00:59 2023 -0400

    revamp walk/1

    Resolves jqlang#2584; see also jqlang#2611

commit c8e28da
Author: itchyny <itchyny@cybozu.co.jp>
Date:   Mon Jul 31 09:52:52 2023 +0900

    Redesign website (jqlang#2628)

    * Bump up Bootstrap to v5.3.1, Bootstrap Icon to v1.10.5.
    * Use autoComplete.js to drop dependency on jQuery and typeahead.js.
    * Support dark mode.
    * New svg logo and icon with responsive color mode support.
    * Normalize section ids to lower kebab-case for easiness of linking.
    * Use relative paths for links for local development (--root /output).
    * Various markup cleanups and accessibility improvements.

commit 4af3f99
Author: Owen Ou <o@owenou.com>
Date:   Sat Jul 29 07:20:48 2023 -0700

    Update `bug_report.md` with Discord link

commit 82f7f77
Author: Owen Ou <o@owenou.com>
Date:   Sat Jul 29 07:15:57 2023 -0700

    Redirect questions to Discord

    We now have an official Discord server and most maintainers are hanging
    out there. It would be a good idea to redirect questions to Discord.

commit f733a15
Author: Nicolas Williams <nico@cryptonector.com>
Date:   Mon Jul 10 18:29:03 2023 -0500

    Use -Wno-cast-function-type to quiet many warnings

commit c8b30df
Author: Nicolas Williams <nico@cryptonector.com>
Date:   Mon Jul 10 18:28:33 2023 -0500

    Add JQ_FALLTHROUGH and use it to quiet warnings

commit a6eb055
Author: Emanuele Torre <torreemanuele6@gmail.com>
Date:   Sat Jul 29 21:57:40 2023 +0200

    Fix typo in manual: "-seq" => "--seq"

commit ee2a215
Author: Owen Ou <169064+owenthereal@users.noreply.github.com>
Date:   Sat Jul 29 07:38:08 2023 -0700

    Backfill with references in NEWS.md (jqlang#2788)

    Backfill with references to PRs & issues in NEWS.md

commit eeb08b5
Author: pkoppstein <pkoppstein@gmail.com>
Date:   Wed Jul 5 02:00:59 2023 -0400

    revamp walk/1

    Resolves jqlang#2584; see also jqlang#2611
pkoppstein added a commit to pkoppstein/jq that referenced this issue Jul 31, 2023
Resolves jqlang#2584; also resolves jqlang#2611
and supersedes jqlang#2655

Note that according to the revised implementation:

`{x:0} | walk(.,1)` is equivalent to `{x:0} | walk(.), walk(1)`
@itchyny itchyny added this to the 1.7 release milestone Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants