Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce and foreach with backtracking init have observable DUPN #3227

Open
kanwren opened this issue Jan 5, 2025 · 0 comments
Open

reduce and foreach with backtracking init have observable DUPN #3227

kanwren opened this issue Jan 5, 2025 · 0 comments

Comments

@kanwren
Copy link

kanwren commented Jan 5, 2025

The fix for #618 (7243989) introduced DUPN to the compilation of reduce f as $x (init; update) and foreach f as $x (init; update; extract) to address extra references being held.

However, the value that DUPN dupes (that is, . as it is passed to f) comes from before a forkpoint, which makes DUPN unsafe here; if we backtrack to the FORK (in particular, if init backtracks), then f will see . mutated to null. This is a problem, as to my understanding the mutation in DUPN (and the other -N instructions) should be non-observable optimizations.

To reproduce

$ jq --null-input  '[] | reduce .[] as $x (0, 0; .)'
0
jq: error (at <unknown>): Cannot iterate over null (null)

$ jq --null-input '[] | foreach .[] as $x (0, 0; .)'
jq: error (at <unknown>): Cannot iterate over null (null)

The init contains a ,, so .[] gets [] on the first pass, but null when backtracking.

Expected behavior

reduce and foreach should not break backtracks that run back through the body. That is to say, we should see:

$ jq --null-input '[] | reduce .[] as $x (0, 0; .)'
0
0

$ jq --null-input '[] | foreach .[] as $x (0, 0; .)'
<empty>

1

Put another way, it should be the same as e.g. [] | . as $dot | reduce ($dot | .[]) as $x (0, 0; .), which is a workaround today.

Environment

  • OS: Linux (NixOS 25.05)
  • jq version: tested against current master (588ff18) and 1.7.1

Additional context

I noticed this issue while inspecting the bytecode, which gives some insights towards addressing it, which I've noted below.

From digging in thread history, it seems that @leonid-s-usov noticed the same issues with the bytecode while looking for a way to remove the DUPN instruction for another reason. They give a draft of potential changes in #2059, which from my testing would be sufficient to fix this bug.

foreach

This shouldn't be an issue with foreach in the first place, and should be fixable without having to compromise. Inspecting the bytecode for the example above:

// input and init
0000 TOP
0001 LOADK []
0003 DUP
0004 FORK 0010
0006 LOADK 0
0008 JUMP 0012
0010 LOADK 0

// foreach
0012 STOREV $foreach:0
0015 FORK 0031
0017 DUPN
0018 EACH
0019 STOREV $x:1
0022 LOADVN $foreach:0
0025 DUP
0026 STOREV $foreach:0
0029 JUMP 0032
0031 BACKTRACK

0032 RET

Notice that the FORK 0031 backtracks directly to a BACKTRACK, which is unnecessary, as backtracks in the loop body will already backtrack properly, and we don't need to consume the stream like we do in reduce. I suspect that this is only here because 5a863bf copied gen_reduce and modified it a bit.

If the FORK ... BACKTRACK (and JUMP) structure is removed, then DUPN isn't necessary anymore; if we don't need to backtrack, then . isn't held across a forkpoint and stack_pop_will_free, meaning no unnecessary references. If we do need to backtrack, then we needed to keep the reference anyway.

reduce

reduce can't avoid the FORK ... BACKTRACK. A STOREV $dot ... FORK ... LOADVN $dot would serve a similar purpose to DUPN for avoiding unnecessary references while not causing backtracking issues; that's the only potential fix I can think of while staying within the constraints of the current instruction set.

Footnotes

  1. This is the output from local testing of a potential fix (removing FORK from foreach, using STOREV/LOADVN instead of DUPN in reduce)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant