Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new builtin fetch #1738

Closed
wants to merge 4 commits into from
Closed

Add new builtin fetch #1738

wants to merge 4 commits into from

Conversation

bbbco
Copy link

@bbbco bbbco commented Oct 10, 2018

I added a new builtin called fetch, which allows you to specify a key and returns the value or nothing (or another default you define). This is similar to the Ruby #fetch method on Hash-like objects.

           jq ´fetch(.foo)´
              {"foo": "bar"}
           => "bar"

           jq ´fetch(.fu)´
              {"foo": "bar"}
           =>

           jq ´fetch(.fu; "baz")´
              {"foo": "bar"}
           => "baz"

I've added tests and docs for this new builtin, and running make check passes.

@coveralls
Copy link

coveralls commented Oct 10, 2018

Coverage Status

Coverage decreased (-0.1%) to 84.436% when pulling a2d3094 on bbbco:fetch_builtin into 341a5fc on stedolan:master.

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 11, 2018

Most people, I think, would agree that

{"a": false} | fetch(.a)

should emit false rather than the empty stream.

Perhaps the following achieves the semantics you have in mind:

    def fetch(key):
      del(key) as $o
      | if $o == . then empty else key end;

However this is evidently very inefficient, so perhaps a C implementation is warranted. It would probably be wise, though, to agree on the semantics before getting too deep into implementation details. For example, what should fetch(.a.b) do?

@bbbco
Copy link
Author

bbbco commented Oct 11, 2018

Ah, yes. Great edge cases! I think if we're following the pattern of the Ruby implemention, we should pass falsey and null values through if they are being set on the key value (as we can see happening in Ruby here http://rubyfiddle.com/riddles/71c57/2 ).

Also, to answer your other question about deeper nesting of the key that is being fetched, maybe we should take cue from Ruby and require a string of just a single key and use has() instead of ? to see if it's available and pass it through.

For example:

{"foo": "bar"} | fetch("foo", "baz") => bar
{"foo": "bar"} | fetch("notfoo", "baz") => baz

@bbbco
Copy link
Author

bbbco commented Oct 11, 2018

What about logic (untested, just off the top of my head) like:

def fetch(key; default):
    if has(key) then .[key] else default end

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 11, 2018

@bbbco - For fetch(_), it would make sense to allow _ to be a jq path expression (like .a.b), or an array path expression (as in getpath(_)), or maybe either. Here is one possibility that allows _ to be any jq path:

def fetch(key; default):
  ({} | path(key)) as $k
  | if any(path(..); . == $k) then key else default end;

def fetch(key): fetch(key; empty);

Example:

{a:0, b: {c:1, d:2}} | fetch(.b.c)

@bbbco
Copy link
Author

bbbco commented Oct 11, 2018

Thanks for the suggestion @pkoppstein ! I've incorporated your code, slightly tweaked so that the default can also refer to a path expression as well. I've also added a few more tests to the tests and docs.

@pkoppstein
Copy link
Contributor

.|default is the same as default. Remove the .|.

@bbbco
Copy link
Author

bbbco commented Oct 11, 2018

Huh. You're right. I could have sworn it wasn't working right without that.

Anyways, I've updated, added a test to ensure it passes through null values, and squashed the commits into one tidy one.

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 12, 2018

The def using path(..) was only meant to illustrate one possible semantics. As an implementation, it's ridiculously inefficient.

Here's one way to have our cake and eat it:

# $key must be an array-path, as with getpath/1
def fetchpath($key; default):
  ($key|length) as $len
  | if $len > 0
    then $key[0] as $k
    | if has($k)? // false
      then . as $in
      | .[$k] | (if $len == 1 then . else fetchpath($key[1:]; $in|default) end)
      else default
      end
    else default
    end;

def fetch(key; default):
  fetchpath({} | path(key); default);

def fetchpath(key): fetchpath(key; empty);

def fetch(key): fetch(key; empty);

@bbbco
Copy link
Author

bbbco commented Oct 12, 2018

The problem with this approach can be seen in the following examples:

╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.foo.bar; .fu)' -n
2                                                                                                           # This is good
╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.foo.bar2; .fu)' -n
null                                                                                                       # This is not
╰─ ./jq '{"foo": "foo", "fu": "baz"} | fetch(.foo.bar; .fu)' -n                       # Neither is this (returns empty)

and

╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.fu2; .foo)' -n                        # This is good
{
  "bar": 2
}
╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.fu2; .foo.bar)' -n                  # So is this
2
╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.foo.bar2; .foo.bar)' -n          # This is not
null
╰─ ./jq '{"foo": {"bar": 2}, "fu": "baz"} | fetch(.foo.bar2; .bar)' -n                # Neither is this
2

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 13, 2018

Revised, but probably still needs more work.

@bbbco
Copy link
Author

bbbco commented Oct 14, 2018

Thanks @pkoppstein . I can confirm that this works as expected now. Also, I can confirm that this algorithm is more performant. Here are the performance benchmarks I ran based on the citylots.json file from https://github.com/tidwall/sf-city-lots-json .

Baseline:

╰─ time cat citylots.json | ./jq -cM '.features[].properties.LOT_NUM' >/dev/null 
cat citylots.json  0.01s user 0.28s system 5% cpu 5.485 total
./jq -cM '.features[].properties.LOT_NUM' > /dev/null  5.28s user 0.59s system 96% cpu 6.078 total

First algorithm from #1738 (comment)

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM)' >/dev/null
cat citylots.json  0.00s user 0.30s system 5% cpu 5.459 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM)' > /dev/null  8.27s user 0.51s system 97% cpu 8.998 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' >/dev/null
cat citylots.json  0.01s user 0.29s system 5% cpu 5.439 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' > /dev/null  24.38s user 0.56s system 98% cpu 25.396 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM2)' >/dev/null
cat citylots.json  0.02s user 0.28s system 5% cpu 5.455 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM2)' > /dev/null  25.01s user 0.52s system 98% cpu 25.955 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >/dev/null
cat citylots.json  0.00s user 0.29s system 5% cpu 5.506 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >   24.52s user 0.61s system 98% cpu 25.517 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >/dev/null
cat citylots.json  0.01s user 0.29s system 5% cpu 5.519 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >   24.38s user 0.59s system 98% cpu 25.364 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; fetch(.geometry2.type; .geometry.type))' >/dev/null 
cat citylots.json  0.00s user 0.30s system 5% cpu 5.531 total
./jq -cM  > /dev/null  44.50s user 0.50s system 98% cpu 45.735 total

Second algorithm from #1738 (comment)

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM)' >/dev/null
cat citylots.json  0.01s user 0.22s system 3% cpu 5.711 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM)' > /dev/null  6.76s user 0.54s system 95% cpu 7.677 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' >/dev/null
cat citylots.json  0.00s user 0.22s system 3% cpu 5.741 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' > /dev/null  6.59s user 0.60s system 95% cpu 7.558 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM2)' >/dev/null
cat citylots.json  0.00s user 0.22s system 3% cpu 5.774 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM2)' > /dev/null  5.83s user 0.66s system 95% cpu 6.828 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >/dev/null
cat citylots.json  0.00s user 0.30s system 5% cpu 5.527 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >   6.06s user 0.47s system 96% cpu 6.747 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >/dev/null
cat citylots.json  0.00s user 0.29s system 5% cpu 5.534 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >   5.84s user 0.58s system 96% cpu 6.651 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; fetch(.geometry2.type; .geometry.type))' >/dev/null
cat citylots.json  0.01s user 0.29s system 5% cpu 5.462 total
./jq -cM  > /dev/null  6.50s user 0.53s system 97% cpu 7.231 total

Seems like your latest iteration is certainly better performant.

Updated algorithm incoming along with extra tests.

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 14, 2018

[EDIT: The following implementation has been revised.]

The following version of fetchpath/2 should be much faster. Maybe you'd like to test and benchmark it, though as I originally said, this kind of low-level functionality probably should be implemented in C.

# $key must be an array-path, as with getpath/1
def fetchpath($key; default):
    getpath($key) as $value
  | if $value != null then $value
    elif ($key|length) == 1
    then if has($key[-1]) then null else default end
    else getpath($key[:-1]) as $x
    | if $x == null then default
      elif $x|has($key[-1]) then null
      else default
      end
   end;
 

@bbbco
Copy link
Author

bbbco commented Oct 14, 2018

This version does not pass through null values:

╰─ ./jq '{"foo": null} | fetch(.foo)' -n

(returns empty)

Here's the performance benchmarks:

╰─ time cat citylots.json | ./jq -cM '.features[].properties.LOT_NUM' >/dev/null                                       
cat citylots.json  0.00s user 0.29s system 5% cpu 5.469 total
./jq -cM '.features[].properties.LOT_NUM' > /dev/null  5.32s user 0.55s system 96% cpu 6.082 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM)' >/dev/null
cat citylots.json  0.00s user 0.32s system 5% cpu 5.628 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM)' > /dev/null  5.84s user 0.58s system 96% cpu 6.633 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' >/dev/null
cat citylots.json  0.05s user 0.24s system 5% cpu 5.477 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' > /dev/null  6.21s user 0.52s system 97% cpu 6.930 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM)' >/dev/null
cat citylots.json  0.03s user 0.27s system 5% cpu 5.434 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM)' > /dev/null  5.97s user 0.58s system 97% cpu 6.742 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >/dev/null
cat citylots.json  0.00s user 0.29s system 5% cpu 5.400 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >   6.30s user 0.51s system 97% cpu 6.964 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >/dev/null
cat citylots.json  0.02s user 0.28s system 5% cpu 5.412 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >   6.12s user 0.53s system 97% cpu 6.841 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; fetch(.geometry2.type; .geometry.type))' >/dev/null
cat citylots.json  0.01s user 0.17s system 3% cpu 5.686 total
./jq -cM  > /dev/null  7.11s user 0.64s system 95% cpu 8.100 total

Hmm... so seems like this implementation is slightly better when run fetch/1, but slightly worse with fetch/2.

I'm not as familiar with working with lower level C. Do you have suggestions for a lower level C implementation? Or are you going to be ok with accepting one of these implementations for the time being?

@pkoppstein
Copy link
Contributor

pkoppstein commented Oct 14, 2018

This version does not pass through null values:

Yes, the bug has been fixed.

Or are you going to be ok with accepting one of these implementations for the time being?

I am not a Decider. Also, please be aware that the current Deciders do not want to add new defs to builtin.jq until a major overhaul of the mechanism for handling such defs has been completed (which to judge from recent lack of progress, may be quite a while).

However, working on a "library" function is, in my opinion, worthwhile, both for those who may want to use it, and also to get a better handle on semantic issues.

@bbbco
Copy link
Author

bbbco commented Oct 14, 2018

Thanks for fixing the bug. Looks good as far as tests passing now.

Here's the updated benchmarks:

╰─ time cat citylots.json | ./jq -cM '.features[].properties.LOT_NUM' >/dev/null
cat citylots.json  0.00s user 0.33s system 5% cpu 5.516 total
./jq -cM '.features[].properties.LOT_NUM' > /dev/null  5.24s user 0.63s system 95% cpu 6.126 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM)' >/dev/null
cat citylots.json  0.00s user 0.27s system 4% cpu 5.474 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM)' > /dev/null  5.78s user 0.53s system 96% cpu 6.548 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' >/dev/null
cat citylots.json  0.00s user 0.26s system 4% cpu 5.518 total
./jq -cM '.features[] | fetch(.properties.LOT_NUM2)' > /dev/null  6.17s user 0.54s system 95% cpu 6.999 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM)' >/dev/null
cat citylots.json  0.03s user 0.23s system 4% cpu 5.562 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM)' > /dev/null  6.02s user 0.56s system 95% cpu 6.866 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >/dev/null
cat citylots.json  0.00s user 0.26s system 4% cpu 5.599 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry.type)' >   6.16s user 0.60s system 95% cpu 7.109 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >/dev/null 
cat citylots.json  0.00s user 0.26s system 4% cpu 5.629 total
./jq -cM '.features[] | fetch(.properties2.LOT_NUM; .geometry2.type)' >   6.20s user 0.53s system 95% cpu 7.063 total

╰─ time cat citylots.json | ./jq -cM '.features[] | fetch(.properties2.LOT_NUM; fetch(.geometry2.type; .geometry.type))' >/dev/null 
cat citylots.json  0.00s user 0.27s system 4% cpu 5.505 total
./jq -cM  > /dev/null  7.04s user 0.63s system 96% cpu 7.966 total

alex-ozdemir and others added 2 commits December 3, 2018 18:03
We had config machinery that determined which math functions are
available in libc. If a c math function was missing on the host system,
then the corresponding jq function would be removed from the source,
enabling the build to proceed anyway. The detection machinery was broken
in a subtle way, as was shown after glibc updated to 2.27, dropping the
`pow10` function. This caused compilation to fail.

The essential problem was that we detected whether a math function was
available by compiling and linking a small program evaluating that
function on constants. However, since gcc's optimization machinery has
special knowledge of some math functions (e.g. `pow10`), it can
optimize them away, even if they don't exist in the library and are not
linkable. That is, the following example compiles and links against
glibc 2.27, even though `pow10` has been removed:
```
int main () {
  printf("%f", pow10(0.5));
  return 0;
}
```

What?!
On the other hand, this program does not link:
```

int main () {
  double f;
  printf("%f", &f);
  printf("%f", pow10(f));
  return 0;
}
```

In the first program the call to `pow10` can be optimized away as a
constant expression. This requires GCC to know about `pow10` (which it
does!), but it does not require `pow10` to be in the library (and
actually linkable).

The solution is to use autoconf's machinery for detecting function
presence, instead of our own (buggy) machinery. This has the added
benefit of simplifying the code.

The bug was reported in issue jqlang#1659
@bbbco
Copy link
Author

bbbco commented Oct 11, 2023

Closing request. Seems that @pkoppstein decided to implement as pick() in 1.7 on their own. I'm glad that in some way I helped inspire this type of change. My implementation that I worked on with @pkoppstein five years ago was probably too heavy handed anyways.

@bbbco bbbco closed this Oct 11, 2023
@bbbco bbbco deleted the fetch_builtin branch October 11, 2023 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants