Zed shorthand to create a record field name from string #4555

Closed
philrz opened this issue Apr 27, 2023 · 2 comments · Fixed by #4795 or #4802

philrz commented Apr 27, 2023

At the time this issue was filed, Zed is at commit 1874aeb.

The attached scan.json.gz is a trimmed masscan report of the kind referenced in this tweet. The jq example shown there formats it as follows:

$ jq --version
jq-1.6

$ jq 'reduce .[] as $e ({}; . + { ($e.ip): (.[$e.ip] + $e.ports) })' scan.json 
{
  "192.168.5.53": [
    {
      "port": 515,
      "proto": "tcp",
      "status": "open",
      "reason": "syn-ack",
      "ttl": 64
    },
    {
      "port": 21,
      "proto": "tcp",
      "status": "open",
      "reason": "syn-ack",
      "ttl": 64
    }
  ],
  "192.168.5.49": [
    {
      "port": 49152,
      "proto": "tcp",
      "status": "open",
      "reason": "syn-ack",
      "ttl": 64
    },
    {
      "port": 3401,
      "proto": "tcp",
      "status": "open",
      "reason": "syn-ack",
      "ttl": 64
    }
  ]
}
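
For readers less familiar with jq's reduce, here is a rough Python sketch of what that filter appears to do: it walks the input array and appends each entry's ports under a key taken from the entry's ip value. The function name and the trimmed sample data are assumptions made for illustration; the input shape (an array of objects with ip and ports fields) is inferred from the commands and output in this issue.

```python
def group_ports_by_ip(entries):
    """Merge each entry's ports list under a key taken from its ip value."""
    grouped = {}
    for entry in entries:
        # Like jq's `.[$e.ip] + $e.ports`: start from an empty list the first
        # time an IP is seen, then append that entry's ports.
        grouped.setdefault(entry["ip"], []).extend(entry["ports"])
    return grouped


# Hypothetical sample shaped after the output above (fields trimmed).
sample = [
    {"ip": "192.168.5.53", "ports": [{"port": 515, "proto": "tcp"}]},
    {"ip": "192.168.5.49", "ports": [{"port": 49152, "proto": "tcp"}]},
    {"ip": "192.168.5.53", "ports": [{"port": 21, "proto": "tcp"}]},
]

result = group_ports_by_ip(sample)
print(result)
```

The point to notice is that the dictionary key is computed from data at run time, which is the capability this issue asks Zed to expose more directly.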

The cleanest Zed query I could come up with that produces something close to that output is:

$ zq -version
Version: v1.7.0-48-g1874aeb4

$ zq -Z 'over this | ports:=collect(ports[0]) by ip' scan.json 
{
    ip: "192.168.5.53",
    ports: [
        {
            port: 515,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 21,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}
{
    ip: "192.168.5.49",
    ports: [
        {
            port: 49152,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 3401,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}

However, notice that this output has named fields ip and ports, whereas in the jq example the IP address string is used as an object key whose value is the array of port data. If a user wanted to get the same output jq was producing, the best approach the team came up with is:

$ zq -Z 'over this | ports:=collect(ports[0]) by ip | unflatten([{key:ip,value:ports}])' scan.json 
{
    "192.168.5.53": [
        {
            port: 515,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 21,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}
{
    "192.168.5.49": [
        {
            port: 49152,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 3401,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}

(There's actually a subtle difference between this output and what came from jq, but that's further explored in #4565.)
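
As a rough mental model of the unflatten([{key:ip,value:ports}]) step, the Python sketch below builds one record per grouped input from a list of key/value entries. It is only an illustration under stated assumptions: unflatten_pairs is a hypothetical helper, it handles plain string keys only, and it makes no claim about how Zed implements unflatten().

```python
def unflatten_pairs(pairs):
    """Build one record from a list of {"key": ..., "value": ...} entries."""
    record = {}
    for pair in pairs:
        # The field name comes from data, not from the query text.
        record[pair["key"]] = pair["value"]
    return record


# Hypothetical grouped records, shaped like the `collect(...) by ip` output
# above (port details trimmed).
grouped = [
    {"ip": "192.168.5.53", "ports": [{"port": 515}, {"port": 21}]},
    {"ip": "192.168.5.49", "ports": [{"port": 49152}, {"port": 3401}]},
]

# One output record per input record, as in the zq output above.
records = [
    unflatten_pairs([{"key": g["ip"], "value": g["ports"]}]) for g in grouped
]
print(records)
```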

This idiom of collect() + unflatten() has come up a few times before (example: #4332), and I'm starting to see why: unflatten([{key:...,value:...}]) is the only way I can see to create new field/value pairs in a record when the field name is a string that's been programmatically generated within the Zed pipeline. It therefore seems like we could benefit from some shorthand that achieves the same thing. I'm at a loss for what precisely to propose, but I sense it would be something like the indexing syntax, usable in "left-hand" contexts. For instance, while hacking at this, things I tried in vain, hoping they might work, included:

$ zq -Z 'over this | ports:=collect(ports[0]) by ip | yield {under(ip):ports}' scan.json 
zq: error parsing Zed at column 58:
over this | ports:=collect(ports[0]) by ip | yield {under(ip):ports}
                                                     === ^ ===

$ zq -Z 'over this | ports:=collect(ports[0]) by ip | put this[ip]:=ports' scan.json 
illegal left-hand side of assignment

$ zq -Z 'over this | ports:=collect(ports[0]) by ip | rename this[ip]:=ports' scan.json 
'rename' requires explicit field references

philrz commented May 1, 2023

@mccanne recently had the following reactions:

We could also allow Zed record literal expressions to take a computed value for the key so you could say {this["ip"]:...} and there would be an implied cast to string for the record key.

Speaking of which, it might also be nice to say ... by this["ip"]:=ip

philrz commented Oct 26, 2023

Verified in Zed commit 92b0acd.

The second of the three queries I tried above, which previously failed, now works as expected and no longer returns the illegal left-hand side of assignment error. I had to tack on | drop ip,ports to make it equivalent to the longer variant I used previously, which relied on the collect() + unflatten() idiom.

$ zq -version
Version: v1.10.0-18-g92b0acdb

$ zq -Z 'over this | ports:=collect(ports[0]) by ip | put this[ip]:=ports | drop ip,ports' scan.json
{
    "192.168.5.53": [
        {
            port: 515,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 21,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}
{
    "192.168.5.49": [
        {
            port: 49152,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        },
        {
            port: 3401,
            proto: "tcp",
            status: "open",
            reason: "syn-ack",
            ttl: 64
        }
    ]
}
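
To make the effect of put this[ip]:=ports | drop ip,ports concrete, here is a small Python sketch of the same two steps applied to one grouped record. The function name and sample record are hypothetical, and the code only mirrors the observable result shown above, not zq's internals.

```python
def put_dynamic_then_drop(record):
    """Add a field named after record["ip"], then remove ip and ports."""
    out = dict(record)  # work on a copy so the input stays untouched
    # Analogue of `put this[ip]:=ports`: the field name is the ip value.
    out[record["ip"]] = record["ports"]
    # Analogue of `drop ip,ports`: remove the original named fields.
    for name in ("ip", "ports"):
        out.pop(name, None)
    return out


# Hypothetical grouped record with port details trimmed.
grouped_record = {"ip": "192.168.5.53", "ports": [{"port": 515}, {"port": 21}]}
reshaped = put_dynamic_then_drop(grouped_record)
print(reshaped)
```

Without the drop step, the record would carry both the new computed field and the original ip and ports fields, which is why the extra operator was needed to match the earlier output.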

Thanks @mattnibs!
