rfc: keyword schema mapping v1 #11951

lukashino · 2024-10-13T19:35:39Z

The proposal is explained in the Markdown document that is part of this PR. Switch to "rich diff" mode to render it in a readable format.

jasonish · 2024-10-13T19:50:40Z

It does look like a schema can be extended with custom fields (https://json-schema.org/draft/2019-09/json-schema-core#rfc.section.6.5). My only comment here would be to use some sort of prefix that makes clear they are Suricata extensions, so it is easy to differentiate which parts are JSON Schema and which are our custom extensions to it.

codecov · 2024-10-13T19:50:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.40%. Comparing base (d5dd549) to head (d9d03ff).
Report is 57 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11951      +/-   ##
==========================================
+ Coverage   82.70%   83.40%   +0.69%     
==========================================
  Files         912      910       -2     
  Lines      249102   257610    +8508     
==========================================
+ Hits       206018   214852    +8834     
+ Misses      43084    42758     -326

Flag	Coverage Δ
fuzzcorpus	`61.54% <ø> (+0.86%)`	⬆️
livemode	`19.38% <ø> (+0.66%)`	⬆️
pcap	`44.48% <ø> (+0.37%)`	⬆️
suricata-verify	`62.74% <ø> (+0.60%)`	⬆️
unittests	`59.37% <ø> (+0.38%)`	⬆️

inashivb

Wow. Thanks. This is very elaborate.
When I first heard the problem statement, I misunderstood it as a simple script that flattens eve.json, tries a regex match against the --list-keywords output (and vice versa), and reports possibly missing fields on both sides.

This would indeed be helpful in cases where we do not have an exact mapping.

Q: I don't think I understand the concept of an N:N mapping here. What does it mean?
I thought we would be looking for a 1:1 mapping between schema fields and implemented rule keywords.

lukashino · 2024-10-14T13:42:23Z

N:N mapping means that multiple keywords can be related to one Suricata output field, and multiple output fields can be related to multiple detection keywords. The best example is probably file handling.

Take eve.json output fields like:

nfs.filename
ftp_data.filename
files[].filename
fileinfo.filename

The following keywords are all related to them in some way:

file.name (full file name - mycustomfilename.pdf)
filename (only the prefix - mycustomfilename)
fileext (only the suffix - .pdf)

A 1:1 mapping is not the goal here, as I think it can be more user-friendly to use the filename keyword to match on the actual filename.
A better example of user friendliness (or rule-writer friendliness) might be http.uri and http.uri.raw, where both keywords relate to the http.uri eve.json output field. Because http.uri is normalized, the rule writer doesn't need to care in what form the original URI was written; they simply write the content that should be searched for.
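To make the N:N idea concrete, here is a minimal Python sketch. It assumes the relationship is stored in one direction (eve.json field to keywords) using the file example from this comment, and derives the other direction by inverting it. The names `FIELD_TO_KEYWORDS` and `invert` are made up for illustration and are not part of the PR.

```python
from collections import defaultdict

# eve.json field -> related detection keywords.
# Values are copied from the file example in this comment; the dict
# layout itself is an assumption, not the format proposed in the RFC.
FILE_KEYWORDS = ["file.name", "filename", "fileext"]
FIELD_TO_KEYWORDS = {
    "nfs.filename": list(FILE_KEYWORDS),
    "ftp_data.filename": list(FILE_KEYWORDS),
    "files[].filename": list(FILE_KEYWORDS),
    "fileinfo.filename": list(FILE_KEYWORDS),
}


def invert(mapping):
    """Turn {field: [keywords]} into {keyword: [fields]}."""
    inverted = defaultdict(list)
    for field, keywords in mapping.items():
        for keyword in keywords:
            inverted[keyword].append(field)
    return dict(inverted)


KEYWORD_TO_FIELDS = invert(FIELD_TO_KEYWORDS)
```

Each of the three keywords ends up pointing at all four fields, and each field points at all three keywords, which is what makes the relationship N:N rather than 1:1.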

lukashino · 2024-10-14T13:56:46Z

Now we need to agree on whether this suggestion is sufficient; then we can proceed with the tooling. (I also added a prefix to the keyword.)

"url": {
    "type": "string",
    "mapping-detect-keywords": [
        "http.uri.raw",
        "http.uri",
        "http.urilen"
    ]
},
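As a rough idea of what the tooling could do with such an annotation, the sketch below walks a schema and collects every `mapping-detect-keywords` list by dotted field path. The surrounding schema (the `http` object and the `hostname` property) is an assumption added only to make the fragment above parseable on its own, and `collect_mappings` is a hypothetical helper. Arrays (`items`) and `$ref` are deliberately not handled.

```python
import json

MAPPING_KEY = "mapping-detect-keywords"

# The "url" property is the fragment from the comment above; the
# enclosing "http" object and "hostname" are illustrative assumptions.
SCHEMA_FRAGMENT = json.loads("""
{
  "type": "object",
  "properties": {
    "http": {
      "type": "object",
      "properties": {
        "url": {
          "type": "string",
          "mapping-detect-keywords": [
            "http.uri.raw",
            "http.uri",
            "http.urilen"
          ]
        },
        "hostname": {"type": "string"}
      }
    }
  }
}
""")


def collect_mappings(node, path=""):
    """Return {dotted_field_path: [keywords]} for annotated properties."""
    found = {}
    if MAPPING_KEY in node:
        found[path] = list(node[MAPPING_KEY])
    # Recurse into nested object properties only.
    for name, child in node.get("properties", {}).items():
        child_path = f"{path}.{name}" if path else name
        found.update(collect_mappings(child, child_path))
    return found


mappings = collect_mappings(SCHEMA_FRAGMENT)
```

Properties without the annotation (here `http.hostname`) simply do not appear in the result, which is the hook a later check for unmapped fields could build on.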

victorjulien · 2024-10-15T11:38:25Z

rfc-schema-keyword.md

+  - fileext
+
+- file.name, filename, fileext can access eve.json fields:
+  - nfs.filename


I think these nfs and ftp_data fields are different from the fileinfo records

victorjulien · 2024-10-15T11:43:13Z

rfc-schema-keyword.md

+
+To make it manageable in a text editor I thought of describing the relationship primarily in one direction, e.g. what eve.json fields are described by what keywords. The other direction, what fields are affected by what keywords, can be obtained by inverting the data structure.
+
+The keyword and the eve.json fields can be in three states (somewhat similar to Git) - tracked, unassigned, ignored.


What could tracked look like? I assume we'd have a Redmine ticket reference there?

lukashino · 2024-10-15

Hmm, by tracked I meant mapped (i.e. as it should be).
Ignored could possibly reference or contain an explanation.
Unassigned would be the screaming one.
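A minimal sketch of how the three states could be computed, assuming "tracked" means the keyword appears in at least one mapping, "ignored" means it is on an explicit list that carries an explanation, and everything else is "unassigned". The function name, the data layout, and the `example.*` keywords are hypothetical placeholders, not real Suricata keywords.

```python
def classify_keywords(all_keywords, field_to_keywords, ignored):
    """Return {keyword: state} with state in tracked/ignored/unassigned.

    all_keywords: every keyword the engine reports
    field_to_keywords: {eve_field: [keywords]} mapping
    ignored: {keyword: explanation} for keywords skipped on purpose
    """
    mapped = {kw for kws in field_to_keywords.values() for kw in kws}
    states = {}
    for keyword in sorted(all_keywords):
        if keyword in mapped:
            states[keyword] = "tracked"
        elif keyword in ignored:
            states[keyword] = "ignored"
        else:
            # The "screaming" case: nobody has decided anything yet.
            states[keyword] = "unassigned"
    return states


states = classify_keywords(
    all_keywords={"http.uri", "http.uri.raw", "example.ignored", "example.new"},
    field_to_keywords={"http.uri": ["http.uri", "http.uri.raw"]},
    ignored={"example.ignored": "placeholder explanation"},
)
unassigned = [kw for kw, state in states.items() if state == "unassigned"]
```

A check like this could fail when `unassigned` is non-empty, so a newly added keyword forces an explicit decision to map or ignore it.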

victorjulien · 2024-10-15T11:47:18Z

Would it make sense to have a mapping specify whether it's an exact match, a fuzzy match, a subset or a superset?

Like http.request_line contains raw uri, method, etc. So http.request_line can be used to match on those fields, and could be in the "superset" group... It's almost like a hierarchy: raw content -> http.request frame -> http.request_line -> http.uri. Each of them gets more precise.

lukashino · 2024-10-15T14:03:47Z

Indeed, I think it might be a good idea to at least leave room for that.
Maybe creating a hierarchy would be "too much". For instance, http.urilen would not fit in directly but is related to http.uri.raw. The same goes for http.uri and http.uri.raw: which one is more precise?
Also, I am not convinced you could be as precise with raw content as you are with HTTP-related keywords.
So instead of creating a hierarchy, we could just aim for related keywords.

The other question is, will it help us with anything?
The current task doesn't require that.

For the http.uri eve.json field you could have:
keywords-exact-match:

http.uri
http.uri.raw
http.urilen

keywords-partly-match:

http.request_line
frame.http.request

I'll expand on this more. Thanks for bringing up request_line, missed that.
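The grouping above could be represented roughly as follows. This is only a sketch: the group names `keywords-exact-match` and `keywords-partly-match` come from this comment, while the nested dict layout and the `fields_for_keyword` helper are assumptions for illustration.

```python
# eve.json field -> match kind -> keywords, using the groups listed above.
FIELD_MATCHES = {
    "http.uri": {
        "keywords-exact-match": ["http.uri", "http.uri.raw", "http.urilen"],
        "keywords-partly-match": ["http.request_line", "frame.http.request"],
    },
}


def fields_for_keyword(keyword, field_matches):
    """Return [(field, match_kind), ...] for every group listing keyword."""
    hits = []
    for field, groups in field_matches.items():
        for kind, keywords in groups.items():
            if keyword in keywords:
                hits.append((field, kind))
    return hits


request_line_hits = fields_for_keyword("http.request_line", FIELD_MATCHES)
```

Keeping the match kind as a flat label per group avoids committing to a strict hierarchy while still recording that http.request_line only partly covers the field.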

lukashino · 2024-10-21T08:42:22Z

Some extra notes for the next revision:

Try to make the structure more hierarchy-like.
Maybe leave out urilen for now and propose a structure first; after that we can try to adhere to more consistent keyword matching.
urilen can now be replaced with bsize for every buffer, so urilen could be made an obsolete legacy keyword.

rfc: keyword schema mapping

ecaa8a1

lukashino requested a review from jasonish October 13, 2024 19:35

inashivb reviewed Oct 14, 2024

victorjulien reviewed Oct 15, 2024

update the data mapping proposal

d9d03ff
