rfc: keyword schema mapping v1 #11951
base: master
It does look like a schema can be extended with custom fields (https://json-schema.org/draft/2019-09/json-schema-core#rfc.section.6.5). My only comment here would be to add some sort of prefix that makes clear they are Suricata extensions, so it is easy to differentiate which parts are JSON Schema and which are our custom extensions to it.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
Coverage diff (master vs. #11951):
            master    #11951    +/-
Coverage    82.70%    83.40%    +0.69%
Files       912       910       -2
Lines       249102    257610    +8508
Hits        206018    214852    +8834
Misses      43084     42758     -326
Wow. Thanks. This is very elaborate.
When I first heard the problem statement, I misunderstood it as a job for a simple script that flattens eve.json, tries a regex match against the --list-keywords output and vice versa, and reports possibly missing fields in both.
This would indeed be helpful in cases where we do not have an exact mapping.
Q: I think I do not understand the concept of an N:N mapping here. What does it mean?
I thought we would be looking for a 1:1 mapping between the schema and the implemented rule keywords.
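For reference, the "simple script" idea mentioned above could look roughly like the sketch below. It is only an illustration: it uses exact name comparison rather than a regex match, the sample event and keyword list are made up for the example, and the function names `flatten` and `compare` are hypothetical.

```python
import json


def flatten(node, prefix=""):
    """Yield the dotted path of every leaf value in a nested eve.json record."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, prefix)  # list items share the parent path
    else:
        yield prefix


def compare(event_line, keywords):
    """Return (fields with no same-named keyword, keywords with no same-named field)."""
    fields = set(flatten(json.loads(event_line)))
    keywords = set(keywords)
    return sorted(fields - keywords), sorted(keywords - fields)


# Illustrative sample data, not taken from a real capture or a full keyword list.
event = '{"tls": {"sni": "example.com"}, "http": {"url": "/index.html", "hostname": "example.com"}}'
unmatched_fields, unmatched_keywords = compare(event, ["tls.sni", "http.uri", "http.host"])
print(unmatched_fields)
print(unmatched_keywords)
```

Only `tls.sni` lines up by name here; `http.url` versus `http.uri` and `http.hostname` versus `http.host` do not, which is the kind of gap a name-based comparison cannot close on its own.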
N:N mapping means that multiple keywords can be related to one Suricata output field, and multiple Suricata output fields can be related to multiple detection keywords. The best example is probably from the files. To eve.json output fields like:
the following keywords are somehow related:
1:1 mapping is not the goal here, as I think it can be more user-friendly to use the
Now we need to agree on whether this suggestion is sufficient, and then we can proceed with the tooling. (I also added a prefix to the keyword.)
"url": {
    "type": "string",
    "mapping-detect-keywords": [
        "http.uri.raw",
        "http.uri",
        "http.urilen"
    ]
},
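To give an idea of what the tooling might do with such an annotation, here is a minimal sketch that walks a schema and collects every `mapping-detect-keywords` list by dotted field path. The helper name `collect_mappings` and the surrounding `http` object are assumptions for the example; it only follows `properties` and ignores `items`, `$ref` and other schema constructs.

```python
def collect_mappings(schema, path=""):
    """Return {dotted field path: [keywords]} for annotated schema properties."""
    found = {}
    for name, sub in schema.get("properties", {}).items():
        field = f"{path}.{name}" if path else name
        keywords = sub.get("mapping-detect-keywords")
        if keywords:
            found[field] = list(keywords)
        # Recurse into nested objects; leaves without "properties" add nothing.
        found.update(collect_mappings(sub, field))
    return found


# Hypothetical schema fragment wrapping the "url" property shown above.
schema = {
    "type": "object",
    "properties": {
        "http": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "mapping-detect-keywords": [
                        "http.uri.raw",
                        "http.uri",
                        "http.urilen",
                    ],
                },
                "status": {"type": "integer"},
            },
        }
    },
}

mappings = collect_mappings(schema)
print(mappings)
```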
- fileext

- file.name, filename, fileext can access eve.json fields:
  - nfs.filename
I think these nfs and ftp_data fields are different from the fileinfo records
To make it manageable in a text editor, I thought of describing the relationship primarily in one direction, e.g. which eve.json fields are described by which keywords. The other direction, which fields are affected by which keywords, can be obtained by inverting the data structure.
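The inversion is cheap to do in tooling. A minimal sketch, assuming the one-direction data ends up as a plain dictionary of field to keyword list (the sample entries are illustrative only, and `fileinfo.filename` is added here just to show several fields sharing a keyword):

```python
from collections import defaultdict


def invert(field_to_keywords):
    """Turn {field: [keywords]} into {keyword: [fields]}."""
    keyword_to_fields = defaultdict(list)
    for field, keywords in field_to_keywords.items():
        for keyword in keywords:
            keyword_to_fields[keyword].append(field)
    return dict(keyword_to_fields)


# Illustrative sample, not the agreed mapping.
field_to_keywords = {
    "http.url": ["http.uri.raw", "http.uri", "http.urilen"],
    "nfs.filename": ["file.name", "filename", "fileext"],
    "fileinfo.filename": ["file.name", "filename", "fileext"],
}

keyword_to_fields = invert(field_to_keywords)
print(keyword_to_fields["file.name"])
```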
The keywords and the eve.json fields can each be in one of three states (somewhat similar to Git): tracked, unassigned, ignored.
What could tracked look like? I assume we'd have a Redmine ticket reference there?
Hmm by tracked I meant mapped (so as it should be).
Ignored could possibly reference/contain an explanation.
Unassigned would be the screaming one.
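As a rough sketch of how a check could report the three states (tracked meaning mapped, ignored carrying an explanation, unassigned being the one that fails loudly). The function name, the sample keywords and the ignore reason are all assumptions for the example:

```python
def classify(items, mapped, ignored):
    """Assign each item a state; `ignored` maps an item to its explanation."""
    states = {}
    for item in items:
        if item in mapped:
            states[item] = "tracked"
        elif item in ignored:
            states[item] = "ignored"
        else:
            states[item] = "unassigned"
    return states


# Illustrative inputs only.
keywords = ["http.uri", "flowbits", "tls.sni"]
mapped = {"http.uri"}
ignored = {"flowbits": "example reason: no corresponding eve.json field"}

states = classify(keywords, mapped, ignored)
unassigned = [name for name, state in states.items() if state == "unassigned"]
if unassigned:
    print("unassigned:", ", ".join(unassigned))
```

A CI job could turn a non-empty `unassigned` list into a failure, so new keywords or fields cannot slip in without a decision.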
Would it make sense to have a mapping specify whether it's an exact match, a fuzzy match, a subset or a superset? For example, http.request_line contains the raw URI, the method, etc. So http.request_line can be used to match on those fields, and could be in the "superset" group... It's almost like a hierarchy: raw content -> http.request frame -> http.request_line -> http.uri. Each of them gets more precise.
Indeed, I think it might be a good idea to have at least the possibility for that. The other question is: will it help us with anything? For http.uri eve.json field you could have:
keywords-partly-match:
I'll expand on this more. Thanks for bringing up request_line; I missed that.
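One possible shape for this, purely as a sketch until the next revision settles it: tag each keyword with a match kind and group them per field. The four kind names come from the suggestion above; the tuple layout, the function name and the sample entries are assumptions.

```python
ALLOWED_KINDS = ("exact", "fuzzy", "subset", "superset")


def group_by_kind(entries):
    """Group (keyword, kind) pairs for one field by their match kind."""
    grouped = {kind: [] for kind in ALLOWED_KINDS}
    for keyword, kind in entries:
        if kind not in grouped:
            raise ValueError(f"unknown match kind {kind!r} for {keyword!r}")
        grouped[kind].append(keyword)
    return grouped


# Illustrative entries for a single field; the kinds chosen are not authoritative.
entries = [
    ("http.uri", "exact"),
    ("http.request_line", "superset"),
]

grouped = group_by_kind(entries)
print(grouped["superset"])
```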
Some extra notes for the next revision:
Explained in the MD document that is part of the PR - switch to "rich diff" mode for conversion to a readable format.