-
Notifications
You must be signed in to change notification settings - Fork 974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: ecs-ification of BRO_ patterns #286
Conversation
This reverts commit c996f98.
patterns/ecs-v1/bro
Outdated
BRO_DNS %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{INT:[dns][id]:int})\t(?:-|%{BRO_DATA:[dns][question][name]})\t(?:-|%{INT:[zeek][dns][qclass]:int})\t(?:-|%{BRO_DATA:[zeek][dns][qclass_name]})\t(?:-|%{INT:[zeek][dns][qtype]:int})\t(?:-|%{BRO_DATA:[dns][question][type]})\t(?:-|%{INT:[zeek][dns][rcode]:int})\t(?:-|%{BRO_DATA:[dns][response_code]})\t(?:-|%{BRO_BOOL:[zeek][dns][AA]})\t(?:-|%{BRO_BOOL:[zeek][dns][TC]})\t(?:-|%{BRO_BOOL:[zeek][dns][RD]})\t(?:-|%{BRO_BOOL:[zeek][dns][RA]})\t(?:-|%{NONNEGINT:[zeek][dns][Z]:int})\t(?:-|%{BRO_DATA:[zeek][dns][answers]})\t(?:-|%{DATA:[zeek][dns][TTLs]})\t(?:-|%{BRO_BOOL:[zeek][dns][rejected]}) | ||
|
||
# dns.log - the 'new' format (added *zeek.dns.rtt*) | ||
ZEEK_DNS %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{INT:[dns][id]:int})\t(?:-|%{NUMBER:[zeek][dns][rtt]:float})\t(?:-|%{BRO_DATA:[dns][question][name]})\t(?:-|%{INT:[zeek][dns][qclass]:int})\t(?:-|%{BRO_DATA:[zeek][dns][qclass_name]})\t(?:-|%{INT:[zeek][dns][qtype]:int})\t(?:-|%{BRO_DATA:[dns][question][type]})\t(?:-|%{INT:[zeek][dns][rcode]:int})\t(?:-|%{BRO_DATA:[dns][response_code]})\t%{BRO_BOOL:[zeek][dns][AA]}\t%{BRO_BOOL:[zeek][dns][TC]}\t%{BRO_BOOL:[zeek][dns][RD]}\t%{BRO_BOOL:[zeek][dns][RA]}\t%{NONNEGINT:[zeek][dns][Z]:int}\t(?:-|%{BRO_DATA:[zeek][dns][answers]})\t(?:-|%{DATA:[zeek][dns][TTLs]})\t(?:-|%{BRO_BOOL:[zeek][dns][rejected]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the dns.id
field is being type-casted %{INT:[dns][id]:int}
(for Beats compatibility).
ECS prescribes the type for dns.id
as keyword, thus it should stay a string, any preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the field is type keyword
in the index template mapping, I don't believe it being cast to an integer will be a problem. It'll still be indexed as a keyword
.
patterns/ecs-v1/bro
Outdated
BRO_DNS %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{INT:[dns][id]:int})\t(?:-|%{BRO_DATA:[dns][question][name]})\t(?:-|%{INT:[zeek][dns][qclass]:int})\t(?:-|%{BRO_DATA:[zeek][dns][qclass_name]})\t(?:-|%{INT:[zeek][dns][qtype]:int})\t(?:-|%{BRO_DATA:[dns][question][type]})\t(?:-|%{INT:[zeek][dns][rcode]:int})\t(?:-|%{BRO_DATA:[dns][response_code]})\t(?:-|%{BRO_BOOL:[zeek][dns][AA]})\t(?:-|%{BRO_BOOL:[zeek][dns][TC]})\t(?:-|%{BRO_BOOL:[zeek][dns][RD]})\t(?:-|%{BRO_BOOL:[zeek][dns][RA]})\t(?:-|%{NONNEGINT:[zeek][dns][Z]:int})\t(?:-|%{BRO_DATA:[zeek][dns][answers]})\t(?:-|%{DATA:[zeek][dns][TTLs]})\t(?:-|%{BRO_BOOL:[zeek][dns][rejected]}) | ||
|
||
# dns.log - the 'new' format (added *zeek.dns.rtt*) | ||
ZEEK_DNS %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{INT:[dns][id]:int})\t(?:-|%{NUMBER:[zeek][dns][rtt]:float})\t(?:-|%{BRO_DATA:[dns][question][name]})\t(?:-|%{INT:[zeek][dns][qclass]:int})\t(?:-|%{BRO_DATA:[zeek][dns][qclass_name]})\t(?:-|%{INT:[zeek][dns][qtype]:int})\t(?:-|%{BRO_DATA:[dns][question][type]})\t(?:-|%{INT:[zeek][dns][rcode]:int})\t(?:-|%{BRO_DATA:[dns][response_code]})\t%{BRO_BOOL:[zeek][dns][AA]}\t%{BRO_BOOL:[zeek][dns][TC]}\t%{BRO_BOOL:[zeek][dns][RD]}\t%{BRO_BOOL:[zeek][dns][RA]}\t%{NONNEGINT:[zeek][dns][Z]:int}\t(?:-|%{BRO_DATA:[zeek][dns][answers]})\t(?:-|%{DATA:[zeek][dns][TTLs]})\t(?:-|%{BRO_BOOL:[zeek][dns][rejected]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're keeping source
/destination
naming convention even for DNS logs (as opposed to client
/server
),
e.g. source.ip
+ source.port
(except for the ZEEK_FILES
pattern where we match the tx_host
- host that transferred the file - as server.ip
)
patterns/ecs-v1/bro
Outdated
BRO_FILES %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][files][fuid]}\t(?:-|%{IP:[zeek][files][tx_host]})\t(?:-|%{IP:[zeek][files][rx_host]})\t(?:-|%{BRO_DATA:[zeek][files][session_ids]})\t(?:-|%{BRO_DATA:[zeek][files][source]})\t(?:-|%{INT:[zeek][files][depth]:int})\t(?:-|%{BRO_DATA:[zeek][files][analyzers]})\t(?:-|%{BRO_DATA:[file][mime_type]})\t(?:-|%{BRO_DATA:[file][name]})\t(?:-|%{NUMBER:[zeek][files][duration]:float})\t(?:-|%{BRO_DATA:[zeek][files][local_orig]})\t(?:-|%{BRO_BOOL:[zeek][files][is_orig]})\t(?:-|%{INT:[zeek][files][seen_bytes]:int})\t(?:-|%{INT:[file][size]:int})\t(?:-|%{INT:[zeek][files][missing_bytes]:int})\t(?:-|%{INT:[zeek][files][overflow_bytes]:int})\t(?:-|%{BRO_BOOL:[zeek][files][timedout]})\t(?:-|%{BRO_DATA:[zeek][files][parent_fuid]})\t(?:-|%{BRO_DATA:[file][hash][md5]})\t(?:-|%{BRO_DATA:[file][hash][sha1]})\t(?:-|%{BRO_DATA:[file][hash][sha256]})\t(?:-|%{BRO_DATA:[zeek][files][extracted]}) | ||
|
||
# files.log - new format (2 new fields added at the end) | ||
ZEEK_FILES_TX_HOSTS (?:-|%{IP:[server][ip]})|(?<[zeek][files][tx_hosts]>%{IP:[server][ip]}(?:[\s,]%{IP})+) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zeek.files.tx_host
field -> server.ip
(Beats copies the field while also keeping zeek.files.tx_hosts
)
patterns/ecs-v1/bro
Outdated
|
||
# files.log - new format (2 new fields added at the end) | ||
ZEEK_FILES_TX_HOSTS (?:-|%{IP:[server][ip]})|(?<[zeek][files][tx_hosts]>%{IP:[server][ip]}(?:[\s,]%{IP})+) | ||
ZEEK_FILES_RX_HOSTS (?:-|%{IP:[client][ip]})|(?<[zeek][files][rx_hosts]>%{IP:[client][ip]}(?:[\s,]%{IP})+) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're handling cases when multiple IPs are present: the first IP in the set ends up as client.ip
,
while the whole field is also matched as rx_hosts
see the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when a single IP is present, we intentionally only capture [client][ip]
(not [zeek][files][rx_hosts]
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct - that was the intention here: to not have "duplication" for the common case.
we always want [client][ip]
filled and if there are multiple IP fields than also capture them into the _hosts field.
logs having multiple IPs seems to be very rare - almost not worth the hustle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noted two minor things you may want to address in comments below. Other than these, this looks good to me 👍
patterns/ecs-v1/bro
Outdated
# conn.log | ||
BRO_CONN %{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{WORD:proto}\t%{GREEDYDATA:service}\t%{NUMBER:duration}\t%{NUMBER:orig_bytes}\t%{NUMBER:resp_bytes}\t%{GREEDYDATA:conn_state}\t%{GREEDYDATA:local_orig}\t%{GREEDYDATA:missed_bytes}\t%{GREEDYDATA:history}\t%{GREEDYDATA:orig_pkts}\t%{GREEDYDATA:orig_ip_bytes}\t%{GREEDYDATA:resp_pkts}\t%{GREEDYDATA:resp_ip_bytes}\t%{GREEDYDATA:tunnel_parents} | ||
# http.log - the 'new' format has *version* and *origin* fields added and *filename* replaced with *orig_filenames* + *resp_filenames* | ||
ZEEK_HTTP %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{INT:[zeek][http][trans_depth]:int}\t(?:-|%{WORD:[http][request][method]})\t(?:-|%{BRO_DATA:[url][domain]})\t(?:-|%{BRO_DATA:[url][original]})\t(?:-|%{BRO_DATA:[http][request][referrer]})\t(?:-|%{NUMBER:[http][version]})\t(?:-|%{BRO_DATA:[user_agent][original]})\t(?:-|%{BRO_DATA:[zeek][http][origin]})\t(?:-|%{NUMBER:[http][request][body][bytes]:int})\t(?:-|%{NUMBER:[http][response][body][bytes]:int})\t(?:-|%{POSINT:[http][response][status_code]:int})\t(?:-|%{DATA:[zeek][http][status_msg]})\t(?:-|%{POSINT:[zeek][http][info_code]:int})\t(?:-|%{DATA:[zeek][http][info_msg]})\t(?:\(empty\)|%{BRO_DATA:[zeek][http][tags]})\t(?:-|%{BRO_DATA:[url][username]})\t(?:-|%{BRO_DATA:[url][password]})\t(?:-|%{BRO_DATA:[zeek][http][proxied]})\t(?:-|%{BRO_DATA:[zeek][http][orig_fuids]})\t(?:-|%{BRO_DATA:[zeek][http][orig_filenames]})\t(?:-|%{BRO_DATA:[zeek][http][orig_mime_types]})\t(?:-|%{BRO_DATA:[zeek][http][resp_fuids]})\t(?:-|%{BRO_DATA:[zeek][http][resp_filenames]})\t(?:-|%{BRO_DATA:[zeek][http][resp_mime_types]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see mime_types here as well 👀
ZEEK_DNS %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{INT:[dns][id]:int})\t(?:-|%{NUMBER:[zeek][dns][rtt]:float})\t(?:-|%{BRO_DATA:[dns][question][name]})\t(?:-|%{INT:[zeek][dns][qclass]:int})\t(?:-|%{BRO_DATA:[zeek][dns][qclass_name]})\t(?:-|%{INT:[zeek][dns][qtype]:int})\t(?:-|%{BRO_DATA:[dns][question][type]})\t(?:-|%{INT:[zeek][dns][rcode]:int})\t(?:-|%{BRO_DATA:[dns][response_code]})\t%{BRO_BOOL:[zeek][dns][AA]}\t%{BRO_BOOL:[zeek][dns][TC]}\t%{BRO_BOOL:[zeek][dns][RD]}\t%{BRO_BOOL:[zeek][dns][RA]}\t%{NONNEGINT:[zeek][dns][Z]:int}\t(?:-|%{BRO_DATA:[zeek][dns][answers]})\t(?:-|%{DATA:[zeek][dns][TTLs]})\t(?:-|%{BRO_BOOL:[zeek][dns][rejected]}) | ||
|
||
# conn.log - old bro, also supports 'newer' format (optional *zeek.connection.local_resp* flag) compared to non-ecs mode | ||
BRO_CONN %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{WORD:[network][transport]}\t(?:-|%{BRO_DATA:[network][protocol]})\t(?:-|%{NUMBER:[zeek][connection][duration]:float})\t(?:-|%{INT:[zeek][connection][orig_bytes]:int})\t(?:-|%{INT:[zeek][connection][resp_bytes]:int})\t(?:-|%{BRO_DATA:[zeek][connection][state]})\t(?:-|%{BRO_BOOL:[zeek][connection][local_orig]})\t(?:(?:-|%{BRO_BOOL:[zeek][connection][local_resp]})\t)?(?:-|%{INT:[zeek][connection][missed_bytes]:int})\t(?:-|%{BRO_DATA:[zeek][connection][history]})\t(?:-|%{INT:[source][packets]:int})\t(?:-|%{INT:[source][bytes]:int})\t(?:-|%{INT:[destination][packets]:int})\t(?:-|%{INT:[destination][bytes]:int})\t(?:\(empty\)|%{BRO_DATA:[zeek][connection][tunnel_parents]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [zeek][connection][orig_bytes]
and [zeek][connection][resp_bytes]
could be mapped to source.bytes
and destination.bytes
here.
Actually I see the ECS fields used later in the same pattern. Were the fields in the beginning overlooked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't overlooked but its definitely a bit confusing, we follow here what Beats is doing - taking the later fields (orig_ip_bytes
and resp_ip_bytes
) and using those as source/destination.bytes
.
According to the Zeek docs the IP bytes are from the IP packet. While orig_bytes
and resp_bytes
are taken from the TCP sequence no. Beats drops the orig_bytes
, resp_bytes
fields, for LS we capture them (as [zeek][connection][orig_bytes]
and [zeek][connection][resp_bytes]
) for the user to decide what to do with them.
expect(grok).to include("server"=>{"ip" => '10.1.1.2'}) # tx_host | ||
expect(grok).to include("client"=>{"ip" => '192.168.1.105'}) # rx_host | ||
expect(grok).to include("zeek"=>{"files" => hash_including("tx_hosts" => '10.1.1.2 10.1.1.1')}) | ||
expect(grok).to include("zeek"=>{"files" => hash_including("rx_hosts" => '192.168.1.105,192.168.1.10,192.168.1.11')}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love that it keeps the full list in the custom field 👍
... instead of the rx/tx_host custom fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I am impressed with the thought that went into mapping this all. I've left a couple questions, but overall this looks fantastic 🎉
patterns/ecs-v1/bro
Outdated
|
||
# files.log - new format (2 new fields added at the end) | ||
ZEEK_FILES_TX_HOSTS (?:-|%{IP:[server][ip]})|(?<[zeek][files][tx_hosts]>%{IP:[server][ip]}(?:[\s,]%{IP})+) | ||
ZEEK_FILES_RX_HOSTS (?:-|%{IP:[client][ip]})|(?<[zeek][files][rx_hosts]>%{IP:[client][ip]}(?:[\s,]%{IP})+) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when a single IP is present, we intentionally only capture [client][ip]
(not [zeek][files][rx_hosts]
)?
patterns/ecs-v1/bro
Outdated
# conn.log | ||
BRO_CONN %{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{WORD:proto}\t%{GREEDYDATA:service}\t%{NUMBER:duration}\t%{NUMBER:orig_bytes}\t%{NUMBER:resp_bytes}\t%{GREEDYDATA:conn_state}\t%{GREEDYDATA:local_orig}\t%{GREEDYDATA:missed_bytes}\t%{GREEDYDATA:history}\t%{GREEDYDATA:orig_pkts}\t%{GREEDYDATA:orig_ip_bytes}\t%{GREEDYDATA:resp_pkts}\t%{GREEDYDATA:resp_ip_bytes}\t%{GREEDYDATA:tunnel_parents} | ||
# http.log - the 'new' format has *version* and *origin* fields added and *filename* replaced with *orig_filenames* + *resp_filenames* | ||
ZEEK_HTTP %{NUMBER:timestamp}\t%{NOTSPACE:[zeek][session_id]}\t%{IP:[source][ip]}\t%{INT:[source][port]:int}\t%{IP:[destination][ip]}\t%{INT:[destination][port]:int}\t%{INT:[zeek][http][trans_depth]:int}\t(?:-|%{WORD:[http][request][method]})\t(?:-|%{BRO_DATA:[url][domain]})\t(?:-|%{BRO_DATA:[url][original]})\t(?:-|%{BRO_DATA:[http][request][referrer]})\t(?:-|%{NUMBER:[http][version]})\t(?:-|%{BRO_DATA:[user_agent][original]})\t(?:-|%{BRO_DATA:[zeek][http][origin]})\t(?:-|%{NUMBER:[http][request][body][bytes]:int})\t(?:-|%{NUMBER:[http][response][body][bytes]:int})\t(?:-|%{POSINT:[http][response][status_code]:int})\t(?:-|%{DATA:[zeek][http][status_msg]})\t(?:-|%{POSINT:[zeek][http][info_code]:int})\t(?:-|%{DATA:[zeek][http][info_msg]})\t(?:\(empty\)|%{BRO_DATA:[zeek][http][tags]})\t(?:-|%{BRO_DATA:[url][username]})\t(?:-|%{BRO_DATA:[url][password]})\t(?:-|%{BRO_DATA:[zeek][http][proxied]})\t(?:-|%{BRO_DATA:[zeek][http][orig_fuids]})\t(?:-|%{BRO_DATA:[zeek][http][orig_filenames]})\t(?:-|%{BRO_DATA:[http][request][mime_type]})\t(?:-|%{BRO_DATA:[zeek][http][resp_fuids]})\t(?:-|%{BRO_DATA:[zeek][http][resp_filenames]})\t(?:-|%{BRO_DATA:[http][response][mime_type]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there value in referencing the BRO_BOOL
and BRO_DATA
types from within the ZEEK_*
patterns, or should we define them separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - that's actually a good catch.
makes little sense, we should consider spliting this out into separate files - having ecs-v1/bro as well as ecs-v1/zeek and introduce ZEEK_BOOL
/ZEEK_DATA
helper patterns for the later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern definitions are now split-ed into standalone patterns/ecs-v1/bro and patterns/ecs-v1/zeek file-sets
For context somewhere ~3 years ago Bro was rebranded as Zeek.
LS' bro patterns were matching an old version of logs, some patterns got updated along the way several times.
There's an attempt to handle parsing of old Bro logs as well as 'new' Zeek format:
Each pattern has an example spec of the old matching and the new one, linking those here (instead of pasting sample jsons as they would get a bit long with 4 patterns) :
BRO_HTTP
http.log - a bit "verbose" as there was an existing specsBRO_DNS
dns.logBRO_CONN
conn.logBRO_FILES
files.logNOTE: Beats supports the full Zeek pattern set (we only support 4 here) since 7.6.0, we strive here for maximum compatibility with field naming in Beats.
TODO: consider moving ZEEK_ patterns into a separate file, for specs it makes sense to have them in one place.