-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alert queue optimizations - v4 #7314
Conversation
The maximum of possible alerts triggered by a unique packet was hardcoded to 15. With usage of 'noalert' rules, that limit could be reached somewhat easily. Make that configurable via suricata.yaml. Conf Bug#4941 Task OISF#4207
Initial work to bring part of the alert queue processing to DetectEngineThreadCtx. Task OISF#4943
Do all alert queue processing before actually appending the PacketAlerts to the Packet's alert queue. Task OISF#4943
Since we now only copy the PacketAlerts to the Packet's queue after processing them, we no longer do packet alert appending from detect-engine-alert, nor do we remove PacketAlerts from the queue (if they're discarded by overflow or thresholding, they're not copied to the final alert queue). Task OISF#4943
Add a counter to our stats log with the total of alerts that have been discarded due to packet alert queue overflow. Task OISF#5179
Since more than one file needed a similar function, make it available from util/debug instead of having it as a local macro. Task OISF#4943
In the unlikely case of an alert queue expansion reallocation failure, we want to make sure that we don't just ignore it. But we also don't want to break things or flood Suri with warnings. Task OISF#4943
Codecov Report
@@ Coverage Diff @@
## master #7314 +/- ##
==========================================
- Coverage 75.82% 75.78% -0.04%
==========================================
Files 656 656
Lines 190051 190099 +48
==========================================
- Hits 144102 144069 -33
- Misses 45949 46030 +81
Flags with carried forward coverage won't be shown. Click here to find out more. |
ERROR: ERROR: QA failed on tlpw1_files_sha256. Pipeline 7189 |
uint16_t new_cap = det_ctx->alert_queue_capacity * 2; | ||
void *tmp_queue = SCRealloc(det_ctx->alert_queue, (size_t)(sizeof(PacketAlert) * new_cap)); | ||
if (unlikely(tmp_queue == NULL)) { | ||
/* if first realloc fail, try to grow just 50% */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit too much. Lets just handle the initial failure and not try to be smart, which ultimately leads to more complexity and more untested error code.
} | ||
det_ctx->alert_queue = tmp_queue; | ||
det_ctx->alert_queue_capacity = new_cap; | ||
SCLogNotice("Alert queue size doubled: %u elements, bytes: %" PRIuMAX "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCLogDebug
@@ -1180,6 +1180,10 @@ legacy: | |||
# - reject | |||
# - alert | |||
|
|||
# Define maximum number of possible alerts that can be triggered for the same | |||
# packet. Default is 15 | |||
packet-alert-max: 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can leave commented out
@@ -34,6 +34,13 @@ static Signature g_tag_signature; | |||
/** tag packet alert structure for tag alerts */ | |||
static PacketAlert g_tag_pa; | |||
|
|||
/* For WARN_ONCE, a record of warnings that have already been | |||
* issued. */ | |||
static thread_local bool once_err[SC_ERR_MAX]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlucovsky any thoughts on how this works wrt multiple static declarations with the same name in different modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name, as I wasn't sure if that could cause issues or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlucovsky any thoughts on how this works wrt multiple static declarations with the same name in different modules?
Because it's static
, the scope should be local and treated like any other static
with the same name.
A quick experiment confirms that the scope is restricted to the current source module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this add a per-thread array of size SC_ERR_MAX
to detect-engine-alert.c
but it's only use with a single error code SC_ERR_MEM_ALLOC
. If memory is sufficiently low that a small alloc will fail we'll get a flurry of error messages (one from each thread) and then (it's likely) Suricata will fail elsewhere.
Maybe the realloc logic isn't needed or a FatalError
should occur or perhaps, now that errors are handled, simply return the old capacity on realloc failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From your analysis, I would prefer then simply return the old capacity.
det_ctx->alert_queue_size = 0; | ||
det_ctx->alert_queue = SCCalloc(packet_alert_max, sizeof(PacketAlert)); | ||
if (det_ctx->alert_queue == NULL) { | ||
FatalError(SC_ERR_MEM_ALLOC, "failed to allocate %" PRIu64 " bytes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding mention of the alert queue in the failure message.
{ | ||
const PacketAlert *pa0 = a; | ||
const PacketAlert *pa1 = b; | ||
if (pa1->num == pa0->num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related -- can you update the comment in decode.h
for the struct PacketAlert
to reflect what's actually sorted? There, the comments state that only num
an action
are used for sorting. I think tx_id
needs a similar comment.
/* we will not copy this to the AlertQueue */ | ||
p->alerts.suppressed++; | ||
} else if (p->alerts.cnt < packet_alert_max) { | ||
memcpy(&p->alerts.alerts[p->alerts.cnt], &det_ctx->alert_queue[i], sizeof(PacketAlert)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest simple structure copy instea
p->alerts.alerts[p->alerts.cnt] = det_ctx->alert_queue[i];
@@ -34,6 +34,13 @@ static Signature g_tag_signature; | |||
/** tag packet alert structure for tag alerts */ | |||
static PacketAlert g_tag_pa; | |||
|
|||
/* For WARN_ONCE, a record of warnings that have already been | |||
* issued. */ | |||
static thread_local bool once_err[SC_ERR_MAX]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this add a per-thread array of size SC_ERR_MAX
to detect-engine-alert.c
but it's only use with a single error code SC_ERR_MEM_ALLOC
. If memory is sufficiently low that a small alloc will fail we'll get a flurry of error messages (one from each thread) and then (it's likely) Suricata will fail elsewhere.
Maybe the realloc logic isn't needed or a FatalError
should occur or perhaps, now that errors are handled, simply return the old capacity on realloc failures.
Replaced by: #7326 |
Previous PR: #7297
Link to redmine ticket: https://redmine.openinfosecfoundation.org/issues/4943
https://redmine.openinfosecfoundation.org/issues/5179
https://redmine.openinfosecfoundation.org/issues/4207
Describe changes:
WARN_ONCE
macro toutil-debug
AlertQueue
functions todetect-engine-alert
AlertQueueExpand
failure more gracefully (this includes trying again with a smaller size, warning only once per thread AND incrementing discarded alerts stats, if it really fails)Left this as a draft because I'm not sure the solutions here are good AND because the
handle alert queue expand failure
commit should be squashed, if good enoughsuricata-verify-pr: 808
OISF/suricata-verify#808