Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alert queue optimizations - v4 #7314

Closed
wants to merge 12 commits into from

Conversation

jufajardini
Copy link
Contributor

Previous PR: #7297

Link to redmine ticket: https://redmine.openinfosecfoundation.org/issues/4943
https://redmine.openinfosecfoundation.org/issues/5179
https://redmine.openinfosecfoundation.org/issues/4207

Describe changes:

  • extracted WARN_ONCE macro to util-debug
  • move AlertQueue functions to detect-engine-alert
  • incorporate feedback from previous PR
  • (tried to) handle AlertQueueExpand failure more gracefully (this includes trying again with a smaller size, warning only once per thread AND incrementing discarded alerts stats, if it really fails)
  • pass on tx_id field value if frame has it

Left this as a draft because I'm not sure the solutions here are good AND because the handle alert queue expand failure commit should be squashed, if good enough

suricata-verify-pr: 808
OISF/suricata-verify#808

The maximum of possible alerts triggered by a unique packet was
hardcoded to 15. With usage of 'noalert' rules, that limit could be
reached somewhat easily. Make that configurable via suricata.yaml.

Conf Bug#4941

Task OISF#4207
Initial work to bring part of the alert queue processing to
DetectEngineThreadCtx.

Task OISF#4943
Do all alert queue processing before actually appending
the PacketAlerts to the Packet's alert queue.

Task OISF#4943
Since we now only copy the PacketAlerts to the Packet's queue after
processing them, we no longer do packet alert appending from
detect-engine-alert, nor do we remove PacketAlerts from the queue (if
they're discarded by overflow or thresholding, they're not copied to the
final alert queue).

Task OISF#4943
Add a counter to our stats log with the total of alerts that have been
discarded due to packet alert queue overflow.

Task OISF#5179
Since more than one file needed a similar function, make it available
from util/debug instead of having it as a local macro.

Task OISF#4943
In the unlikely case of an alert queue expansion reallocation failure,
we want to make sure that we don't just ignore it. But we also don't
want to break things or flood Suri with warnings.

Task OISF#4943
@codecov
Copy link

codecov bot commented Apr 26, 2022

Codecov Report

Merging #7314 (704e85c) into master (ddf9c9d) will decrease coverage by 0.03%.
The diff coverage is 85.57%.

@@            Coverage Diff             @@
##           master    #7314      +/-   ##
==========================================
- Coverage   75.82%   75.78%   -0.04%     
==========================================
  Files         656      656              
  Lines      190051   190099      +48     
==========================================
- Hits       144102   144069      -33     
- Misses      45949    46030      +81     
Flag Coverage Δ
fuzzcorpus 60.30% <73.26%> (-0.12%) ⬇️
suricata-verify 51.56% <85.26%> (+0.02%) ⬆️
unittests 61.01% <65.95%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@jufajardini jufajardini marked this pull request as ready for review April 26, 2022 21:35
@jufajardini jufajardini requested a review from a team as a code owner April 26, 2022 21:35
@jufajardini jufajardini marked this pull request as draft April 26, 2022 21:36
@suricata-qa
Copy link

ERROR:

ERROR: QA failed on tlpw1_files_sha256.

Pipeline 7189

uint16_t new_cap = det_ctx->alert_queue_capacity * 2;
void *tmp_queue = SCRealloc(det_ctx->alert_queue, (size_t)(sizeof(PacketAlert) * new_cap));
if (unlikely(tmp_queue == NULL)) {
/* if first realloc fail, try to grow just 50% */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit too much. Lets just handle the initial failure and not try to be smart, which ultimately leads to more complexity and more untested error code.

}
det_ctx->alert_queue = tmp_queue;
det_ctx->alert_queue_capacity = new_cap;
SCLogNotice("Alert queue size doubled: %u elements, bytes: %" PRIuMAX "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCLogDebug

@@ -1180,6 +1180,10 @@ legacy:
# - reject
# - alert

# Define maximum number of possible alerts that can be triggered for the same
# packet. Default is 15
packet-alert-max: 15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can leave commented out

@@ -34,6 +34,13 @@ static Signature g_tag_signature;
/** tag packet alert structure for tag alerts */
static PacketAlert g_tag_pa;

/* For WARN_ONCE, a record of warnings that have already been
* issued. */
static thread_local bool once_err[SC_ERR_MAX];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlucovsky any thoughts on how this works wrt multiple static declarations with the same name in different modules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the name, as I wasn't sure if that could cause issues or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlucovsky any thoughts on how this works wrt multiple static declarations with the same name in different modules?

Because it's static, the scope should be local and treated like any other static with the same name.

A quick experiment confirms that the scope is restricted to the current source module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this add a per-thread array of size SC_ERR_MAX to detect-engine-alert.c but it's only use with a single error code SC_ERR_MEM_ALLOC. If memory is sufficiently low that a small alloc will fail we'll get a flurry of error messages (one from each thread) and then (it's likely) Suricata will fail elsewhere.

Maybe the realloc logic isn't needed or a FatalError should occur or perhaps, now that errors are handled, simply return the old capacity on realloc failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your analysis, I would prefer then simply return the old capacity.

det_ctx->alert_queue_size = 0;
det_ctx->alert_queue = SCCalloc(packet_alert_max, sizeof(PacketAlert));
if (det_ctx->alert_queue == NULL) {
FatalError(SC_ERR_MEM_ALLOC, "failed to allocate %" PRIu64 " bytes",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding mention of the alert queue in the failure message.

{
const PacketAlert *pa0 = a;
const PacketAlert *pa1 = b;
if (pa1->num == pa0->num)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related -- can you update the comment in decode.h for the struct PacketAlert to reflect what's actually sorted? There, the comments state that only num an action are used for sorting. I think tx_id needs a similar comment.

/* we will not copy this to the AlertQueue */
p->alerts.suppressed++;
} else if (p->alerts.cnt < packet_alert_max) {
memcpy(&p->alerts.alerts[p->alerts.cnt], &det_ctx->alert_queue[i], sizeof(PacketAlert));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest simple structure copy instea

          p->alerts.alerts[p->alerts.cnt] = det_ctx->alert_queue[i];

@@ -34,6 +34,13 @@ static Signature g_tag_signature;
/** tag packet alert structure for tag alerts */
static PacketAlert g_tag_pa;

/* For WARN_ONCE, a record of warnings that have already been
* issued. */
static thread_local bool once_err[SC_ERR_MAX];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this add a per-thread array of size SC_ERR_MAX to detect-engine-alert.c but it's only use with a single error code SC_ERR_MEM_ALLOC. If memory is sufficiently low that a small alloc will fail we'll get a flurry of error messages (one from each thread) and then (it's likely) Suricata will fail elsewhere.

Maybe the realloc logic isn't needed or a FatalError should occur or perhaps, now that errors are handled, simply return the old capacity on realloc failures.

@jufajardini
Copy link
Contributor Author

Replaced by: #7326

@jufajardini jufajardini deleted the alert-queue-det-ctx/v4 branch May 2, 2022 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants