Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next/508/20240628/v1 #11394

Merged
merged 30 commits into from
Jul 2, 2024
Merged

Conversation

victorjulien
Copy link
Member

Add support for 'by_flow' track option. This allows using the various
threshold options in the context of a single flow.

Example:

    alert tcp ... stream-event:pkt_broken_ack; \
        threshold:type limit, track by_flow, count 1, seconds 3600;

The example would limit the number of alerts to once per hour for
packets triggering the 'pkt_broken_ack' stream event.

Implemented as a special "flowvar" holding the threshold entries. This
means no synchronization is required, making this a cheaper option
compared to the other trackers.

Ticket: OISF#6822.
Allow rate_filter and thresholds from the global config to specify
tracking "by_flow".
Traffic variables (flowvars, flowbits, xbits, etc) use a smaller int for
their type than detection types. As a workaround make sure the values fit
in a uint8_t.
Limits propegation checked for DETECT_DEPTH as a content flag,
which appears to have worked by chance. After reshuffling the
keyword id's it no longer worked. This patch uses the proper
flag DETECT_CONTENT_DEPTH.
Thresholding often has 2 stages:

1. recording matches
2. appling an action, like suppress

E.g. with something like:
threshold:type limit, count 10, seconds 3600, track by_src;
the recording state is about counting 10 first hits for an IP,
then followed by the "suppress" state that might last an hour.

By_src/by_dst are expensive, as they do a host table lookup and lock
the host. If many threads require this access, lock contention becomes
a serious problem.

This patch adds a thread local cache to avoid the synchronization
overhead. When the threshold for a host enters the "apply" stage,
a thread local hash entry is added. This entry knows the expiry
time and the action to apply. This way the action can be applied
w/o the synchronization overhead.

A rbtree is used to handle expiration.

Implemented for IPv4.
Packet pointer is not used during allocation.
Add a callback and helper function to handle data expiration.

Update datasets to explicitly not use expiration.
Instead of a Host and IPPair table thresholding layer, use a dedicated
THash to store both. This allows hashing on host+sid+tracker or
ippair+sid+tracker, to create more unique hash keys.

This allows for fewer hash collisions.

The per rule tracking also uses this, so that the single big lock is no
longer a single point of contention.

Reimplement storage for flow thresholds to reuse as much logic as
possible from the host/ippair/rule thresholds.

Ticket: OISF#426.
Use the same hash key as for the regular threshold storage,
so include gid, rev, tentant id.
Implement new `type backoff` for thresholding. This allows alerts to be
limited.

A count of 1 with a multiplier of 10 would generate alerts for matching packets:
1, 10, 100, 1000, 10000, 100000, etc.

A count of 1 with a multiplier of 2 would generate alerts for matching packets:
1, 2, 4, 8, 16, 32, etc.

Like with other thresholds, rule actions like drop and setting of
flowbits will still be performed for each matching packet.

Current implementation is only for the by_flow tracker and for per rule
threshold statements.

Tracking is done using uint32_t. When it reaches this value, the rest of
the packets in the tracker will use the silent match.

Ticket: OISF#7120.
Enable backoff for most rules. The rules looking at the session start up
use a count of 1 and a multiplier of 2.

Post-3whs rules use a count of 1 and a multiplier of 10.
@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field baseline test %
SURI_TLPW2_autofp_stats_chk
.uptime 137 142 103.65%

Pipeline 21323

@ct0br0
Copy link

ct0br0 commented Jul 1, 2024

I suspect the 10sec is a fluke. But we can rerun, probably after things catch up.

Copy link
Member

@inashivb inashivb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀
Merge blocked by: QA lab results

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 21334

@victorjulien victorjulien merged commit afc3187 into OISF:master Jul 2, 2024
105 checks passed
@victorjulien victorjulien deleted the next/508/20240628/v1 branch July 2, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants