Migrate from go-yara to yara-x; improve performance and readability #734

Open
wants to merge 19 commits into
base: main
from

Conversation

egibs
Member

@egibs egibs commented Dec 22, 2024

Closes: #227
Closes: #497

After hacking around with yara-x locally last night, the performance gains over YARA are definitely noticeable (usually at least 2x faster). Previous versions (at least 0.10.0) seemed to be slower, but my earlier testing may have been flawed. That said, I wanted to get this rather large refactor up to test the CI experience again, since we have to build the API from scratch. Locally, refreshing the sample data now takes about 24-25 seconds and the integration tests take about 27 seconds (on my M1 Pro MBP).

The main limitation with yara-x is that it does not expose any functionality to turn off problematic rules. To handle this, I added functionality to remove rules prior to compilation so that they are not evaluated by the scanner.
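As a rough illustration of filtering rules before compilation, here is a minimal sketch. It assumes the rules have already been split into individual entries; the `Rule` type, `filterRules`, and `joinSources` are hypothetical names for this example and are not taken from the PR or from the yara-x API.

```go
package main

import "strings"

// Rule is a hypothetical container holding one rule's name and source text.
type Rule struct {
	Name   string
	Source string
}

// filterRules returns the rules whose names are not in the deny set,
// preserving the original order.
func filterRules(rules []Rule, deny map[string]bool) []Rule {
	kept := make([]Rule, 0, len(rules))
	for _, r := range rules {
		if deny[r.Name] {
			continue // dropped here, so the compiler never sees this rule
		}
		kept = append(kept, r)
	}
	return kept
}

// joinSources concatenates the remaining rule sources into a single
// string that could then be handed to a rule compiler.
func joinSources(rules []Rule) string {
	var b strings.Builder
	for _, r := range rules {
		b.WriteString(r.Source)
		b.WriteString("\n")
	}
	return b.String()
}
```

Because the filtering happens on the source side, the scanner itself needs no notion of disabled rules.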

To reduce CPU overhead, the PR also adds a pool of scanners that can be reused across files. Previously, we always GC'd the active scanner after each scan, which had a sizable impact, especially for scan paths containing many files.
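One way such a pool could look is a fixed-size pool backed by a buffered channel. This is only a sketch under assumptions: `Scanner` is a placeholder type standing in for a real scanner, and `ScannerPool`, `NewScannerPool`, `Get`, and `Put` are hypothetical names, not the ones used in the PR.

```go
package main

// Scanner is a placeholder for an expensive-to-create scanner object.
type Scanner struct{ ID int }

// ScannerPool hands out a fixed set of scanners for reuse.
type ScannerPool struct {
	scanners chan *Scanner
}

// NewScannerPool creates size scanners up front using newScanner.
func NewScannerPool(size int, newScanner func(id int) *Scanner) *ScannerPool {
	p := &ScannerPool{scanners: make(chan *Scanner, size)}
	for i := 0; i < size; i++ {
		p.scanners <- newScanner(i)
	}
	return p
}

// Get blocks until a scanner is available and returns it.
func (p *ScannerPool) Get() *Scanner { return <-p.scanners }

// Put returns a scanner to the pool so another caller can reuse it.
// It blocks if more scanners are returned than the pool can hold.
func (p *ScannerPool) Put(s *Scanner) { p.scanners <- s }
```

A caller would typically pair `Get` with a deferred `Put` around each file scan, so the number of live scanners stays bounded by the pool size.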

This PR also cleans up recursiveScan, fixes the behavior of longestUnique, and splits out path-related functions into a new path.go file. Additionally, processArchive now scans extracted files concurrently, which was previously serial. With this change, I can scan the OpenJDK package, which extracts roughly 136,000 files, in ~70-90 seconds.
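For readers unfamiliar with the pattern, bounded concurrent scanning of a list of extracted files can be sketched roughly as follows. This is not the processArchive implementation; `scanAll` and its parameters are hypothetical, and the `scan` callback stands in for whatever per-file work is done.

```go
package main

import "sync"

// scanAll runs scan on every path using at most workers goroutines at a
// time and returns the results in the same order as paths.
func scanAll[T any](paths []string, workers int, scan func(path string) T) []T {
	if workers < 1 {
		workers = 1 // avoid a zero-capacity semaphore, which would deadlock
	}
	results := make([]T, len(paths))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = scan(p)     // each goroutine writes its own index
		}(i, p)
	}
	wg.Wait()
	return results
}
```

Capping the number of in-flight goroutines matters for archives with very many files, where spawning one unbounded goroutine per file could exhaust memory or file descriptors.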

Outside of the longestUnique changes, the final, rendered output is essentially 1:1 with the current implementation.

Edit: this PR is about 3-4x faster in GHA with 8-core runners (even when running in a container):

go test -timeout 0 ./tests/...
ok  	github.com/chainguard-dev/malcontent/tests	49.979s

This is usually around 170s.

The tests and golangci-lint jobs now run in a Wolfi container to avoid compiling the yara-x C API each time the Workflows run; this more than halves the runtime of each job (5+ minutes down to ~2 minutes).

@egibs egibs added the do-not-merge This PR is not suitable for merging label Dec 22, 2024
@egibs egibs force-pushed the use-yara-x-take-2 branch 29 times, most recently from d4bf8dc to c7aaf0d Compare December 23, 2024 20:20
@egibs egibs force-pushed the use-yara-x-take-2 branch 3 times, most recently from e8041d8 to 659cd56 Compare December 23, 2024 20:31
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs force-pushed the use-yara-x-take-2 branch from 659cd56 to dcac602 Compare December 23, 2024 20:32
Signed-off-by: Evan Gibler <20933572+egibs@users.noreply.github.com>
@egibs egibs removed the do-not-merge This PR is not suitable for merging label Dec 24, 2024
@egibs egibs changed the title Swap over to yara-x; improve performance and readability Migrate from go-yara to yara-x; improve performance and readability Dec 24, 2024
@egibs egibs marked this pull request as ready for review December 24, 2024 21:16
@egibs egibs requested a review from imjasonh January 3, 2025 13:11
egibs added 4 commits January 3, 2025 07:14
Successfully merging this pull request may close these issues.

action: refactor recursiveScan Port malcontent to YARA-X
1 participant