Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from go-yara to yara-x; improve performance and readability #734

Merged
merged 30 commits into from
Jan 13, 2025

Conversation

egibs
Copy link
Member

@egibs egibs commented Dec 22, 2024

Closes: #227
Closes: #497

After hacking around with yara-x locally last night, the performance gains over Yara are definitely noticeable (usually 2x faster at minimum). Previous versions seemed to be slower (at least 0.10.0) but my earlier testing may have been bugged/flawed. That said, I wanted to get this rather large refactor up to test the CI experience again since we have to build the API from scratch, but when refreshing the sample data locally I was down to about ~24-25 seconds with the integration tests taking about 27 seconds (on my M1 Pro MBP).

The main limitation with yara-x is that it does not expose any functionality to turn off problematic rules. To handle this, I added functionality to remove rules prior to compilation so that they are not evaluated by the scanner.

To help with CPU overhead, the PR also adds a pool of scanners that can be re-used across files. Previously, we always GC'd the active scanner after each scan which had a sizable impact, especially for scan paths containing many files.

This PR also cleans up recursiveScan and fixes the behavior of longestUnique and splits out path-related functions into a new path.go file. Additionally, processArchive will now concurrently scan extracted files which was previous serial. With this change, I can scan the OpenJDK package which extracts roughly 136,000 files in ~70-90 seconds.

Outside of the longestUnique changes, the final, rendered output is essentially 1:1 with the current implementation.

Edit: this PR is about 3-4x faster in GHA with 8-core runners (even when running in a container):

go test -timeout 0 ./tests/...
ok  	github.com/chainguard-dev/malcontent/tests	49.979s

This is usually around ~170s.

The tests and golangci-lint jobs now run in a Wolfi container to avoid compiling the yara-x C API each time the Workflows run; this more than halves the runtime of each job (5+ minutes down to ~2 minutes).

@egibs egibs added the do-not-merge This PR is not suitable for merging label Dec 22, 2024
@egibs egibs force-pushed the use-yara-x-take-2 branch 29 times, most recently from d4bf8dc to c7aaf0d Compare December 23, 2024 20:20
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: Evan Gibler <20933572+egibs@users.noreply.github.com>
@egibs egibs requested a review from imjasonh January 3, 2025 13:11
egibs added 9 commits January 3, 2025 07:14
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: Evan Gibler <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@@ -17,7 +17,21 @@ permissions:
jobs:
test:
runs-on: mal-ubuntu-latest-8-core
container: cgr.dev/chainguard/wolfi-base:latest
container:
image: cgr.dev/chainguard/wolfi-base@sha256:eeb70e74e2ac07d3c80a30150bf473970c8b51a57f06daef3e4d065ac52489bc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to wait until CI was running consistently before locking everything down.

egibs added 3 commits January 13, 2025 10:32
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
README.md Outdated
* [pkgconf](http://pkgconf.org/) - required by Go to find C dependencies, included in many UNIX distributions

Linux or macOS users can run the following command to install the necessary dependencies, other than Go:
`yara-x` requires an underlying C API to function. For Wolfi users, this can be installed by running `apk add yara-x-compat`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This aside feels unhelpful as few if any users will install this onto Wolfi from source code. Can we provide a macOS example instead?

README.md Outdated
cargo cinstall -p yara-x-capi --release --prefix=/tmp/yara-x
```

And then copy the resulting files from `/tmp/yara-x` to `/usr/local/lib` and `/usr/local/include` (or the preferred `PKG_CONFIG_PATH`/`LD_LIBRARY_PATH` location(s)).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels kind of gross. Is there a separate build/install step in cargo? It'd be nice if we could tell folks to use sudo cinstall ... --prefix=/usr/local

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can test that out. IIRC sudo failed because it was looking for a root installation of cargo/cinstall.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 0270283 (#734).

I tested the sudo command and it worked without any non-intuitive copying.

Copy link
Collaborator

@tstromberg tstromberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved for code review, but the install instructions could use some more love.

egibs added 3 commits January 13, 2025 17:21
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs merged commit e2bf9b6 into chainguard-dev:main Jan 13, 2025
9 checks passed
@egibs egibs deleted the use-yara-x-take-2 branch January 17, 2025 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

action: refactor recursiveScan Port malcontent to YARA-X
2 participants