integrated fuzz testing #20702
Comments
Note that fuzz testing benefits a lot from starting with an input corpus of short, unique, and relevant inputs: So Zig will likely want to have a way to provide the build/test runner with seed inputs as well. Imaginary syntax:

    test "foo" {
        std.testing.fuzzCorpus(&.{
            @embedFile("inputs/input01"),
            @embedFile("inputs/input02"),
        });
        const input_bytes = std.testing.fuzzInput();
        try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input_bytes));
    }

But this may not be ideal since it's part of the test code itself. Separating things out into some sort of separate "setup the fuzzer" + "provide a function to repeatedly call" might be worthwhile. (side note: dictionaries can also be helpful and would similarly ideally be provided in some sort of setup phase)

FWIW here's what Go's integrated fuzz testing looks like:

    func FuzzReverse(f *testing.F) {
        testcases := []string{"Hello, world", " ", "!12345"}
        for _, tc := range testcases {
            f.Add(tc) // Use f.Add to provide a seed corpus
        }
        f.Fuzz(func(t *testing.T, orig string) {
            // ...
        })
    }
|
How does this mechanism work? If you've not thought about this yet, as a random (possibly bad) idea: perhaps |
example of a repo that does it today (with afl integration) (with separate |
The build runner already runs the test runner as a child process with the test runner protocol over stdio, so that it can keep running unit tests when one of them crashes the process, and check that a unit test triggered a safety panic as expected (#1356). It also makes the parent process know which test was being executed if the unit test crashes the process. Doing this over stdio is super handy because it even works in strange environments such as via QEMU, wine, or wasmtime. The function can set a flag indicating that a fuzz test was encountered, then return random bytes (smoke test). Before the test runner sends EOF to the parent process it will send a message indicating metadata about the fuzz tests in the compilation. The build runner then has all the information it needs to enter Fuzz Mode after the main build pipeline is done. |
That makes sense -- nicely designed. Here's a tangentially related question. Like other parts of the compiler, our testing infrastructure is moving towards a strong bias to running via the build system. Is there, perhaps, an argument to be made for renaming This fuzzing stuff is another example of very tight integration between the build system and compiler, where directly running Perhaps this is a silly idea; but if you think it has some merit, I'll spin it off into a separate proposal. |
This would also apply to The fuzz tests in To answer your question about Edit: now that I think about it, I don't think it would be that hard to make |
IMO the ideal would be that in However, I can't really think of a way to make defining an input corpus work with Zig's current test syntax, so a proof-of-concept that always fuzzes starting with an empty input is probably the way to go. |
Fuzzing is often done in a distributed manner: ten machines simultaneously running the fuzzer. To enable this kind of use case, it would be useful to access the results from the build system. E.g., the fuzz step could produce a report as a JSON file, which you could then use as input to a “CreateGitHubIssueStep” or some such. |
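To make the JSON report idea a bit more concrete, here is a rough Go sketch of what such a report and a merge step across machines could look like. Everything in it is an assumption for illustration: the `Report` and `Crash` types, their field names, and the `merge` helper are hypothetical and not part of any existing Zig or build system API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Crash describes one failing input. All field names are hypothetical.
type Crash struct {
	Test  string `json:"test"`
	Input []byte `json:"input"` // encoding/json writes []byte as base64
}

// Report is a hypothetical per-machine fuzzing summary.
type Report struct {
	Machine    string  `json:"machine"`
	Iterations uint64  `json:"iterations"`
	Crashes    []Crash `json:"crashes"`
}

// merge sums the iteration counts of several reports and keeps each
// distinct (test, input) crash only once.
func merge(reports []Report) Report {
	out := Report{Machine: "merged"}
	seen := map[string]bool{}
	for _, r := range reports {
		out.Iterations += r.Iterations
		for _, c := range r.Crashes {
			key := c.Test + "\x00" + string(c.Input)
			if !seen[key] {
				seen[key] = true
				out.Crashes = append(out.Crashes, c)
			}
		}
	}
	return out
}

func main() {
	a := Report{Machine: "ci-1", Iterations: 1000,
		Crashes: []Crash{{Test: "foo", Input: []byte("canyoufindme")}}}
	b := Report{Machine: "ci-2", Iterations: 2500,
		Crashes: []Crash{{Test: "foo", Input: []byte("canyoufindme")}}}

	encoded, err := json.MarshalIndent(merge([]Report{a, b}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
}
```

A follow-up step (such as the “CreateGitHubIssueStep” mentioned above) would then only need to read the merged file rather than talk to each machine.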
Here's a half-baked idea that maybe somebody could turn into something workable: have a mechanism to ensure that fuzzing hits a certain line of code and that shows a failure otherwise. |
* Add the `-ffuzz` and `-fno-fuzz` CLI arguments.
* Detect fuzz testing flags from zig cc.
* Set the correct clang flags when fuzz testing is requested. It can be combined with TSAN and UBSAN.
* Compilation: build fuzzer library when needed which is currently an empty zig file.
* Add optforfuzzing to every function in the llvm backend for modules that have requested fuzzing.
* In ZigLLVMTargetMachineEmitToFile, add the optimization passes for sanitizer coverage.
* std.mem.eql uses a naive implementation optimized for fuzzing when builtin.fuzz is true.

Tracked by #20702
Sounds related to sometimes assertions. |
Instead of specifying a corpus in Zig code, what about providing it to the test/build runner on the CLI? Could we have When fuzzing with AFLPlusPlus I often have updated my corpus with new seed files from a previous fuzzing run so that the next run doesn't have to re-explore the same search space from scratch. For this reason, I think it would make more sense for the input corpus to not be specified in the code. With a CLI flag, the build-runner could even be made to automatically update the corpus with new seeds if desired. |
Depends what the intended use cases are. From the OP, it sounds like running multiple fuzz tests (for a finite amount of time each) is an intended use case, so specifying a corpus for each fuzz test via the CLI might be a bit tricky. Reading from some particular location based on the fully qualified test name would work but would make renaming/moving tests around a chore (and a potential footgun-of-sorts if you don't realize there's a mismatch in the corpus/test FQN). |
Existing languages have a lot of magic re: how fuzzing targets are defined. For example, Go requires targets to:
Fuzzing in Rust via cargo-fuzz is better in that:
This is how the IMO I believe we could get the best of both worlds by having a |
With the plan of a two-pass system where the first pass detects which tests are fuzz tests, perhaps we can have a |
For those who know more about fuzzers and instrumentation: how hard would it be to make this generic enough to integrate with different instrumentation/fuzzing libraries? That would let you plug in other fuzzing engines. For example, if I were making a Zig library for a different language whose ecosystem used a specific fuzzer, I would want to fuzz the calls into Zig using the same system (getting coverage etc.). Almost like having "custom fuzz runners + integration" the same way we can have custom build and test runners? |
Non-deterministic CI failures ahoy! After fuzzing in a lot of different projects, I like the interface in Go: in test mode it just runs the provided inputs, and in fuzz mode it uses those inputs to seed the corpus. Many fuzzing tools also have a corpus minimization option which produces the minimum set of inputs that obtains the same coverage as the full corpus. I like to copy those back into the fuzz test to get good coverage in test mode. |
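For readers unfamiliar with corpus minimization, a minimal Go sketch of the idea follows. It assumes coverage has already been collected as a map from input name to the edge IDs that input reaches; real tools get this from instrumentation. The `minimize` function is hypothetical, and it uses a greedy set-cover heuristic, so the result is small but not guaranteed to be the true minimum.

```go
package main

import (
	"fmt"
	"sort"
)

// minimize greedily picks inputs until the chosen subset reaches every edge
// that the full corpus reaches. Ties are broken by input name so the result
// is deterministic.
func minimize(coverage map[string][]int) []string {
	names := make([]string, 0, len(coverage))
	for name := range coverage {
		names = append(names, name)
	}
	sort.Strings(names)

	covered := map[int]bool{}
	var kept []string
	for {
		best, bestGain := "", 0
		for _, name := range names {
			// Count edges this input would newly cover.
			gain := 0
			seen := map[int]bool{}
			for _, edge := range coverage[name] {
				if !covered[edge] && !seen[edge] {
					seen[edge] = true
					gain++
				}
			}
			if gain > bestGain {
				best, bestGain = name, gain
			}
		}
		if bestGain == 0 {
			return kept // nothing left adds coverage
		}
		for _, edge := range coverage[best] {
			covered[edge] = true
		}
		kept = append(kept, best)
	}
}

func main() {
	corpus := map[string][]int{
		"a": {1, 2},
		"b": {2, 3, 4},
		"c": {1},
		"d": {4, 5},
	}
	fmt.Println(minimize(corpus))
}
```

In this toy corpus, input "c" is dropped because its only edge is already reached by "a"; the remaining inputs are the ones worth copying back into the test as seeds.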
|
Some minor quality-of-life options from other tools:
|
As someone with a lot of experience in fuzzing in Go, I can't stress enough how important it is. Without them, continuous fuzzing is essentially broken in Go: golang/go#48157, golang/go#56238, golang/go#52569 |
We do something at TigerBeetle here. I am not sure if what we do is brilliant or cursed. What we do is that we use commit sha as a seed for "run fuzz tests once on CI" check:
|
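The commit-SHA seeding idea can be sketched roughly as below. This is a Go illustration and not TigerBeetle's actual code: `seedFromCommit`, `randomInput`, and the example SHA are hypothetical. The point is that the seed is fixed for a given commit, so a CI failure can be reproduced, while different commits explore different inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// seedFromCommit hashes the commit SHA string and uses the first eight
// bytes of the digest as a PRNG seed.
func seedFromCommit(sha string) int64 {
	sum := sha256.Sum256([]byte(sha))
	return int64(binary.LittleEndian.Uint64(sum[:8]))
}

// randomInput produces n pseudo-random bytes that depend only on the seed.
func randomInput(seed int64, n int) []byte {
	rng := rand.New(rand.NewSource(seed))
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte(rng.Intn(256))
	}
	return buf
}

func main() {
	// Placeholder SHA; in CI this would come from the environment.
	seed := seedFromCommit("0123456789abcdef0123456789abcdef01234567")
	fmt.Printf("seed=%d input=%x\n", seed, randomInput(seed, 8))
}
```

Re-running the check on the same commit yields the same inputs, which avoids the flaky-CI problem while still varying coverage over the project's history.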
Make it so that unit tests can ask for fuzz input:
Introduce flags to the compiler: `-ffuzz`, `-fno-fuzz`. These end up passing `-fsanitize=fuzzer-no-link` to Clang for C/C++ files. Introduce build system equivalent API.

However, neither the CLI interface nor the build system interface is needed in order to enable fuzzing. The only thing that is needed is to ask for fuzz input in unit tests, as in the above example.

When the build runner interacts with the test runner, it learns which tests, if any, are fuzz tests. Then when unit tests pass, it moves on to fuzz testing, by providing our own implementation of the genetic algorithms that drive the input bytes (similar to libFuzzer or AFL), and re-compiling the unit test binary with `-ffuzz` enabled.

Fuzz testing is level-driven so we will need some CLI to operate those options. For example, `zig build --fuzz` might start fuzzing indefinitely, while `zig build --fuzz=300s` declares success after fuzzing for five minutes. When fuzz testing is not requested, it defaults to a small number of iterations just to smoke test that it's all working.

Some sort of UI would be nice. For starters this could just be `std.Progress`. In the future perhaps there could be a live-updating HTML page to visualize progress and code coverage in realtime. How cool would it be to watch source code turn from red to green live as the fuzzer finds new branches?

I think there's value in being able to fuzz test a mix of Zig and C/C++ source code, so let's start with evaluating LLVM's instrumentation and perhaps being compatible with it, or at least supporting it. First step is to implement the support library in Zig.

`-ffuzz` will be made available as a comptime flag in `@import("builtin")` so that it can be used, for example, to choose the naive implementation of `std.mem.eql` which helps the fuzzer to find interesting branches.

Comments are welcome. Note this is an enhancement not a proposal. The question is not "whether?" but "how?".
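On the `std.mem.eql` point: the reason a naive comparison helps is that an early-exit, byte-by-byte loop runs one iteration further for every additional matching byte, which a coverage-guided fuzzer that tracks edge hit counts can pick up as progress. An optimized word-wise or vectorized comparison tends to be all-or-nothing from the fuzzer's point of view. The Go sketch below only illustrates that idea; `naiveEqual` and `matchingPrefix` are hypothetical helpers, not the Zig standard library implementation.

```go
package main

import "fmt"

// naiveEqual compares two byte slices one byte at a time and returns at the
// first mismatch, so the number of loop iterations reflects how long the
// matching prefix is.
func naiveEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// matchingPrefix counts the leading bytes on which a and b agree. It stands
// in for the signal a fuzzer would infer from loop hit counts.
func matchingPrefix(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	target := []byte("canyoufindme")
	for _, guess := range []string{"xxxxxxxxxxxx", "canxxxxxxxxx", "canyoufindme"} {
		g := []byte(guess)
		fmt.Println(guess, matchingPrefix(target, g), naiveEqual(target, g))
	}
}
```

Each guess that extends the matching prefix looks "new" to the fuzzer, which lets it converge on a magic string like "canyoufindme" one byte at a time instead of having to guess all twelve bytes at once.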
Related: