-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Less basic fuzzer #21246
base: master
Are you sure you want to change the base?
Less basic fuzzer #21246
Conversation
Exciting! This is great timing. I was about to embark on a similar branch, so instead I will cooperate with your efforts here. Can you share your testing methodology? |
/// maximum 2GiB of input data should be enough. 32th bit is delete flag | ||
pub const Index = u31; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// maximum 2GiB of input data should be enough. 32th bit is delete flag | |
pub const Index = u31; | |
pub const Flags = packed struct (u32) { | |
index: Index, | |
delete: bool, | |
pub const Index = enum(u31) { | |
/// This tag is used below in the line with | |
/// `MemoryMappedList(u8).init(buffer_file, std.math.maxInt(Index))` | |
/// but I don't see what the special value means. Replace these doc comments | |
/// and field name with a more descriptive one. | |
/// Or, if you did not mean for this to be a special value, delete this field and | |
/// initialize `index` with `undefined`. | |
special_value = std.math.maxInt(u31), | |
_, | |
}; | |
pub const empty: Flags = .{ | |
.index = .special_value, | |
.delete = true, | |
}; | |
}; |
I promise you this type safety will come in handy.
Then you can elsewhere do this:
- const buffer = MemoryMappedList(u8).init(buffer_file, std.math.maxInt(Index));
+ const buffer = MemoryMappedList(u8).init(buffer_file, Flags.empty);
Instead of deleteMask
, use flags.delete
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special_value = std.math.maxInt(u31),
That is not a special value! :D I actually want to allocate 2GiB of virtual memory. It simplifies the problem of reallocating our mmaped file when some other process appended to the file.
This way all the process needs to do is read len
(start of meta file) and access the already allocated buffer. the process that incremented len
already called ftruncate on the file so all other processes can access their allocated buffer without doing anything at all.
Otherwise there would need to be some logic for other processes to store the allocation size and mremap it when grows too much and such and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of deleteMask, use flags.delete etc
+1
ip.meta.deinit(); | ||
} | ||
|
||
// Primitive spin lock implementation. There is basically no contention on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std.Thread.Mutex
instead of implementing a spin lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that was that std.Thread.Mutex
calls futex_wait
with PRIVATE_FLAG
, which signals to the kernel that this futex is not shared with other processes, which is not the case probably. Depends on how we are going to implement paralell fuzzing. I wanted to ask you for input on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that the build system might spawn multiple separate fuzz processes and in that case, this lock would be inter-process and the PRIVATE_FLAG
would be problematic. It would not be a problem if the fuzzer itself just spawns a couple of threads in a thread pool.
I dont know that is the policy of spawning threads manually vs letting the build system handle parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be better to let the build system handle parallelism because:
- I didn't get std.Thread.Pool: process tree cooperation #20274 working in my most recent effort and I lack the motivation to work on that problem currently because it involves sad workarounds for a disappointing lack of primitives offered by Unix-like operating systems. Since the thread pool is in the build runner process, the fuzzer process does not know how many threads it can spawn.
- Eventually, we want the ability to coordinate multiple build runners fuzzing across multiple different machines. This means coordinating seeds and inputs at that higher abstraction layer.
If we need to add an inter-process mutex to the standard library, that's fine, let's do it.
However there is also the lock-free approach. I haven't closely inspected what the mutex in this PR is being used for yet. Maybe locking could be entirely avoided.
In any case, I am vetoing the use of a spinlock.
When I ran this locally, it seemed to find a test failure rather quickly:
However, when I used the dump_corpus tool to extract strings --- a/lib/fuzzer/dump_corpus.zig
+++ b/lib/fuzzer/dump_corpus.zig
@@ -39,6 +39,6 @@ pub fn main() void {
// volatile was trying to achieve in the first place
const str2: []const u8 = @volatileCast(str);
- std.log.info("\"{s}\"", .{str2});
+ std.log.info("\"{}\"", .{std.zig.fmtEscapes(str2)});
}
} --- a/lib/std/zig/tokenizer.zig
+++ b/lib/std/zig/tokenizer.zig
@@ -1841,10 +1841,7 @@ fn testTokenize(source: [:0]const u8, expected_token_tags: []const Token.Tag) !v
try std.testing.expectEqual(source.len, last_token.loc.end);
}
-test "fuzzable properties upheld" {
- const source = std.testing.fuzzInput(.{});
- const source0 = try std.testing.allocator.dupeZ(u8, source);
- defer std.testing.allocator.free(source0);
+fn testProperties(source0: [:0]const u8) anyerror!void {
var tokenizer = Tokenizer.init(source0);
var tokenization_failed = false;
while (true) {
@@ -1885,3 +1882,339 @@ test "fuzzable properties upheld" {
}
};
}
+
+test "fuzzable properties upheld" {
+ const source = std.testing.fuzzInput(.{});
+ const source0 = try std.testing.allocator.dupeZ(u8, source);
+ defer std.testing.allocator.free(source0);
+ return testProperties(source0);
+}
+
+test "fuzzable properties upheld - corpus" {
+ for (corpus) |one| {
+ try testProperties(one);
+ }
+}
+
+const corpus = [_][:0]const u8{ the crashing input didn't seem to be present:
|
haha that is because it just calls exit when a test fails. I totally forgot to report found bad inputs :D |
I am ashamed to admit that my testing methodology was to just run it on my toy bencode parser and see how quickly the coverage number goes up. This PR is more focused on setting up a working fuzzer-shaped project and less focused on tuning the fuzzer to get better results. |
Ah yes, that is intentional, because another way for it to find a bad input is to segfault, abort, or crash in another unexpected manner. So that's why I thought it should write the input before running the user's test code. This way, a test failure and a crashing process look the same to the fuzzer.
No shame in this! We're both starting with toy examples and working our way up. |
writing every input into the corpus before evaluating it doesn't work since deleting a string from the corpus in the current design is expensive and requires exclusive access to the entire corpus. It should probably write the bad input from a panic handler (not into the corpus, just into some file) and somehow recover and keep fuzzing. The only way i can see how to recover from a panic handler back into the fuzzer is using longjump but that might be controversial. Maybe we can just start the fuzzer again. Crashing inputs should be rare.. |
Thank you for the feedback! I appreciate it. I am currently unfortunately overwhelmed with other work so I'll get back to this in approximately 3 weeks. I hope that is OK. I look forward to getting this merged. |
b1f3351
to
a32bf7c
Compare
Welcome back. I have some local changes I expect to push today and you'll need to rebase on them. I did things a little differently than you. |
Rationale being that max u64 in base 10 is 20 chars long so we want a chance to insert ascii number over the u64 range and crash parsers that don't expect parsing an ascii number can overflow usize.
This reverts commit 12e25db.
Co-authored-by: Andrew Kelley <andrew@ziglang.org>
By local changes you mean that the fuzzer now gets a function pointer instead of being called from the test or is there something more? I like this change btw. I'm aaking because I rebased on top of master and didn't get any conflicts. |
a32bf7c
to
bc18acc
Compare
This PR improves the current fuzzer implementation. It is not yet competetive with other fuzzers but it is working and ready for a first round of review (which I assume is better than to drop a huge PR later). I plan to keep improving the fuzzer in the near future.
The fuzzer stores a corpus of (once) interesting inputs inside a pair of memory mapped files, ready to be shared with other fuzzing processes (which there are none currently since I am not sure where is the best place to spawn them).
It randomly picks a input, mutates it, sees if it hits any new features in the instrumented program and if so, adds it to the corpus. The mutations are taken from llvm's libfuzzer.
My next plan is to:
closes #20814
closes #20803
wip on #20804