bzip2: Add bzip2 to oss-fuzz #1887
Conversation
We approve of adding this project. Thanks!
Really awesome that it has found a bug already.
Do you have jseward's approval for this?
I left some comments. The only blockers are the printing and the question I had about the safety of the hardcoded sizes given to malloc. Everything else is optional to fix/improve (note: that is what "nit" means).
{
  int r;
  unsigned int nZ, nOut;
  char *zbuf = malloc(size + 600 + (size / 100));
What are these hardcoded values? Are they guaranteed to always work?
They have been borrowed from here. I am not sure it is always guaranteed to work, but this code is present in a test harness used to test the robustness of (de)compression.
@julian-seward1 what do you think?
I think it's OK. They just reflect the fact that the worst case output size is 101% of the input size + 600 bytes (I assume -- this is now nearly 20 years old). Since the buffer is in mallocville, presumably asan will complain if it gets overrun. I doubt that will happen though.
Thank you, I've sent a new PR #1894 in which I use the same buffer sizes (1 MB) as the original code. I think the size is more of an issue during decompression, which I have now fixed.
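For context, here is a minimal sketch of a compression target using that sizing rule from the bzip2 docs (output can be up to 1% larger than the input plus 600 bytes); this is an illustration, not the exact code in this PR:

#include <stdint.h>
#include <stdlib.h>
#include <bzlib.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  /* Documented worst case: compressed output <= input + 1% + 600 bytes. */
  unsigned int nZ = size + (size / 100) + 600;
  char *zbuf = malloc(nZ);
  if (zbuf == NULL) return 0;
  int r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
  /* With a bound-sized buffer, BZ_OUTBUFF_FULL should not be returned. */
  (void)r;
  free(zbuf);
  return 0;
}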
r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
if (r != BZ_OK) {
  fprintf(stdout, "Compression error: %d\n", r);
We really prefer not to have logging messages output by the fuzzer. Can you put this printf (and the others) behind an env var?
ack, WIP.
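One possible way to gate the messages, assuming a made-up BZIP2_FUZZ_VERBOSE environment variable (the variable name and helper are illustrative, not part of this PR):

#include <stdio.h>
#include <stdlib.h>

/* Print only when the (hypothetical) BZIP2_FUZZ_VERBOSE env var is set. */
static void log_error(const char *what, int code) {
  static int verbose = -1;
  if (verbose == -1)
    verbose = (getenv("BZIP2_FUZZ_VERBOSE") != NULL);
  if (verbose)
    fprintf(stderr, "%s: %d\n", what, code);
}

/* Usage in the target: log_error("Compression error", r); */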
FROM gcr.io/oss-fuzz-base/base-builder
MAINTAINER bshas3@gmail.com
RUN apt-get update && apt-get install -y make autoconf automake libtool wget
RUN wget ftp://sources.redhat.com/pub/bzip2/v102/bzip2-1.0.2.tar.gz
".tar.gz"
This is kind of funny.
Does this mean you're testing 1.0.2 here? That is out of date. You should be testing 1.0.6.
What is the official hosting repo for this version? Could you please link a URL here? Thanks.
r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
if (r != BZ_OK) {
  fprintf(stdout, "Compression error: %d\n", r);
  goto cleanup;
nit: goto seems like overkill in this case, but it's fine to keep IMO.
ack, WIP
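If the goto were dropped, the early-error path could simply release the buffer and return, e.g. (illustrative only):

if (r != BZ_OK) {
  /* Nothing left to test on a compression failure: free and bail out
     instead of jumping to a shared cleanup label. */
  free(zbuf);
  return 0;
}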
}
else {
  assert(nOut == size);
  assert(memcmp(data, outbuf, size) == 0);
Great to test for correctness. I suspect it won't turn up anything but it would be fantastic if it did.
Thanks.
unsigned int nZ, nOut;
char *zbuf = malloc(size + 600 + (size / 100));

r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
nit: this doesn't have to come in this PR, but could varying blockSize100k and workFactor increase coverage/bugs?
You can do that by dedicating input bytes (as discussed previously), or by reusing input bytes (such as by hashing them and using the hash to seed a random number generator).
I'm actually unsure which way is better. Dedicating bytes seems more correct, but I don't think the fuzzer is smart enough to take advantage of it, and it has the unfortunate side effect of making the corpus not actual bz2 data but X bytes + bz2 (I would probably reuse bytes, but this is just my best guess).
Yeah, this is a good point. I will think about it :-)
Thinking out loud here: I think the following results in fairly uniformly random workFactors:

int hash = hash32(data, size, salt);
workFactor = hash % 251; // max workFactor = 250

because
- hash may be assumed to be uniformly random
- a uniformly random value modulo a prime X stays (roughly) uniformly random?

We got lucky with 251, though. When the natural modulus is not prime, e.g. for block size where the max is 9 (so we would choose mod 11 instead of mod 10 to keep the modulus prime), we end up with a 1/11 chance of the block size being 10. That should be okay, though, since bzip2 handles such misconfigurations gracefully?
IMO this kind of stuff is more art than science, so I could be wrong about this:
I wouldn't worry about the options having a uniformly random distribution. I think as long as an option value isn't super unlikely, the odds are that the fuzzer will create a lot of testcases with that value.
This means we may have (1/11) chance of block size being 10, but this is okay since bzip2 handles such misconfigurations gracefully?
I would probably just do mod 10 even if some values are more likely to be selected than others.
Thanks :-)
Will incorporate this in upcoming PR.
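A possible shape for the byte-reuse approach, with hash32() written out as a simple FNV-1a stand-in (both the helper and the exact mapping are illustrative, not part of this PR):

#include <stdint.h>
#include <stddef.h>

/* Cheap stand-in hash (FNV-1a) so parameters are derived from the input
   itself rather than from dedicated bytes. */
static uint32_t hash32(const uint8_t *data, size_t size) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < size; i++) {
    h ^= data[i];
    h *= 16777619u;
  }
  return h;
}

/* In the target:
   uint32_t h = hash32(data, size);
   int blockSize100k = 1 + (h % 9);        // valid range 1..9
   int workFactor    = (h >> 8) % 251;     // valid range 0..250 (0 = default)
   r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size,
                                blockSize100k, 0, workFactor);
*/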
nOut = size*2;
char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, zbuf, nZ, 0, 0);
nit: Maybe in a later PR we could vary the small param? (It might result in fewer bugs, since I read that the algorithm it selects is slower, but I suspect introducing a slow path, unless it is really bad, could increase the bugs found.)
ack. Will address in future PR.
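If the same derived hash were reused, the small flag could be toggled like this (illustrative; h is the hash value sketched above):

/* small selects bzip2's low-memory (and slower) decompression algorithm. */
int small = (h >> 16) & 1;   /* 0 = normal algorithm, 1 = small/slow path */
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, zbuf, nZ, small, 0);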
nOut = size*2;
char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, (char *)data, size, 0, 0);
nit: same comment about the small param as in the other file.
ack, same as above.
char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, (char *)data, size, 0, 0);
if (r != BZ_OK) {
  fprintf(stdout, "Decompression error: %d\n", r);
Same comment about the printf as in the other file. Please don't do it.
ack, WIP.
Good, good. Please let me know what you find.
Left some comments on bzip2_decompress_target.c; the same applies to the compress fuzz target.
unsigned int nZ, nOut;

nOut = size*2;
char *outbuf = malloc(nOut);
I'd highly recommend using a static buffer for this with some fixed size (e.g. 256 KB; see the zlib example: https://github.com/google/oss-fuzz/blob/master/projects/zlib/zlib_uncompress_fuzzer.cc#L11). It should give a very significant speed-up.
Maybe not. A static buffer might be faster, but then we may lose some ASan buffer-overflow checking on it.
Also, a question about nOut = size*2: does this mean that the decompression is often expected to return early due to end-of-output?
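For comparison, a sketch of the static-buffer variant suggested above (256 KB is simply the zlib example's choice; with a fixed buffer, an oversized output is reported as BZ_OUTBUFF_FULL via the return code rather than caught by ASan):

#include <stdint.h>
#include <stddef.h>
#include <bzlib.h>

static char outbuf[256 * 1024];   /* fixed-size output, as in the zlib fuzzer */

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  unsigned int nOut = sizeof(outbuf);
  /* Feed the raw fuzz input straight to the decompressor; malformed streams
     and too-large outputs are signalled through the return code. */
  int r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, (char *)data, size, 0, 0);
  (void)r;
  return 0;
}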
  return 0;
}

assert(nOut == size);
This and the next lines are wrong; why would we expect the decompressed data to be the same as its compressed representation?
Aren't we comparing the original fuzzed data with the compressed and subsequently decompressed data? I'd expect decompressed(compressed(data)) == data. Same with size.
Oops, sorry. I thought we were talking about the compression target. I'll look into this and ping back.
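To spell out the distinction the thread lands on: round-trip assertions make sense in the compression target, where the original input is available to compare against, but not in the decompression target, whose input is an arbitrary fuzzed stream. A rough sketch of the compression-side check (fragment; assumes the surrounding target's data, size, zbuf, nZ, and r):

/* Compression target only: compress the fuzz input, then decompress the
   result and check that the original bytes come back. */
unsigned int nOut = size;
char *outbuf = malloc(size + 1);   /* +1 avoids malloc(0) for empty inputs */
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, zbuf, nZ, 0, 0);
if (r == BZ_OK) {
  assert(nOut == size);                     /* round trip preserves length */
  assert(memcmp(outbuf, data, size) == 0);  /* and content */
}
free(outbuf);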
The bzip2 target bzip2_compress_target is having a startup crash. Please get the crash fixed ASAP upstream, or disable the ubsan config in project.yaml; this is causing exceptions because there is no archived build for the ubsan config. Step #13: BAD BUILD: /workspace/out/undefined/bzip2_compress_target seems to have either a startup crash or exit:
Hi @inferno-chromium, #1894 fixes this.
* bzip2: Bug fixes; added citations
* bzip2: Bump bzip2 version to 1.0.6
* bzip2: Variable blockSize100k, workFactor, and small
* bzip2: Remove hardcoded buffer sizes; fix nZ (compressor) to point to real buffer size
* bzip2: Remove ubsan from project.yaml because of start-up crash #1887
This PR integrates bzip2 into oss-fuzz (CC #402).
The decompressor target flags a wild read within seconds.