Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bzip2: Add bzip2 to oss-fuzz #1887

Merged
merged 2 commits into from
Oct 19, 2018
Merged

Conversation

bshastry
Copy link
Contributor

@bshastry bshastry commented Oct 18, 2018

This PR integrates bzip2 into oss-fuzz (CC #402 ).

  • It contains two targets
    • comp/decompressor, and a standalone decompressor
    • should support all sanitizers
    • test harnesses out of source since there is no official bzip2 repo that seems to be maintained.

The decompressor target flags a wild read within seconds.

Copy link
Contributor

@jonathanmetzman jonathanmetzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We approve of adding this project. Thanks!
Really awesome that it has found a bug already.

Do you have jseward's approval for this?

I left some comments. The only blockers are the printing and the question I had about the safety of the hardcoded sizes given to malloc.
Everything else is optional to fix/improve (note that is what nit means).

{
int r;
unsigned int nZ, nOut;
char *zbuf = malloc(size + 600 + (size / 100));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these hardcoded values? Are they guaranteed to always work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have been borrowed from here. I am not sure if it is always guaranteed to work, but this code is present in a test harness that is required to test the robustness of (de)compression.

@julian-seward1 what do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK. They just reflect the fact that the worst case output size is 101% of the input size + 600 bytes (I assume -- this is now nearly 20 years old). Since the buffer is in mallocville, presumably asan will complain if it gets overrun. I doubt that will happen though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I've sent a new PR #1894 in which I use the same buffer sizes (1 MB) as the original code. I think the size is more of an issue during decompression that I fixed now.


r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
if (r != BZ_OK) {
fprintf(stdout, "Compression error: %d\n", r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really prefer not to have logging messages output by the fuzzer. Can you put this printf and others behind an env var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, WIP.

FROM gcr.io/oss-fuzz-base/base-builder
MAINTAINER bshas3@gmail.com
RUN apt-get update && apt-get install -y make autoconf automake libtool wget
RUN wget ftp://sources.redhat.com/pub/bzip2/v102/bzip2-1.0.2.tar.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

".tar.gz"
This is kind of funny.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean you're testing 1.0.2 here? That is out of date. You should be testing 1.0.6.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the official hosting repo for this version? Could you please link a URL here? Thanks.

r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
if (r != BZ_OK) {
fprintf(stdout, "Compression error: %d\n", r);
goto cleanup;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: goto seems like overkill in this case but fine to keep IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, WIP

}
else {
assert(nOut == size);
assert(memcmp(data, outbuf, size) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to test for correctness. I suspect it won't turn up anything but it would be fantastic if it did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

projects/bzip2/bzip2_compress_target.c Show resolved Hide resolved
unsigned int nZ, nOut;
char *zbuf = malloc(size + 600 + (size / 100));

r = BZ2_bzBuffToBuffCompress(zbuf, &nZ, (char *)data, size, 9, 0, 30);
Copy link
Contributor

@jonathanmetzman jonathanmetzman Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doesn't have to come in this PR but could varying blockSize100k and workFactor increase coverage/bugs?
You can do that by dedicating input bytes (as discussed previously), or by reusing input bytes (such as by hashing and using that to seed a random number generator).
I'm actually unsure which way is better, dedicating bytes seems more correct but I don't think the fuzzer is smart enough to take advantage of this and it has the unfortunate side-effect of making the corpus not actually bz2 but X bytes + bz2 (I would probably reuse bytes, but this is just my best guess).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good point. I will think about it :-)

Thinking out loud here: I think the following results in fairly uniformly random workFactors

int hash = hash32(data, size, salt);
workFactor = hash%251; // max workFactor = 250

Because

  • hash may be assumed to be uniformly random
  • random modulo div X (prime) retains randomness?

We got lucky with 251 though. Even if we were to not have the upper bound (modulo) be a prime e.g., in the case of block size, the max block size is 9 (which means we need to choose mod 11 instead of mod 10). This means we may have (1/11) chance of block size being 10, but this is okay since bzip2 handles such misconfigurations gracefully?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this kind of stuff is more art than science so I could be wrong this:
I wouldn't worry about the options having a uniformly random distribution. I think as long as an option value isn't super unlikely, the odds are that the fuzzer will create a lot of testcases with that value.

This means we may have (1/11) chance of block size being 10, but this is okay since bzip2 handles such misconfigurations gracefully?

I would probably just do mod 10 even if some values are more likely to be selected than others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :-)

Will incorporate this in upcoming PR.


nOut = size*2;
char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, zbuf, nZ, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe in later PR we could vary the small param?
(it might result in less bugs since I read that the algorithm used will be slower, but I suspect introducing a slow path, unless really bad could increase bugs found).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. Will address in future PR.


nOut = size*2;
char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, (char *)data, size, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same comment about small param as in other file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, same as above.

char *outbuf = malloc(nOut);
r = BZ2_bzBuffToBuffDecompress(outbuf, &nOut, (char *)data, size, 0, 0);
if (r != BZ_OK) {
fprintf(stdout, "Decompression error: %d\n", r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about printf as in other file. Please don't do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, WIP.

@julian-seward1
Copy link

Good, good. Please let me know what you find.

@jonathanmetzman jonathanmetzman merged commit c6a08db into google:master Oct 19, 2018
Copy link
Contributor

@Dor1s Dor1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments for bzip2_decompress_target.c, the same applies for the compress fuzz target.

unsigned int nZ, nOut;

nOut = size*2;
char *outbuf = malloc(nOut);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd highly recommend to use a static buffer for this with some fixed size (e.g. 256 KB, see zlib example: https://github.com/google/oss-fuzz/blob/master/projects/zlib/zlib_uncompress_fuzzer.cc#L11). It should give a very significant speed up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not.
a static buffer might be faster, but then we may lose some asan buffer overflow checking on it.

Also a question about nOut = size*2;
Does this mean that the decompression is often expected to return early due to end-of-output?

return 0;
}

assert(nOut == size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next lines are wrong, why would we expect the decompressed data to be the same as its compressed representation?

Copy link
Contributor Author

@bshastry bshastry Oct 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we comparing original fuzzed data with the compressed and subsequently decompressed data? I'd expect decompressed(compressed (data)==data. Same with size.

Oops, sorry. I thought we were talking about the compression target. I'll look into this and ping back.

@bshastry bshastry mentioned this pull request Oct 20, 2018
tmatth pushed a commit to tmatth/oss-fuzz that referenced this pull request Oct 22, 2018
@inferno-chromium
Copy link
Collaborator

bzip2 target bzip2_compress_target is having startup crash, please get the crash fixed asap upstream or disable ubsan config in project.yaml, this is causing exceptions due to no archived build for ubsan config.

Step #13: BAD BUILD: /workspace/out/undefined/bzip2_compress_target seems to have either startup crash or exit:
Step #13: INFO: Seed: 784997807
Step #13: INFO: Loaded 1 modules (2257 inline 8-bit counters): 2257 [0x763cf0, 0x7645c1),
Step #13: INFO: Loaded 1 PC tables (2257 PCs): 2257 [0x511de0,0x51aaf0),
Step #13: INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
Step #13: INFO: A corpus is not provided, starting from an empty corpus
Step #13: blocksort.c:302:7: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
Step #13: #0 0x47bee4 in fallbackSort /src/bzip2-1.0.2/blocksort.c:302:7
Step #13: #1 0x47b037 in BZ2_blockSort /src/bzip2-1.0.2/blocksort.c
Step #13: #2 0x46bb65 in BZ2_compressBlock /src/bzip2-1.0.2/compress.c:658:7
Step #13: #3 0x4665c0 in handle_compress /src/bzip2-1.0.2/bzlib.c
Step #13: #4 0x4661ae in BZ2_bzCompress /src/bzip2-1.0.2/bzlib.c:501:21
Step #13: #5 0x46a4bc in BZ2_bzBuffToBuffCompress /src/bzip2-1.0.2/bzlib.c:1300:10
Step #13: #6 0x430eed in LLVMFuzzerTestOneInput /src/bzip2_compress_target.c:47:9
Step #13: #7 0x440b28 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:570:15
Step #13: #8 0x43fef5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:479:3
Step #13: #9 0x4424a1 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&) /src/libfuzzer/FuzzerLoop.cpp:765:5
Step #13: #10 0x442895 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&) /src/libfuzzer/FuzzerLoop.cpp:805:3
Step #13: #11 0x435559 in fuzzer::FuzzerDriver(int*, char***, int ()(unsigned char const, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:764:6
Step #13: #12 0x4310a8 in main /src/libfuzzer/FuzzerMain.cpp:20:10
Step #13: #13 0x7ff519f8f82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
Step #13: #14 0x405df8 in _start (out/undefined/bzip2_compress_target+0x405df8)
Step #13:
Step #13: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior blocksort.c:302:7 in
Step #13: MS: 0 ; base unit: 0000000000000000000000000000000000000000
Step #13: 0xa,

@bshastry
Copy link
Contributor Author

Hi @inferno-chromium #1894 fixes this.

bshastry added a commit to bshastry/oss-fuzz that referenced this pull request Oct 24, 2018
inferno-chromium pushed a commit that referenced this pull request Oct 26, 2018
* bzip2: Bug fixes; added citations

* bzip2: Bump bzip2 version to 1.0.6

* bzip2: Variable blockSize100k, workFactor, and small

* bzip2: Remove hardcoded buffer sizes; fix nZ (compressor) to point to real buffer size

* bzip: Remove ubsan from project.yaml beacuse of start-up crash #1887
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants