Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzz testing integration #4129

Merged
merged 15 commits into from
Oct 21, 2020
Merged

Fuzz testing integration #4129

merged 15 commits into from
Oct 21, 2020

Conversation

darosior
Copy link
Contributor

@darosior darosior commented Oct 15, 2020

This introduces coverage-guided fuzz testing using libfuzzer.

Why

We've been talking about it for some time now, and i'm sure it could help us to uncover trivial bugs with minimal overhead, more sneaky ones more easily than with unit tests, and finally to avoid regressions by maintaining a seed corpus to ran patches against.

Going further, structure-aware fuzzing could help us with Bitcoin Script creation or even with the state machine which while more intrusive because crypto seems plausible because of the daemons separation and messages.

For now, this PR only introduces the basis for integration with primitives and rudimentary targets (i focused on common/). Still, this found some bugs (eg amounts string parsing / formatting) and new targets could help with e.g. our JSON parser or the gossip store.

Integration

This extends our tests/ directory by moving the Python functional tests to the tests/functional/ subdirectory (sorry for the conflicts..) and creates a new sub: tests/fuzz/.

Inspired by the unit tests, all targets are named fuzz-[target_name].
Not inspired by the unit tests, all targets are built in tests/fuzz/. Even though there are only common/-specific targets for now, i think it makes sense to keep them contained in a single directory.

A libfuzz.h abstracts out the interface to libFuzzer for potential future support for other fuzzers, and run.py allows to manage them in batches. This was inspired by the integration in bitcoin-core but i guess it's a pretty common pattern.

What next

  • a)
    I propose to create a new repository (eg in https://github.com/lightningd/ ?) for tracking the seed corpus that we could update if we generate new coverage locally.
    This could be pulled in a new CI job that checks PRs against each [target + seeds] once to check for regression. What do you think about using Github Action for this ?
  • b)
    Write more targets
  • c)
    Generate more coverage
  • d)
    Go to b) :)

@darosior darosior requested a review from cdecker as a code owner October 15, 2020 10:42
@darosior darosior force-pushed the fuzzing branch 2 times, most recently from ffa42e6 to 9fe50b9 Compare October 15, 2020 13:56
Copy link
Contributor

@niftynei niftynei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Bunch of small nits. I think my biggest complaint is that there's no real reason to move all the existing CI tests down a directory. There's no reason we can't have tests/fuzzing exist while also leaving all the other tests where they are.

Didn't really give the Makefiles a good looksee, as that's not really my particular wheelhouse.

common/amount.c Show resolved Hide resolved
common/amount.c Show resolved Hide resolved
doc/FUZZING.md Outdated Show resolved Hide resolved
doc/FUZZING.md Outdated Show resolved Hide resolved
doc/FUZZING.md Show resolved Hide resolved
tests/fuzz/run.py Show resolved Hide resolved
@@ -7,12 +7,7 @@ FUZZ_TARGETS_SRC := $(wildcard tests/fuzz/fuzz-*.c)
FUZZ_TARGETS_OBJS := $(FUZZ_TARGETS_SRC:.c=.o)
FUZZ_TARGETS_BIN := $(FUZZ_TARGETS_SRC:.c=)

FUZZ_COMMON_OBJS := \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like it belongs in the previous commit

common/test/run-amount.c Show resolved Hide resolved
tests/fuzz/libfuzz.c Outdated Show resolved Hide resolved
tests/fuzz/fuzz-initial_channel.c Show resolved Hide resolved
@niftynei niftynei added the clightning_twit Tag to nudge @niftynei to post to @clightning_twit label Oct 15, 2020
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR @darosior, I've long wanted to look into fuzzing, and with this I have almost a guided tour to help me learn. Haven't looked too closely yet, but I'll grab a coco and start learning asap.

However, I share @niftynei's feeling re moving all tests down one directory level:

  1. It breaks almost all PRs
  2. Those are likely the most edited files in the whole repo, hiding them in ya directory means some more typing 😜

.gitignore Outdated Show resolved Hide resolved
@darosior
Copy link
Contributor Author

Ok, will remove the move!

This adds a new configuration, --enable-fuzzing (which is more than
welcome to be coupled with --enable-address-sanitizer), to pass the
fuzzer sanitizer argument when compiling objects. This allows libfuzzer
to actually be able "to fuzz" by detecting coverage and be smart when
mutating inputs.

As libfuzzer brings its own ~~fees~~ main(), we compile objects with
fsanitize=fuzzer-no-link, and special-case the linkage of the fuzz
targets.

A "lib" is added to abstract out the interface to the fuzzing tool used.
This allow us to use the same targets to fuzz using AFL, hongfuzz or w/e
by adding their entrypoints into libfuzz. (h/t to practicalswift who
introduced this for bitcoin-core, which i mimiced)

Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
@darosior darosior force-pushed the fuzzing branch 3 times, most recently from 4af47e6 to 442099e Compare October 16, 2020 09:36
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor performance tips :-)

tests/fuzz/fuzz-amount.c Outdated Show resolved Hide resolved
tests/fuzz/libfuzz.c Outdated Show resolved Hide resolved
tests/fuzz/libfuzz.c Outdated Show resolved Hide resolved
tests/fuzz/fuzz-initial_channel.c Outdated Show resolved Hide resolved
tests/fuzz/run.py Outdated Show resolved Hide resolved
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
This:
	- Allows `.*btc` amounts (without post-decimal)
	- Avoids creating decimals when amount is 0 btc
	- Corrects our handling of the suffixes (memeqstr would
	  sometimes return false because of null-termination)

Changelog-Fixed: We are now able to parse any amount string (XXXmsat, XX.XXXbtc, ..) we create.

Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
We don't use it, and it's buggy (will always return NULL)

Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
My bad, i comitted them as part of the LSAN suppressions while this data
race could have and had been fixed.

Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
@darosior
Copy link
Contributor Author

Thanks for the reviews so far! Amended as per latest one.

(Will check about the process pool soon ™️)

Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
@darosior
Copy link
Contributor Author

What do you think about having a repo in lightningd/ to store corpora ?

@cdecker
Copy link
Member

cdecker commented Oct 20, 2020

What do you think about having a repo in lightningd/ to store corpora ?

Sounds like something we could consider. From what I've seen so far the corpus is not going to be huge (<= 1GB by my estimate) so that can work. I wonder however if the version control brings anything new to the table, given that the corpus consists mostly of binary files, so maybe a linear storage model can work as well? The big advantage of using git + github is the reuse of infra and processes to collaborate on building up a corpus.

(edit: what I'm saying is that I would have no idea on how to judge a pull request xD)

@darosior
Copy link
Contributor Author

darosior commented Oct 20, 2020

Yeah, really just thinking of copying the workflow at github.com/bitcoin-core/qa-assets :-)

EDIT:

(edit: what I'm saying is that I would have no idea on how to judge a pull request xD)

:) I'd just ask (trust?) the contributor to have ran --merge on the seed dir. Could also be checked locally (wonder it's deterministic hmm). In any case i don't expect a lot of contributions there (as in bitcoind's qa-assets).

(<= 1GB by my estimate)

Been running one for almost a week it's 8.3M atm. But we really only have few simple targets.

I wonder however if the version control brings anything new to the table

A place which we can easily contribute to, but you still have control over since it's pulled in your CI. I'd say that the platform, Github, is more of interest than the version control tool, git, for this one :)

@jsarenik
Copy link
Contributor

jsarenik commented Oct 20, 2020

@darosior that qa-assets repo looks ugly to me. Exactly because of what @cdecker wrote earlier.

Here is my offer: If you agree to put the binary data out of git, i will happily help with any and all automation required (which will be itself properly kept in a git repository; e.g. building the corpuses, uploading them somewhere (or sharing them somehow), etc.).

Yes, i learned a bit about Torrent in last weeks and am strongly biased against any binary blob in repositories designed for source code.

https://git-annex.branchable.com/ or https://git-lfs.github.com/ can be used to track the binary blobs.

EDIT: This offer is valid also for serving the binary data over BitTorrent. No cloud needed.

@jsarenik
Copy link
Contributor

Just for reference here is one repository that uses binary fuzz files, even inside its main repository: https://github.com/rust-bitcoin/rust-bitcoin/tree/master/fuzz. The proposed separate repository (like they have in https://github.com/bitcoin-core/qa-assets) is a better idea compared to in-tree blobs.

@cdecker
Copy link
Member

cdecker commented Oct 21, 2020

@darosior that qa-assets repo looks ugly to me. Exactly because of what @cdecker wrote earlier.

Here is my offer: If you agree to put the binary data out of git, i will happily help with any and all automation required (which will be itself properly kept in a git repository; e.g. building the corpuses, uploading them somewhere (or sharing them somehow), etc.).

Yes, i learned a bit about Torrent in last weeks and am strongly biased against any binary blob in repositories designed for source code.

https://git-annex.branchable.com/ or https://git-lfs.github.com/ can be used to track the binary blobs.

EDIT: This offer is valid also for serving the binary data over BitTorrent. No cloud needed.

I think the main issue is collecting and managing the corpus, the serving is secondary (happy to host the corpus on GCS for a couple of cents each month). For the code review and addition we have a really good process, that's no longer true if we start managing things outside the git tree.

I absolutely share your dislike of binary artifacts in git, however it might be easiest to get started with an in-tree solution, and move that out at a later point in time. Or the separate corpus repo could be a cleaner solution that doesn't taint the main repo with these binaries.

Going a bit back and forth between the solutions tbh.

@darosior
Copy link
Contributor Author

Yeah, management on a server would be awful, discussions around it would be mixed up. I can already imagine having to upload my new corpus to (a VPS / a torrent seeder / whatever) and bug cdecker on IRC for him to pull it and eventually merge it with the existing one on its GCS for it to be pulled by our CI... Wow. Let's just use Github?

@cdecker
Copy link
Member

cdecker commented Oct 21, 2020

Ok, let's use a corpus repo then on the lightningd org. That unblocks this PR which I'd like to merge, and keeps our main tree clean, while allowing us to come up with a better solution at a later point in time.

@cdecker
Copy link
Member

cdecker commented Oct 21, 2020

ACK acef7ea

@jsarenik
Copy link
Contributor

jsarenik commented Oct 21, 2020

ACK acef7ea (have learned a lot about Fuzzing in the process, thank you!)

Recommended reading (because the word fuzzing is quite fuzzy until the real meaning of it is seen):
https://en.wikipedia.org/wiki/Fuzzing

@cdecker cdecker merged commit 49dcb90 into ElementsProject:master Oct 21, 2020
@darosior
Copy link
Contributor Author

@rustyrussell could you look at the amount parsing commit ? iirc i stumbled upon something like that in the past and you corrected my fix so just would like your input on this one.

@jsarenik
Copy link
Contributor

jsarenik commented Oct 25, 2020

@darosior This PR has been merged. Is your comment relevant or is it here by mistake? Any link to that commit you mention?

@niftynei niftynei removed the clightning_twit Tag to nudge @niftynei to post to @clightning_twit label Nov 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants