Add chapter on fuzzing #1646

Merged
merged 7 commits into from
Mar 17, 2023

1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -46,6 +46,7 @@
- [Stabilizing Features](./stabilization_guide.md)
- [Feature Gates](./feature-gates.md)
- [Coding conventions](./conventions.md)
- [Fuzzing](./fuzzing.md)
- [Notification groups](notification-groups/about.md)
- [ARM](notification-groups/arm.md)
- [Cleanup Crew](notification-groups/cleanup-crew.md)
105 changes: 105 additions & 0 deletions src/fuzzing.md
@@ -0,0 +1,105 @@
# Fuzzing

<!-- date-check: Mar 2023 -->

For the purposes of this guide, *fuzzing* is any testing methodology that
involves compiling a wide variety of programs in an attempt to uncover bugs in
rustc. Fuzzing is often used to find internal compiler errors (ICEs). Fuzzing
can be beneficial, because it can find bugs before users run into them and
provide small, self-contained programs that make the bug easier to track down.
However, some common mistakes can reduce the helpfulness of fuzzing and end up
making contributors' lives harder. To maximize your positive impact on the Rust
project, please read this guide before reporting fuzzer-generated bugs!

## Guidelines

### In a nutshell

*Please do:*

- Ensure the bug is still present on the latest nightly rustc
- Include a reasonably minimal, standalone example along with any bug report
- Include all of the information requested in the bug report template
- Search for existing reports with the same message and query stack
- Format the test case with `rustfmt`, if it maintains the bug

*Please don't:*

- Report lots of bugs that use internal features, including but not limited to
`custom_mir`, `lang_items`, `no_core`, and `rustc_attrs`.
- Seed your fuzzer with inputs that are known to crash rustc (details below).

### Discussion

If you're not sure whether or not an ICE is a duplicate of one that's already
been reported, please go ahead and report it and link to issues you think might
be related. In general, ICEs on the same line but with different *query stacks*
are usually distinct bugs.
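
One way to compare a new ICE against existing reports is to reduce each compiler output to a signature: the first ICE or panic line plus the query names on the query stack. The sketch below is a hypothetical helper, not part of rustc or any existing tool. It assumes the query stack is printed between `query stack during panic:` and `end of query stack` lines, with entries shaped like `#0 [typeck] ...`; that layout may differ between compiler versions.

```python
import re
from typing import List, Tuple

# Assumed shape of a query stack entry, e.g. "#0 [typeck] type-checking `main`".
QUERY_ENTRY = re.compile(r"#\d+ \[(\w+)\]")


def ice_signature(stderr: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (first ICE or panic line, query names from the query stack)."""
    message = ""
    queries: List[str] = []
    in_stack = False
    for raw in stderr.splitlines():
        line = raw.strip()
        # Remember only the first line that looks like an ICE or a panic.
        if not message and ("panicked at" in line or "internal compiler error" in line):
            message = line
        if line.startswith("query stack during panic"):
            in_stack = True
        elif line.startswith("end of query stack"):
            in_stack = False
        elif in_stack:
            match = QUERY_ENTRY.match(line)
            if match:
                queries.append(match.group(1))
    return message, tuple(queries)
```

Two outputs with the same message but different query tuples would, by the rule of thumb above, usually be worth reporting separately.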

Member

Maybe a few examples, such as:

panicked at 'index out of bounds: the len is 1 but the index is 1', [...] ena-0.14.1/src/snapshot_vec.rs:199:10

There are other common ones that maybe @matthiaskrgr remembers.

Contributor Author

Added an example!

Member

A couple more common ICEs

      7     "error_reason": "thread 'rustc' panicked at 'Box<dyn Any>', /rustc/1203e0866e6c3659775efcb8aecad21dc13ef38b/compiler/rustc_errors/src/lib.rs:995:33",
     13     "error_reason": "thread 'rustc' panicked at 'Box<dyn Any>', /home/matthias/vcs/github/rust_debug_assertions/compiler/rustc_errors/src/lib.rs:1644:9",
     14     "error_reason": "thread 'rustc' panicked at 'forcing query with already existing `DepNode`",
     20     "error_reason": "error: internal compiler error: compiler/rustc_infer/src/infer/lexical_region_resolve/mod.rs:203:17: cannot relate region: ReErased",
     37     "error_reason": "error: internal compiler error: no errors encountered even though `delay_span_bug` issued",
     80     "error_reason": "thread 'rustc' panicked at 'assertion failed: `(left == right)`",
     91     "error_reason": "error: internal compiler error: unexpected panic",


## Building a corpus

When building a corpus, be sure to avoid collecting tests that are already
known to crash rustc. A fuzzer that is seeded with such tests is more likely to
generate bugs with the same root cause, wasting everyone's time. The simplest
way to avoid this is to loop over each file in the corpus, see if it causes an
ICE, and remove it if so.
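
The loop described above could be sketched as follows. This is a minimal illustration under stated assumptions, not an existing tool: the function names are hypothetical, and the ICE check is a heuristic that treats exit status 101 or a typical ICE message on stderr as a crash. The compiler runner is passed in as a parameter so that the pruning logic can be exercised without invoking rustc.

```python
import subprocess
import tempfile
from pathlib import Path

# Heuristic markers; real ICE output may vary between compiler versions.
ICE_MARKERS = ("internal compiler error", "thread 'rustc' panicked")


def looks_like_ice(returncode, stderr):
    """Heuristic: exit status 101 or a typical ICE message in stderr."""
    return returncode == 101 or any(marker in stderr for marker in ICE_MARKERS)


def run_rustc(path, rustc="rustc", timeout=30):
    """Compile one file to MIR in a scratch directory; return (status, stderr)."""
    with tempfile.TemporaryDirectory() as out_dir:
        try:
            proc = subprocess.run(
                [rustc, "--emit=mir", "--out-dir", out_dir, str(path)],
                capture_output=True, text=True, errors="replace", timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return -1, ""  # a hang is not treated as an ICE; the file is kept
    return proc.returncode, proc.stderr


def prune_corpus(corpus_dir, run=run_rustc):
    """Delete every .rs file under corpus_dir that makes the compiler ICE."""
    removed = []
    for path in sorted(Path(corpus_dir).rglob("*.rs")):
        status, stderr = run(path)
        if looks_like_ice(status, stderr):
            path.unlink()
            removed.append(path)
    return removed
```

Calling `prune_corpus("corpus/")` deletes the crashing files and returns their paths, so work on a copy of the corpus if the originals matter.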

To build a corpus, you may want to use:

- The rustc/rust-analyzer/clippy test suites (or even source code) --- though avoid
tests that are already known to cause failures, which often begin with comments
like `// failure-status: 101` or `// known-bug: #NNN`.
- The already-fixed ICEs in [Glacier][glacier] --- though avoid the unfixed
ones in `ices/`!
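
Test files that announce an expected failure in a header comment can be skipped before they reach the corpus. The helper below is a hypothetical sketch: it matches the two comment forms mentioned above, and also accepts a `//@` prefix, which is an assumption about how newer test suites spell the same directives.

```python
import re

# Matches header comments such as `// failure-status: 101` or
# `// known-bug: #NNN`. The optional `@` is an assumption about newer suites.
KNOWN_FAILURE = re.compile(r"^\s*//@?\s*(failure-status:\s*101\b|known-bug\b)")


def is_known_failure(source: str) -> bool:
    """True if any line carries a header comment marking an expected failure."""
    return any(KNOWN_FAILURE.match(line) for line in source.splitlines())
```

A corpus builder could read each candidate file and copy it only when `is_known_failure` returns `False`.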

## Extra credit

Here are a few things you can do to help the Rust project after filing an ICE.

- Add the minimal test case to [Glacier][glacier]
- [Bisect][bisect] the bug to figure out when it was introduced

Member

@compiler-errors, Mar 16, 2023

If this list is arbitrarily ordered, I would prefer if we reorder this in terms of value to the ICE solver / compiler person like me 😸 :

  1. Bisecting
    • This one is extremely important when it comes to triage. If an ICE is a stable-to-beta regression, it's a far better candidate for immediate fixing. Also, finding the PR that caused the ICE typically makes fixing the ICE significantly easier.
  2. Minimization and fixing unrelated problems feel like the same thing, though if you are trying to distinguish them here for some reason, feel free to correct me :P
  3. Glacier (these are only super useful for ICEs we expect to be there for a long time)

Contributor Author

> Minimization and fixing unrelated problems feel like the same thing, though if you are trying to distinguish them here for some reason, feel free to correct me :P

My only thought here is that it's possible to shrink the test case in terms of bytes, LoC, etc. without reducing the number of unrelated errors (indeed, while increasing them).

- Fix unrelated problems with the test case (things like syntax errors or
borrow-checking errors)
- Minimize the test case (see below)

[bisect]: https://github.com/rust-lang/cargo-bisect-rustc/blob/master/TUTORIAL.md

## Minimization

Contributor Author

(Might not do this before merging, but I will try to get to it at some point!)


It can be helpful to *minimize* the fuzzer-generated input. When minimizing, be
careful to preserve the original error, and avoid introducing distracting
problems such as syntax, type-checking, or borrow-checking errors.

There are some tools that can help with minimization. If you're not sure how
to avoid introducing syntax, type-, and borrow-checking errors while using
these tools, post both the complete and minimized test cases. Generally,
*syntax-aware* tools give the best results in the least amount of time.
[`treereduce-rust`][treereduce] and [picireny][picireny] are syntax-aware.
[`halfempty`][halfempty] is not syntax-aware, but is generally a high-quality tool.
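
Reduction tools generally need a way to decide whether a candidate is still interesting. The sketch below shows one possible check that keeps a candidate only when the original ICE message is still present and no ordinary errors have crept in. The function names and the list of prefixes are assumptions made for illustration; the prefix list is a heuristic and not exhaustive.

```python
# Lines that start with "error" but belong to the ICE report itself.
# This list is a heuristic assumption, not an exhaustive one.
ICE_REPORT_PREFIXES = (
    "error: internal compiler error",
    "error: the compiler unexpectedly panicked",
    "error: aborting due to",
)


def unrelated_errors(stderr):
    """Return ordinary error lines, e.g. syntax or borrow-checking errors."""
    return [
        line for line in stderr.splitlines()
        if line.startswith("error") and not line.startswith(ICE_REPORT_PREFIXES)
    ]


def is_interesting(stderr, expected_message):
    """Keep a candidate only if it shows the original ICE and nothing else."""
    return expected_message in stderr and not unrelated_errors(stderr)
```

A wrapper script could run rustc on the candidate, pass its stderr to `is_interesting`, and report the result in whatever form the chosen tool expects; each tool's documentation describes its own convention.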

[halfempty]: https://github.com/googleprojectzero/halfempty
[picireny]: https://github.com/renatahodovan/picireny
[treereduce]: https://github.com/langston-barrett/treereduce

## Effective fuzzing

When fuzzing rustc, you may want to avoid generating machine code, since this
is mostly done by LLVM. Try `--emit=mir` instead.

A variety of compiler flags can uncover different issues.
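
As a rough illustration of varying flags, the sketch below enumerates rustc command lines for one source file, always passing `--emit=mir`. The particular flag values are an arbitrary example chosen for illustration, not a recommended set, and `command_lines` is a hypothetical name.

```python
import itertools

# Always stop before machine-code generation.
BASE_FLAGS = ["--emit=mir"]

# One flag is picked from each group; the groups are illustrative only.
FLAG_CHOICES = [
    ["--edition=2018", "--edition=2021"],
    ["-Copt-level=0", "-Copt-level=3"],
    ["-Cdebuginfo=0", "-Cdebuginfo=2"],
]


def command_lines(source, rustc="rustc"):
    """Yield one argument list per combination of flags."""
    for combo in itertools.product(*FLAG_CHOICES):
        yield [rustc, *BASE_FLAGS, *combo, source]
```

Each yielded list can be handed to `subprocess.run`; with three groups of two flags this produces eight invocations per file.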

If you're fuzzing a compiler you built, you may want to build it with `-C
target-cpu=native` to squeeze out a few more executions per second.

## Existing projects

- [fuzz-rustc][fuzz-rustc] demonstrates how to fuzz rustc with libFuzzer
- [icemaker][icemaker] runs rustc and other tools on a large number of source
files with a variety of flags to catch ICEs
- [tree-splicer][tree-splicer] generates new source files by combining existing
ones while maintaining correct syntax

[glacier]: https://github.com/rust-lang/glacier
[fuzz-rustc]: https://github.com/dwrensha/fuzz-rustc
[icemaker]: https://github.com/matthiaskrgr/icemaker/
[tree-splicer]: https://github.com/langston-barrett/tree-splicer/