-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create RFC for MIR linker #1600
Conversation
Add PR link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great!
I added two tiny comments, one of them is just a confirmation that the MIR-linking approach seems the best. I don’t know if the information should be added to the RFC itself.
Regarding the code of the mir-linker itself, reachability.rs
is typically something I’d like to have in a tools-lib crate. I noticed that the only dependency on Kani is the call to GotocCtx::is_proof_harness. Could ctx parameter of collect_reachable_items be changed to a select : Fn(&Instance) -> bool parameter instead and that could actually be shared with the rest of the Rust verification community.
Should someone internal to Rust and external to Kani be asked to quickly read this RFC and provide feedback?
rfcs/src/rfcs/0001-mir-linker.md
Outdated
I.e., for every crate compiled, `kani-compiler` will generate an artifact that includes the `MIR` representation | ||
of all items in the crate. | ||
2. Kani will generate the `goto-program` only while compiling the target user crate. | ||
It will generate one `goto-program` model that includes all items reachable from every harness in the target crate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a detail, but I think it should be "... that includes all items reachable from any harness in the target crate."
I.e, we need at least one harness to reach an item to include it, not every harness to reach that item
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very important detail. Thanks!
rfcs/src/rfcs/0001-mir-linker.md
Outdated
### Cross-Crate Reachability Analysis | ||
|
||
`kani-compiler` will include a new `reachability` module to traverse over the local and external `MIR` items. | ||
This module will `monomorphize` all generic code as it's performing the traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to not monomorphize everything, what would be the implications? Does it make the process impossible or the only side-effect would be to potentially add include too many items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great question! Conceptually, I don't see any reason why the reachability code couldn't work with polymorphic definitions. If we decide to contribute this code to upstream or put in separate library, it would be a great thing to do.
For our current goal, it does make sense to monomorphize here since Kani's compiler just wants a list of items to generate code for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes totally agree
Fix wording
Totally. I tried to keep it as independent from Kani's code as possible to keep this code clean and to allow us to pull this out of here. The One of my thoughts was to use this reachability module as my starting point to perform further stable MIR experiments. :) BTW, I loved the idea of getting someone from the Rust team to read this RFC. Thank you! |
Just noticed the |
Very excited for your future work 🤩 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very exciting! Thanks @celinval. Just had a few comments/questions.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
|
||
### Kani's Sysroot | ||
|
||
Kani currently uses `rustup` sysroot to gather information from the standard library constructs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unclear on this section. What standard library constructs are looked up in sysroot
? Does sysroot
have MIR lying around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The rustup sysroot does have MIR lying around. They are inside the *rlib files shipped with the toolchain (see more details here: https://rustc-dev-guide.rust-lang.org/backend/libs-and-metadata.html#metadata).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhassan-aws what kind of information would you like me to add here to make this more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine as is.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
cargo rustc <FLAGS> -- --reachability=harnesses | ||
``` | ||
|
||
to build everything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, we've been primarily using the KANIFLAGS
environment variable to pass things to the kani-compiler
? If this is correct, any reason to switch to using cargo rustc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is that we want to pass an extra flag to the target crate which we wouldn't pass to the dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this will still keep KANIFLAGS working as is today.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
| `goto-program` generation | 2.4s | 90.7s | | ||
| `goto-program` linking | 0.8s | 33.2s | | ||
| code instrumentation | 0.8s | 33.1 | | ||
| verification | 0.5s | 8.5s | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The results are amazing! I'd expect memory usage to be significantly lower with the MIR linker as well.
If you still have the log files, can you compare the size of the SAT instance (clauses/literals) between the two alternatives? I'm curious to know how much CBMC's slicer is able to reduce the problem size, and how close it can get to the one generated by the MIR linker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I totally forgot to include the memory usage here.
Both prototypes run the test described in CI. So you can see the log files for the CI run for MIR linker and the alternative.
The specific data I believe you are asking for is:
14089 variables, 180 clauses
and
21610 variables, 7514 clauses
for MIR Linker and the alternative, respectively.
It just occurred to me that I should try to check the difference under visualize. It might make it more clear to understand the difference.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
`kani-compiler` will include a new `reachability` module to traverse over the local and external `MIR` items. | ||
This module will `monomorphize` all generic code as it's performing the traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rustc already has logic for this in https://github.com/rust-lang/rust/blob/master/compiler/rustc_monomorphize/src/collector.rs
Maybe it could be generalized to allow you to plug into it and make it work cross crate (rustc only needs cross crate for generic functions and will assume nongeneric functions got codegened upstream)
We have for now given up on mir-only rlibs, but if we want to pick up that work, we'd need exactly what you are describing anyway, so it doesn't seem unreasonable to change the collector to support it optionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great! Most of the code I've written so far was based on the collector logic.
Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>
rfcs/src/rfcs/0001-mir-linker.md
Outdated
GitHub workers. | ||
- In a potential assessment mode, only analyze code that is reachable from all public items in the target crate. | ||
|
||
One downside is that our release bundle will be bigger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much larger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approx 26MB bigger for linux 64bits. Kani 0.9 bundle size was 22MB. So this will more than double the size. That said, we also download the entire rustup toolchain as part of kani setup
and I believe it has more than 50MB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the text a bit. Let me know if it's clearer now.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
|
||
In a nutshell, we will no longer generate a `goto-program` model for every crate we compile. | ||
Instead, we will generate the `MIR` for every crate, and we will generate only one `goto-program` model. | ||
This model will only include items reachable from the target crate's harnesses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider one per harness to get even more slicing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally. I put that as a future work for now. Let me know if you think this should be in the scope of the initial work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielsn is it OK if I leave it like this for the first version of the MIR Linker?
rfcs/src/rfcs/0001-mir-linker.md
Outdated
Failures in the linking stage would not impact the tool soundness. I anticipate the following failure scenarios: | ||
- ICE (Internal compiler error): Some logic is incorrectly implemented and the linking stage crashes. | ||
Although this is a bad experience for the user, this will not impact the verification result. | ||
- Missing items: This would either result in ICE during code generation or a verification failure if the missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about if its used as a potential function pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point. CBMC should catch that once the following PR gets merged: diffblue/cbmc#6987
Another possibility is to change how we invoke --goto-instrument
as pointed out here: diffblue/cbmc#6983 (comment)
I created #1621 to address that issue and added to https://github.com/model-checking/kani/milestone/12
rfcs/src/rfcs/0001-mir-linker.md
Outdated
|
||
## Open questions | ||
|
||
- Should we build or download the sysroot during `kani setup`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we handle cases during development when we modify something the sysroot depends on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During development, we shall build the sysroot together with all our binaries. I haven't figured out the best mechanism yet though. Building the sysroot as part of the kani-compiler
build isn't quite working for me. The sysroot will require different cargo configuration, so I am considering splitting our workspace and creating a new workspace for the library
folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielsn Would you like me to add this information here? I tried to focus on the end user point of view, but it might make sense to clarify this.
Is it possible to have more than one sysroot? |
Yes. The |
rfcs/src/rfcs/0001-mir-linker.md
Outdated
- `none`: This will be the default value if `--reachability` flag is not provided. It will basically skip | ||
reachability analysis (and `goto-program` generation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, goto-programs will still be generated, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. With none
we won't generate any goto-program. This will be used while compiling the dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski and @adpaco-aws I added a bit more information here. Let me know if that makes sense.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
thus, we are unable to translate these items into `goto-program`. | ||
|
||
In order to fulfill this gap, we must compile the standard library from scratch. | ||
This RFC proposes a similar method to what [`MIRI`](https://github.com/rust-lang/miri) implements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can... we just use what MIRI ships?
We install a rust toolchain on install. I'd been thinking about whether we could reduce this to a minimal profile as an optimization, but we should also be able to go the other way and add the miri component?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point. I'm assuming the overhead will be the same, but it might save us from maintaining yet another build script.
Let me investigate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be awkward to use MIRI to grab the sysroot. MIRI only build the sysroot when you first execute cargo miri [run | test]
. It also requires xargo
in order to do that.
That said, it might not be that bad to build the sysroot instead of downloading it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski please let me know your thoughts on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shucks, I was hoping MIRI would ship it. I'm ok with adding it as part of setup
or pre-building it. Though I'm guessing MIRI doesn't pre-build it for a good reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oli-obk what's the motivation to build MIRI sysroot as part of the first execution of MIRI opposed to pre-building it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miri can cross interpret, so we'd need to deploy all targets somewhere once (instead of once per host). Also sometimes you'll want to change how the sysroot is built I guess.
Also we need the ability to build the sysroot from scratch anyway for when we modify miri out of tree.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
|
||
The traversal logic will be customizable allowing different starting points to be used. | ||
The two options to be included in this RFC is starting from all local harnesses | ||
(tagged with `#[kani::proof]`) and all public functions in the local crate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that under --test
tests are marked public and that keep them here?
I also wonder if we should just... include both by default and not have options here? This avoids the complication of setting/flags/configuration, doesn't seem like a major downside with slightly more code compiled, and makes everything more flexible.
E.g. a few minutes of thought comes up with the case of Rust calling C code and that C calling Rust code that's "extern C" and which we can't inspect the reachability of, but including both harnesses and public items both would handle seamlessly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is two parts to this:
1- I believe the reachability code should be kept generic to take whatever entry point we want.
2- For now, we will only provide two possible choices, harnesses and public functions.
I do see other future usages for the starting point. C + Rust verification will likely be a combination of harness + extern C functions.
For --tests
, I believe we still want to use the harness mode unless if the user chooses --function
. For the later, I might just keep the legacy mode for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski does that make sense? Let me know if you want me to change anything or make the text more clear about the customization points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with whichever course you choose, but my preference is that we should error on the side of simplicity here. IMO it should just use "public + harnesses" as roots and then we don't need to worry about other options. It seems like the worse case scenario is just codegen'ing a bit more than we need, while the advantage is that everything just works and nobody has to think about this again.
rfcs/src/rfcs/0001-mir-linker.md
Outdated
with: | ||
|
||
``` | ||
cargo rustc <FLAGS> -- --reachability=harnesses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about when we run cargo kani --tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. At some point I thought about adding this to the "Open Questions", but I totally forgot. I'll do that now.
I don't have a great answer yet. The two possibilities I thought so far are on the hacky side, but I believe they would work:
- Add a new value to
--reachability
, maybe "on-tests" or something like that. This would behave as either "harness" or "none" depending whether RUSTFLAGS has--tests
. - Change how we handle not having
--reachability
argument to also take into account the existence of--tests
.
Both would ensure that we only run the reachability and codegen at the last compilation step. Any other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski any suggestions? I added this to the open questions too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was less on the reachability side here, and more on the side of the actual pragmatics of doing the build.
cargo build
becomes cargo rustc
but what does cargo test --no-run
become?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will likely stay as is and we might add a --reachability=on-tests
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski I have great news, we can replace
cargo test --no-run
by
cargo rustc --tests -- --reachability=harnesses
I'll add some details here and remove that from the open questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic!
rfc/src/rfcs/0001-mir-linker.md
Outdated
This will generate a goto-program for each crate compiled by `kani-compiler`, and it will still have the same | ||
`std` linking issues. | ||
|
||
*This option will be removed as part of the `rfc` stabilization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since *
doesn't link or anything, I suggest removing it in:
--reachability=[ harnesses | pub_fns | none | legacy* ]
and include this text in the item above directly. If you like, you can highlight as "Note: This option will be removed as part of the rfc
stabilization."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Description of changes:
Create an RFC for the implementation of a MIR linker. The MIR linker would provide a performant solution to solve issues for missing
std
definitions.I created two prototypes, one for the MIR Linker and one without the linker. They both compile the
std
library as part ofcargo kani
. Here are the links for the MIR Linker prototype and the Alternative prototype.You can also find a rendered version of this document here.
Resolved issues:
Related to #1588
Call-outs:
Testing:
How is this change tested?
Is this a refactor change?
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.