Create RFC for MIR linker #1600

celinval · 2022-08-26T23:58:16Z

Description of changes:

Create an RFC for the implementation of a MIR linker. The MIR linker would provide a performant solution to solve issues for missing std definitions.

I created two prototypes, one for the MIR Linker and one without the linker. They both compile the std library as part of cargo kani. Here are the links for the MIR Linker prototype and the Alternative prototype.

You can also find a rendered version of this document here.

Resolved issues:

Related to #1588

Call-outs:

Testing:

How is this change tested?
Is this a refactor change?

Checklist

Each commit message has a non-empty body, explaining why the change was made
Methods or procedures are documented
Regression or unit tests are included, or existing tests cover the modified code
My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

Add PR link

giltho

That sounds great!
I added two tiny comments, one of them is just a confirmation that the MIR-linking approach seems the best. I don’t know if the information should be added to the RFC itself.

Regarding the code of the mir-linker itself, reachability.rs is typically something I’d like to have in a tools-lib crate. I noticed that the only dependency on Kani is the call to GotocCtx::is_proof_harness. Could ctx parameter of collect_reachable_items be changed to a select : Fn(&Instance) -> bool parameter instead and that could actually be shared with the rest of the Rust verification community.

Should someone internal to Rust and external to Kani be asked to quickly read this RFC and provide feedback?

giltho · 2022-08-30T15:44:35Z

rfcs/src/rfcs/0001-mir-linker.md

+   I.e., for every crate compiled, `kani-compiler` will generate an artifact that includes the `MIR` representation
+  of all items in the crate.
+2. Kani will generate the `goto-program` only while compiling the target user crate.
+  It will generate one `goto-program` model that includes all items reachable from every harness in the target crate.


It's a detail, but I think it should be "... that includes all items reachable from any harness in the target crate."
I.e, we need at least one harness to reach an item to include it, not every harness to reach that item

It's a very important detail. Thanks!

rfcs/src/rfcs/0001-mir-linker.md

giltho · 2022-08-30T16:55:16Z

rfcs/src/rfcs/0001-mir-linker.md

+### Cross-Crate Reachability Analysis
+
+`kani-compiler` will include a new `reachability` module to traverse over the local and external `MIR` items.
+This module will `monomorphize` all generic code as it's performing the traversal.


If we were to not monomorphize everything, what would be the implications? Does it make the process impossible or the only side-effect would be to potentially add include too many items?

That is a great question! Conceptually, I don't see any reason why the reachability code couldn't work with polymorphic definitions. If we decide to contribute this code to upstream or put in separate library, it would be a great thing to do.

For our current goal, it does make sense to monomorphize here since Kani's compiler just wants a list of items to generate code for.

Yes totally agree

Fix wording

celinval · 2022-08-30T18:00:19Z

That sounds great! I added two tiny comments, one of them is just a confirmation that the MIR-linking approach seems the best. I don’t know if the information should be added to the RFC itself.

Regarding the code of the mir-linker itself, reachability.rs (http://reachability.rs/) is typically something I’d like to have in a tools-lib crate. I noticed that the only dependency on Kani is the call to GotocCtx::is_proof_harness. Could ctx parameter of collect_reachable_items be changed to a select : Fn(&Instance) -> bool parameter instead and that could actually be shared with the rest of the Rust verification community.

Should someone internal to Rust and external to Kani be asked to quickly read this RFC and provide feedback?

Totally. I tried to keep it as independent from Kani's code as possible to keep this code clean and to allow us to pull this out of here. The is_proof_harness is something I am planning to change by making the starting points configurable.

One of my thoughts was to use this reachability module as my starting point to perform further stable MIR experiments. :)

BTW, I loved the idea of getting someone from the Rust team to read this RFC. Thank you!

giltho · 2022-08-30T18:51:01Z

Just noticed the https://reachability.rs strangeness, that happened because I wrote that comment in apple notes first and copy pasted in gh, sorry if that was confusing

giltho · 2022-08-30T18:52:16Z

One of my thoughts was to use this reachability module as my starting point to perform further stable MIR experiments. :)

Very excited for your future work 🤩

zhassan-aws

This is very exciting! Thanks @celinval. Just had a few comments/questions.

rfcs/src/rfcs/0001-mir-linker.md

zhassan-aws · 2022-08-30T23:14:28Z

rfcs/src/rfcs/0001-mir-linker.md

+
+### Kani's Sysroot
+
+Kani currently uses `rustup` sysroot to gather information from the standard library constructs.


I'm a bit unclear on this section. What standard library constructs are looked up in sysroot? Does sysroot have MIR lying around?

Yes. The rustup sysroot does have MIR lying around. They are inside the *rlib files shipped with the toolchain (see more details here: https://rustc-dev-guide.rust-lang.org/backend/libs-and-metadata.html#metadata).

@zhassan-aws what kind of information would you like me to add here to make this more clear?

I think it's fine as is.

zhassan-aws · 2022-08-30T23:25:30Z

rfcs/src/rfcs/0001-mir-linker.md

+  cargo rustc <FLAGS> -- --reachability=harnesses
+  ```
+
+  to build everything.


As far as I understand, we've been primarily using the KANIFLAGS environment variable to pass things to the kani-compiler? If this is correct, any reason to switch to using cargo rustc?

The main reason is that we want to pass an extra flag to the target crate which we wouldn't pass to the dependencies.

Unfortunately, this will still keep KANIFLAGS working as is today.

zhassan-aws · 2022-08-30T23:31:39Z

rfcs/src/rfcs/0001-mir-linker.md

+| `goto-program` generation |   2.4s     |    90.7s    |
+| `goto-program` linking    |   0.8s     |    33.2s    |
+| code instrumentation      |   0.8s     |    33.1     |
+| verification              |   0.5s     |    8.5s     |


The results are amazing! I'd expect memory usage to be significantly lower with the MIR linker as well.

If you still have the log files, can you compare the size of the SAT instance (clauses/literals) between the two alternatives? I'm curious to know how much CBMC's slicer is able to reduce the problem size, and how close it can get to the one generated by the MIR linker.

Yeah, I totally forgot to include the memory usage here.

Both prototypes run the test described in CI. So you can see the log files for the CI run for MIR linker and the alternative.

The specific data I believe you are asking for is:

14089 variables, 180 clauses

and

21610 variables, 7514 clauses

for MIR Linker and the alternative, respectively.

It just occurred to me that I should try to check the difference under visualize. It might make it more clear to understand the difference.

rfcs/src/rfcs/0001-mir-linker.md

oli-obk · 2022-08-31T06:53:12Z

rfcs/src/rfcs/0001-mir-linker.md

+`kani-compiler` will include a new `reachability` module to traverse over the local and external `MIR` items.
+This module will `monomorphize` all generic code as it's performing the traversal.


rustc already has logic for this in https://github.com/rust-lang/rust/blob/master/compiler/rustc_monomorphize/src/collector.rs

Maybe it could be generalized to allow you to plug into it and make it work cross crate (rustc only needs cross crate for generic functions and will assume nongeneric functions got codegened upstream)

We have for now given up on mir-only rlibs, but if we want to pick up that work, we'd need exactly what you are describing anyway, so it doesn't seem unreasonable to change the collector to support it optionally.

That would be great! Most of the code I've written so far was based on the collector logic.

Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>

danielsn · 2022-09-01T18:58:23Z

rfcs/src/rfcs/0001-mir-linker.md

+  GitHub workers.
+- In a potential assessment mode, only analyze code that is reachable from all public items in the target crate.
+
+One downside is that our release bundle will be bigger.


How much larger?

Approx 26MB bigger for linux 64bits. Kani 0.9 bundle size was 22MB. So this will more than double the size. That said, we also download the entire rustup toolchain as part of kani setup and I believe it has more than 50MB.

Changed the text a bit. Let me know if it's clearer now.

danielsn · 2022-09-01T18:58:43Z

rfcs/src/rfcs/0001-mir-linker.md

+
+In a nutshell, we will no longer generate a `goto-program` model for every crate we compile.
+Instead, we will generate the `MIR` for every crate, and we will generate only one `goto-program` model.
+This model will only include items reachable from the target crate's harnesses.


Should we consider one per harness to get even more slicing?

Totally. I put that as a future work for now. Let me know if you think this should be in the scope of the initial work.

@danielsn is it OK if I leave it like this for the first version of the MIR Linker?

danielsn · 2022-09-01T19:00:04Z

rfcs/src/rfcs/0001-mir-linker.md

+Failures in the linking stage would not impact the tool soundness. I anticipate the following failure scenarios:
+- ICE (Internal compiler error): Some logic is incorrectly implemented and the linking stage crashes.
+  Although this is a bad experience for the user, this will not impact the verification result.
+- Missing items: This would either result in ICE during code generation or a verification failure if the missing


What about if its used as a potential function pointer?

That's a great point. CBMC should catch that once the following PR gets merged: diffblue/cbmc#6987

Another possibility is to change how we invoke --goto-instrument as pointed out here: diffblue/cbmc#6983 (comment)

I created #1621 to address that issue and added to https://github.com/model-checking/kani/milestone/12

danielsn · 2022-09-01T19:02:17Z

rfcs/src/rfcs/0001-mir-linker.md

+
+## Open questions
+
+- Should we build or download the sysroot during `kani setup`?


How do we handle cases during development when we modify something the sysroot depends on

During development, we shall build the sysroot together with all our binaries. I haven't figured out the best mechanism yet though. Building the sysroot as part of the kani-compiler build isn't quite working for me. The sysroot will require different cargo configuration, so I am considering splitting our workspace and creating a new workspace for the library folder.

@danielsn Would you like me to add this information here? I tried to focus on the end user point of view, but it might make sense to clarify this.

danielsn · 2022-09-01T19:08:50Z

Is it possible to have more than one sysroot?

celinval · 2022-09-01T19:55:07Z

Is it possible to have more than one sysroot?

Yes. The kani-compiler already includes a --sysroot option to configure which sysroot to use. So it would be a matter of building them separately and selecting the correct one.

rfcs/src/rfcs/0001-mir-linker.md

adpaco-aws · 2022-09-01T20:21:48Z

rfcs/src/rfcs/0001-mir-linker.md

+ - `none`: This will be the default value if `--reachability` flag is not provided. It will basically skip
+ reachability analysis (and `goto-program` generation).


Well, goto-programs will still be generated, no?

No. With none we won't generate any goto-program. This will be used while compiling the dependencies.

@tedinski and @adpaco-aws I added a bit more information here. Let me know if that makes sense.

rfcs/src/rfcs/0001-mir-linker.md

tedinski · 2022-09-01T20:14:42Z

rfcs/src/rfcs/0001-mir-linker.md

+thus, we are unable to translate these items into `goto-program`.
+
+In order to fulfill this gap, we must compile the standard library from scratch.
+This RFC proposes a similar method to what [`MIRI`](https://github.com/rust-lang/miri) implements.


Can... we just use what MIRI ships?

We install a rust toolchain on install. I'd been thinking about whether we could reduce this to a minimal profile as an optimization, but we should also be able to go the other way and add the miri component?

That's a great point. I'm assuming the overhead will be the same, but it might save us from maintaining yet another build script.

Let me investigate that.

It might be awkward to use MIRI to grab the sysroot. MIRI only build the sysroot when you first execute cargo miri [run | test]. It also requires xargo in order to do that.

That said, it might not be that bad to build the sysroot instead of downloading it.

@tedinski please let me know your thoughts on this one.

Shucks, I was hoping MIRI would ship it. I'm ok with adding it as part of setup or pre-building it. Though I'm guessing MIRI doesn't pre-build it for a good reason?

@oli-obk what's the motivation to build MIRI sysroot as part of the first execution of MIRI opposed to pre-building it?

miri can cross interpret, so we'd need to deploy all targets somewhere once (instead of once per host). Also sometimes you'll want to change how the sysroot is built I guess.

Also we need the ability to build the sysroot from scratch anyway for when we modify miri out of tree.

tedinski · 2022-09-01T20:16:53Z

rfcs/src/rfcs/0001-mir-linker.md

+
+The traversal logic will be customizable allowing different starting points to be used.
+The two options to be included in this RFC is starting from all local harnesses
+(tagged with `#[kani::proof]`) and all public functions in the local crate.


I assume that under --test tests are marked public and that keep them here?

I also wonder if we should just... include both by default and not have options here? This avoids the complication of setting/flags/configuration, doesn't seem like a major downside with slightly more code compiled, and makes everything more flexible.

E.g. a few minutes of thought comes up with the case of Rust calling C code and that C calling Rust code that's "extern C" and which we can't inspect the reachability of, but including both harnesses and public items both would handle seamlessly.

There is two parts to this:
1- I believe the reachability code should be kept generic to take whatever entry point we want.
2- For now, we will only provide two possible choices, harnesses and public functions.

I do see other future usages for the starting point. C + Rust verification will likely be a combination of harness + extern C functions.

For --tests, I believe we still want to use the harness mode unless if the user chooses --function. For the later, I might just keep the legacy mode for now.

@tedinski does that make sense? Let me know if you want me to change anything or make the text more clear about the customization points.

I'm fine with whichever course you choose, but my preference is that we should error on the side of simplicity here. IMO it should just use "public + harnesses" as roots and then we don't need to worry about other options. It seems like the worse case scenario is just codegen'ing a bit more than we need, while the advantage is that everything just works and nobody has to think about this again.

rfcs/src/rfcs/0001-mir-linker.md

tedinski · 2022-09-01T20:19:21Z

rfcs/src/rfcs/0001-mir-linker.md

+  with:
+
+  ```
+  cargo rustc <FLAGS> -- --reachability=harnesses


What about when we run cargo kani --tests

Great question. At some point I thought about adding this to the "Open Questions", but I totally forgot. I'll do that now.

I don't have a great answer yet. The two possibilities I thought so far are on the hacky side, but I believe they would work:

Add a new value to --reachability, maybe "on-tests" or something like that. This would behave as either "harness" or "none" depending whether RUSTFLAGS has --tests.

Change how we handle not having --reachability argument to also take into account the existence of --tests.

Both would ensure that we only run the reachability and codegen at the last compilation step. Any other suggestions?

@tedinski any suggestions? I added this to the open questions too.

My question was less on the reachability side here, and more on the side of the actual pragmatics of doing the build.

cargo build becomes cargo rustc but what does cargo test --no-run become?

It will likely stay as is and we might add a --reachability=on-tests or something like that.

@tedinski I have great news, we can replace

cargo test --no-run

by

cargo rustc --tests -- --reachability=harnesses

I'll add some details here and remove that from the open questions.

rfcs/src/rfcs/0001-mir-linker.md

rfc/src/SUMMARY.md

adpaco-aws · 2022-09-06T14:04:40Z

rfc/src/rfcs/0001-mir-linker.md

+   This will generate a goto-program for each crate compiled by `kani-compiler`, and it will still have the same
+   `std` linking issues.
+
+*This option will be removed as part of the `rfc` stabilization.


Since * doesn't link or anything, I suggest removing it in:

--reachability=[ harnesses | pub_fns | none | legacy* ]

and include this text in the item above directly. If you like, you can highlight as "Note: This option will be removed as part of the rfc stabilization."

Move legacy comment

Create draft RFC for MIR linker

d4cc9f0

celinval requested a review from a team as a code owner August 26, 2022 23:58

celinval mentioned this pull request Aug 29, 2022

Build custom sysroot #1605

Closed

celinval changed the title ~~Create draft RFC for MIR linker~~ Create RFC for MIR linker Aug 30, 2022

Update 0001-mir-linker.md

1c56998

Add PR link

giltho reviewed Aug 30, 2022

View reviewed changes

Update 0001-mir-linker.md

5091258

Fix wording

rahulku added this to the Fix linking issues against std library milestone Aug 30, 2022

zhassan-aws reviewed Aug 30, 2022

View reviewed changes

oli-obk reviewed Aug 31, 2022

View reviewed changes

Update rfcs/src/rfcs/0001-mir-linker.md

9c5d0f6

Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>

danielsn reviewed Sep 1, 2022

View reviewed changes

adpaco-aws reviewed Sep 1, 2022

View reviewed changes

tedinski reviewed Sep 1, 2022

View reviewed changes

celinval added 3 commits September 2, 2022 11:11

Merge remote-tracking branch 'origin/main' into issue-1588-rfc

09a8dc4

Move folder rfcs -> rfc per model-checking#1543

ceacdfb

Address PR comments + add link to the RFC

17bae26

tedinski approved these changes Sep 2, 2022

View reviewed changes

adpaco-aws approved these changes Sep 6, 2022

View reviewed changes

celinval added 6 commits September 6, 2022 09:30

Update 0001-mir-linker.md

ebe3b65

Move legacy comment

Merge branch 'main' into issue-1588-rfc

05a42e4

Merge branch 'main' into issue-1588-rfc

b6a2cd0

Merge branch 'main' into issue-1588-rfc

b3658e2

Merge branch 'main' into issue-1588-rfc

4c6d237

Merge branch 'main' into issue-1588-rfc

ffd8ec6

celinval merged commit 191a3f7 into model-checking:main Sep 7, 2022

celinval mentioned this pull request Sep 8, 2022

Add --mir-linker flag and wire it up to the compiler #1652

Merged

4 tasks

celinval added the T-RFC Label RFC PRs and Issues label Nov 7, 2022


		### Kani's Sysroot

		Kani currently uses `rustup` sysroot to gather information from the standard library constructs.

		`kani-compiler` will include a new `reachability` module to traverse over the local and external `MIR` items.
		This module will `monomorphize` all generic code as it's performing the traversal.


		## Open questions

		- Should we build or download the sysroot during `kani setup`?

		- `none`: This will be the default value if `--reachability` flag is not provided. It will basically skip
		reachability analysis (and `goto-program` generation).

Create RFC for MIR linker #1600

Create RFC for MIR linker #1600

Conversation

celinval commented Aug 26, 2022

Description of changes:

Resolved issues:

Call-outs:

Testing:

Checklist

giltho left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

celinval commented Aug 30, 2022

giltho commented Aug 30, 2022

giltho commented Aug 30, 2022

zhassan-aws left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielsn commented Sep 1, 2022

celinval commented Sep 1, 2022

Choose a reason for hiding this comment

celinval Sep 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

celinval Sep 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

giltho left a comment •

edited

Loading

celinval Sep 1, 2022 •

edited

Loading

celinval Sep 2, 2022 •

edited

Loading