-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: make Cargo embed dependency versions in the compiled binary #2801
RFC: make Cargo embed dependency versions in the compiled binary #2801
Conversation
One idea is to wrap Cargo.lock into another trivial format: yaml, json, custom text with two keys: ContentType and Body and you can version your format if Cargo.lock becomes cumbersome (for some yet unknown reasons) and will have to be replaced with something else. |
I would request that this is not enabled by default. As noted by RustDragon on reddit, this could leak potentially sensitive information through the embedded paths. |
As described in the RFC, not having this enabled by default pretty much defeats the point. Would disclosing that the dependency originated from a local copy, but not disclosing the directory path be an acceptable solution? This would protect information such as usernames and paths. |
I'm not sure that not having it enabled by default would defeat the point. Having this information available to be enabled would still be immensely valuable to production users. My main concern is users accidentally having their information leaked just by handing out a binary. |
Having thought about this a bit more I think that this would be fine. It might be desirable to be able to turn path embedding back on (for full auditing), with it being disabled by default (for safety). |
Yeah this should definitely be off by default, even if the info is scrubbed of paths and stuff. |
@Lokathor could you provide a rationale? I think the RFC makes a pretty strong case for the on-by-default approach. Prior art from Go and (as I've just learned) Ruby supports this as well. |
It has also been pointed out that this may interact with reproducible builds in interesting ways - I'm not sure if Cargo.lock is guaranteed to have a stable sorting order. |
I don't necessarily think this should be off by default if the size increase is indeed as minimal as claimed: even in release mode some debug information is included already. I do think it should be removed when the binary is stripped though. The RFC claims a sub 1% increase in size: I would like to see some more evidence to back that up; the "hello world" example is not enough. The number of dependencies increases the lock file size linearly, but it does not increase the size of the binary linearly due to unused code being removed at link time, so the relative size could go up. Furthermore, platform-specific dependencies are tracked in the lock file but are not compiled into the binary at all on other platforms. Regarding reproducible builds: as long as the information in the Cargo.lock is encoded in a deterministic way, it shouldn't be a problem. If cargo picks a dependency version non-deterministically then your deterministic build is broken regardless of whether you incorporate the lock file directly. One thing not covered by this RFC is what crate types will embed this information. Should the "cdylib" type embed the information? On Windows, DLLs are just executables with a different extension so you could reasonably argue that the same version information should be included. |
Good point, will do.
I don't think
It's under "unresolved questions" since I don't have a strong stance on this, but I am leaning towards "yes" for the same reasons you've described. |
Bonus data point: A minimal hello world on Windows is ~3.5k, and the Cargo.lock for that is ~1.9k. The ratio generally gets better as your program grows. The amount of code that usage of a dependency puts into your binary is usually a lot more than the line or two it takes to list the dependency. Dependencies that hurt the ratio (adding a dependency entries but not adding code into the final binary) would likely be things like proc-macro crates, That said, I still don't want this in my programs because I just don't want extra stuff going in my files. |
Size comparisons on binaries of varying sizes, as measured on Linux with ThinLTO:
We can do a Crater run to be sure. |
It seems like we should aim to be reproducible even if the crates came from different sources. I.e. downloading something from a registry vs using a local path shouldn't change things. Maybe we could use a format this is just a sorted sequence of |
I fear omitting the registry where a certain dependency came from would mean losing too much information. This defeats use cases such as only allowing dependencies from a private registry, for example. |
You can't reproduce a build without |
For prior art and motivation completeness: Where I work at Bloomberg, our C++ codebase has enough control over such things that we rolled our own version embedding tricks many years ago. On top of the security benefits, this has been utterly invaluable for diagnosing many bugs in production, and ensuring that fixes have been fully released. The specific format it uses doesn't make much sense for an open source ecosystem in a language with a proper module/package system, but I thought it was worth mentioning this sort of thing has precedent even in C++. |
As suggested by @Shnatsel, I'll drop a mention of my Of course, this does not cover the use case of the RFC, but it is a related tool that has some overlap with the goals of the RFC. |
First of all: thank you for working on the problems of dependency tracking and security analysis! Please don't take any of the feedback below as suggesting that I disagree with the goals. I agree with the various other posts giving a rationale of not wanting to disclose excessive information, especially about local paths and registries. You need to scrub all of the potentially sensitive information, such as local registries, paths, and repository URLs. A quick check on my system turns up Cargo.lock files upwards of 100k-140k. I'm also concerned about exposing the potentially evolving Cargo.lock format (as mentioned in your unresolved questions section); that format has changed substantially over time. I think we could reduce that format substantially to serve this particular purpose. But at the very least, it seems worth compressing it before embedding it. A quick check suggests that basic zlib compression reduces the size of Cargo.lock files by 75-85%. You also need to consider the size of the delta from one version of a binary to another, as many distribution mechanisms want to distribute deltas. I can easily imagine the You should discuss the issue of format versioning. You could theoretically evolve this format by using a new section name if the version changes. Given that this uses a completely Rust-specific (and cargo-specific) format, the section name should not use something generic like
That's a problem that people are trying to fix. And that's with
(Typo: s/excempt/exempt/) People care about code size on other platforms, too, and the same target may get used for large applications and embedded development. You can't reliably classify every target into "embedded" and "not embedded". Also, this isn't just about code size; some use cases need to carefully construct binaries and care about any "unusual" sections that get generated in the binary. Some binaries will get built with linker scripts. You don't specify any mechanism to enable/disable this. You need to provide and specify appropriate configuration for that. For a first pass, I would propose addressing the issues above, and then proposing a version that's initially disabled by default. This will let people experiment with it, optimize it, audit the information it includes, and so on. We can always revisit questions of defaults and their interaction with various use cases in the future. |
Regarding the "leaking sensitive information" issue, beyond the obvious "strip local paths and private urls" mitigation, it's useful to think in terms of threat model. How does embedding version info affect the balance between attacker and defender ?
For most cases I think the version info is a net gain for the defender (and therefore it should eventually be on by default once it matures), but there'll always be cases when it isn't. It's not just about security: as a user I may want to recompile my binaries when a performance fix is out for example. |
When I worked on an infosec team at a large payments company, we built a system based on Ruby's It scanned filesystems for these files, audited them against a similar vulnerability database (RubySec), opened tickets for vulnerable apps, and auto-closed them when they were updated to non-vulnerable versions. It seems there's a lot of concerns about potential leakage of internal repositories. However for the above feature to work, we needed that information (which we also used to file internal vulnerabilities against internal packages). Far from that information being a concern, it was something we actually leveraged in an enterprise context. For enterprises deploying in-house applications, I don't think leaking this information is any concern whatsoever. For crates published on crates.io, it also isn't a concern as these crates, by definition, can't refer to internal repositories. So the real information leakage concern is: publicly published binaries (e.g. through a vendor's web site or an app store) leaking this information accidentally because the publishers did not opt out of removing it. To me this is a somewhat valid concern: it's information exposure and a potential loss in defense-in-depth. But I do just want to make clear that I think, with that acknowledged, this is also a feature that enterprise users will want for internal binaries, to allow internal auditing of deployed application binaries. |
We talked about this in a recent cargo team meeting and there seemed to be interest in first focusing on identifying the needs and seeing what part cargo should be playing in solving the larger SBOM situation. If there is a built-in solution, I assume the extent of the information will mean it would be large and not reproducible, so we'd like want to start with the separate artifact. From there, we can further explore the aims of embedding a subset of the SBOM in the binary. @Shnatsel would you be interested in pivoting this RFC in that direction and leading or taking part in that effort? |
I just got contracted to advance cargo-cyclonedx. So I am getting quite involved in the broader issue of SBOM generation, and I've done a bit of research on what is required for it to work well. There are some gaps in the information that I am planning to migrate |
This comment was marked as resolved.
This comment was marked as resolved.
For some of the people I talked to about SBOMs,
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
All of this can be captured with the machinery already used by Generic SBOM generation is a much bigger scope with a much bigger design space, so I am a bit puzzled how that would make a reasonable starting point. The scope of this RFC is intentionally very narrow, and the format is also designed so that we can easily extend it later if other compelling use cases emerge. This is done specifically to make the change as straightforward and uncontroversial as possible. |
@epage reading previous comments, it sounds like your concern regarding missing SBOM data is about false positive elimination, however this is a known and longstanding issue with e.g. It's something we've long been meaning to address, but also if we held off on shipping anything until it was solved we wouldn't have anything to show for it. Even if it were implemented immediately, it can't yet be consumed by At any rate, it's no worse than the current state of affairs with Cargo.lock auditing using |
Maybe put a different way, this solves one problem (identifying vulnerabilities in released binaries that lack a chain of custody) but, for the limited bandwidth of the cargo team, we are interested in putting our efforts in solving a different problem (SBOMs). They are related (cargo-audit has a subset of the needed dependency information). Independent of our general focus, there is the question of designing the manifest configuration for this and I suspect looking at SBOMs first has us looking at the bigger picture and will help us have more of the information we need for the manifest configuration. |
But if this feature used an SBOM format like CycloneDX, it could be an incremental first step towards full SBOM support that itself is independently useful for vulnerability auditing. Someone else just needs to, as an additive second step, fill in the missing gaps you've highlighted. |
This RFC is deliberately scoped so narrowly that we wouldn't need to design manifest configuration beyond an on/off switch. I still believe that SBOM generation is better solved outside Cargo. The SBOM formats are still evolving, as is the ecosystem around them. New use cases, requirements and regulations pop up all the time. This rapidly evolving landscape would conflict with Cargo's stability guarantees, and the limited reviewer bandwidth for a rapidly changing implementation would be a hindrance. I expect to produce proof of this belief in the next couple of months in the form of |
It is clear that the RFC needs major revisions. At least:
So I am converting this PR to draft to clearly indicate its status. |
Team member @epage has proposed to postpone this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to postpone, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This is now postponed. |
Note to future self: configuration not being sufficiently defined was brought up as a blocker for this RFC. rust-lang/cargo#14222 has documented the configuration considerations for Cargo. This document should be used as a starting point later. |
Rendered
This is the first step towards a proper security update story for Rust. You might also want to check out the comments on a prototype implementation that informed this RFC.
2023 update: this has been implemented out-of-tree as
cargo auditable
, which has garnered significant adoption. This RFC has been updated and expanded based on the lessons learned from it.