Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate .cargo_vcs_info.json and include in cargo package #5786

Closed

Conversation

dekellum
Copy link
Contributor

@dekellum dekellum commented Jul 24, 2018

Implements #5629

Edit 1, implementation notes:

I elected to generate the JSON with link in filesystem as target/.cargo_vcs_info.json (compare to target/.rustc_info.json). This makes it usable as a sentinel file. It also makes windows-mandory vs unix-advisory FileLocks a relevent distinction, which required some trial and error on appveyor CI here, now all sorted.

I found one test case (cargo test package::package_verbose) with different output on windows, to quote b553d72:

The package_verbose test has different output on windows, and from that output I can only conclude that on windows, and windows only, cargo is failing to find the parent repo. Note that difference. As the new .cargo_vcs_info.json of PR #5786 is best effort and informative only, this independent problem on windows seems acceptable [for] the feature.

Let me know if you would also like a new issue for that test difference. Cargo windows bug?

Edit 2, more implementation notes:

Note that I needed to adjust the handling of listing in ops/cargo_package vs checks such as what is now check_repo_state (previously named check_not_dirty). This is so that .cargo_vcs_info.json, if it can be produced, is included in the listing. (It would seem poor to omit there.) A side effect of this, is that a user might need to use the --allow-dirty flag to get a listing if the repo is dirty. That doesn't seem onerous to me. One test case reflects that change.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 25, 2018

The windows CI failures are real. I'll fix that with additional commits here.

@dekellum dekellum changed the title Generate .cargo_vcs_info.json and include in cargo package [WiP] Generate .cargo_vcs_info.json and include in cargo package Jul 25, 2018
@dekellum dekellum force-pushed the record-package-git-head-sha1 branch from f5e6562 to 7eb7720 Compare July 27, 2018 09:51
The package_verbose test has different output on windows, and from
that output I can only conclude that on windows, and windows only,
cargo is failing to find the parent repo. Note that difference. As the
new .cargo_vcs_info.json of PR rust-lang#5786 is a best effort, informative
only, this independent problem on windows seems acceptable to the
feature.
@dekellum dekellum force-pushed the record-package-git-head-sha1 branch from 7eb7720 to b553d72 Compare July 27, 2018 09:56
@dekellum dekellum changed the title [WiP] Generate .cargo_vcs_info.json and include in cargo package Generate .cargo_vcs_info.json and include in cargo package Jul 27, 2018
@dekellum
Copy link
Contributor Author

Ping @alexcrichton, this is ready for review.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for this! Could a test also be added to ensure that this file exists and has an expected structure?

Also cc @rust-lang/cargo, do any of y'all have feelings about this? This is a pretty minor feature but seems like it could be useful in the future! We may also want to have a bit more discussion about exactly what goes into this file

And finally, could documentation be added for this as well? Just to document the json format and such.

// Manual JSON format is safe given sha1 hash is hex encoded.
writeln!(ifile, r#"{{"git_head_revision_sha1":"{}"}}"#, hsh)?;
ifile.flush()?;
ifile.file().sync_all()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm could you elaborate a bit on why the file locks and syncs are needed here? I was thinking we'd just synthesize this file directly into the tarball (like Cargo.toml.orig without actually putting it on the filesystem)

Copy link
Contributor Author

@dekellum dekellum Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured early on that I wanted a sentry file to try and make it at least inconvenient to spoof the hash, for example by protecting against interference from another cargo package process. The changes would be reduced without the sentry file however. Do you think the anti-spoof efforts don't justify the increased complexity? Note: The test failure by platform diff will remain an issue even without the linked sentry file.

Another early thought was that cargo package was doing something sentry file like, with the tarball produced being valided before being moved in place, so this was for parity.

);
}

// FIXME: From this required change in the test (omits final
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this seems like it's a bit worrisome, do we know what changed from before to cause this test to fail on windows?

Copy link
Contributor Author

@dekellum dekellum Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't know that anything changed. the prior check_not_dirty would not bail! (which is avoided for the tests on both platforms) if either no-git-repo-found or git-repo-clean. So if there has been a change with this PR, for windows, then its a change between one of these results to the other. There is no existing test to pin down which one it is.

Also I've changed the order of checks (in cargo_package) vs list output, so another side-effect could be an earlier failure in a check, but that is not apparent. With the PR changes, cargo package just won't generate a .cargo_vcs_info.json file in the repo-not-found or git-repo-clean cases, which makes the output, coincidentally, the same as before.

So I'm asking to move forward with merging this, then followup with a specific test for this which can be backported if necessary for personal diagnostic purposes. Thoughts?

Copy link
Contributor Author

@dekellum dekellum Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: I have a logic flaw in the code comment, clarified above, which I'll fix. When I manually reviewed test fixtures, it looked like the sub-repo was dirty, so that makes me think this the no-git-repo-found case.

@joshtriplett
Copy link
Member

I'm thrilled to have this information available.

I do think we should generate this directly into the tarball, if we can.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 28, 2018

We certainly can, per my understanding of the Tar interface, but the somewhat higher bar, ¿neo-? ¿psuedo-? rhetorical question for you both (@joshtriplett @alexcrichton)—what are the odds someone will later file an issue, and subsequent PR to fix the direct impl. with a locking sentry file? Example titles:

  • Issue: Cargo VCS Info File is vulnerable to spoofing/forgery attacks
  • Pull: Restore sentry file target/.cargo_vcs_info.json for security and parity with tarball safety

As I personally think the odds of the above items are quite high, by objective review of past issues/PRs, I respectfully request reconsideration of this PR in its original form.

While you consider that, I will be attempting a reliable test of the no-git-repo-found or git-repo-clean conditions on Windows. :-)

@joshtriplett
Copy link
Member

It's a client-side mechanism, it's always going to be possible to fake it; let's not pretend it's a security mechanism. It's helpful to inform people, and if the build is reproducible then it can be cross-checked by independent builds.

Some basic guard to avoid unintentional concurrent builds in the same target directory seems like a feature, but that's orthogonal to this.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 28, 2018

I accept that the sentry aspect is security in depth at best. I thought I was careful not to suggest greater sanctimony, see above:

make it at least inconvenient to spoof the hash,

...and its not entirely orthogonal. That would require significant independent new feature development. As in stands in master, the tarball acts like a guard but check_not_dirty doesn't somehow reserve that file before the git-repo-clean check. Please accept my attempts to preserve the sentry aspect in this PR, despite it being of limited value.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 28, 2018

Prior to this PR and with #5820, the package_verbose tests now fails with the warning on windows:

https://ci.appveyor.com/project/rust-lang-libs/cargo/build/1.0.5293/job/ug1evef71f6d6pav#L1656

While the test succeeds without warning on not(windows):

https://travis-ci.org/rust-lang/cargo/builds/409298348?utm_source=github_status&utm_medium=notification

So the no-git-repo-found case is causing the disparity in the package_verbose test, as I feared. Suggested plan of action:

1. I'll cherry-pick and adjust that warning here, and add the warning output to the adjusted package_verbose test, windows variant. (edit: now a potential merge rather than cherry-pick)
2. I'll open a new issue, working title: "no-git-repo-found on windows-only, for package_verbose test" (opened: Issue: #5823 PR: #5820)
3. I'll add an explicit test of the vcs info file content
4. You can merge this, without any remaining concerns on regressions.

@alexcrichton : reasonable plan?

@joshtriplett
Copy link
Member

@dekellum You were clear in your original post; I was responding to your comment at #5786 (comment) .

I don't have any objection to attempting to detect concurrent builds; my only objection is that doing that only in the specific case of building in a git repository, rather than in every case, seems wrong to me.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 28, 2018

On a different note: we don't currently have a check for packages already containing a file Cargo.toml.orig; as far as I can tell, we don't have any precedent in Cargo for "you can't include a file named like this". It seems a bit unfortunate to add such a precedent.

I understand the rationale for doing so; it just seems unfortunate.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 28, 2018

@joshtriplett, regarding comment I would say the strategy is instead: write the .cargo_vcs_info.json under a lock, ensuring the same file makes it into the tarball. It just so happens that that file only contains metadata from git currently. In the future (see discussion in #5629) it may be extended to other (D)VCS like hg, or with other cargo assertions/metadata. With this reasoning there is no precedent set as this being a git-specific feature.

On the latter point: It seems completely reasonable to me for any package manager, cargo included, to reserve certain file names for which it is to control the providence, format and meaning. Related, Cargo.toml and Cargo.lock are parsed and reformatted. For consistency, should I add the same check for Cargo.toml.orig, erroring if that is present in the source? That could also be added separate from this PR and that would be my preferred route.

Ok with me proceeding as per plan? Note I'm also trying in parallel to just fix the windows repo difference noted above.

@joshtriplett
Copy link
Member

@dekellum This shouldn't be specific to packages stored in a VCS, either. Why not just have a general check to avoid concurrent builds, rather than as a side effect of a lock on a particular file?

(I'm not objecting to using open_rw and similar to obtain locks; I'm just suggesting that the exclusion of concurrent builds should be handled in a way that applies to all builds, not just VCS builds.)

As for reserving filenames: right now, as far as I can tell, Cargo doesn't reserve any names, even those it uses (which is potentially a problem). Changing that could potentially introduce compatibility issues, as well as interesting questions. What happens, for instance, if someone wants to have a directory of files used in test cases?

I'm not trying to come up with arbitrary corner cases, and I'm not necessarily objecting to blocking the use of certain filenames, but we'd need to handle that carefully and think about it explicitly. What could potentially go wrong with starting to reserve filenames and prevent them from being in a Cargo crate?

@dekellum
Copy link
Contributor Author

This PR agrees I think: check_vcs_file_collision is run regardless of if the project is in git.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 28, 2018

The general check would be a /var/run (or ./target/lock) lock file for cargo. That's beyond the scope of both this PR and the original feature request (#5629)

@dekellum
Copy link
Contributor Author

Its probably best to defer this until #5820 is merged. I can then resolve the conflicts here and avoid the unfortunate windows specific test allowance.

@alexcrichton
Copy link
Member

@dekellum let's remove the synchronization logic here. As @joshtriplett mentioned this isn't buying us any security really and this file is largely going to only be informative to other tools downloading from the registry.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 30, 2018

I strongly disagree with "this isn't buying us any security really" part of your statement, for the reasons previously described. Does your preferred direction also include hamstringing (and/or subverting) the feature by allowing the same filename in the source?

@alexcrichton
Copy link
Member

@dekellum this is not a security feature, full stop. This is an informational feature for downloading crates from crates.io and allowing tools to opportunistically connect the source of a downloaded crate to the exact commit of a git repository.

Arbitrary tarballs can be uploaded to the registry at any time, there's practically no restrictions other than file size. We do not have a threat model here, nor are we trying to develop a threat model. Publishing and packaging are "best effort" when it comes to concurrent publication because it almost never reasonably happens, so the file locking here is not necessary.

@dekellum
Copy link
Contributor Author

Fun fact: When new people come to your project, they frequently attempt to emulate design choices made in the existing code. May I ask why there is the (somewhat inflexible) flock utility in cargo and why there is specific effort made to move the tarball into position only after it is completely built? Why aren't you clamoring to yank that sentry aspect out? And before your full stop, you haven't even illuminated me as to your position for reserving the file name/tarball path. I guess "help wanted" on the original feature request was a good bit more provisional than I originally imagined.

@dekellum
Copy link
Contributor Author

dekellum commented Jul 31, 2018

With my efforts in #5820 decisively thwarted, I think this PR is now similarly rendered immovable and you can close it if you wish, as in full stop.

I just want to leave one last note here for posterity: another way to achieve my initial (apparently not entirely embraced) objective of a reasonably reliable git sha1 hash recording, would be to change fn tar to use the git local repo for the file contents, at the same sha1, falling back to the working tree if-and-only-if an include'd file is not found in the repo at that sha1. This makes the process more like an atomic snapshot and no sentry file or other locking (here or on cargo master for that matter) would be required.

Best of luck.

@alexcrichton
Copy link
Member

@dekellum the flock utilities are there to handle concurrent invocations of cargo build and concurrent updates of the registry and to avoid having instances of Cargo stomp all over one another, it's not meant for intra-process synchronization at all. The tarball is only in place after it's built to avoid using a stale tarball if you ctrl-c one cargo package process and then cargo publish again. That's been a bug that's historically bitten Cargo. The sha1 hash for a git repo is always calculated, so it doesn't suffer from either of these problems.

I do not have an opinion about the file/tarball path option. It's fine to go either way, I was trying to be clear about a different aspect of this PR before moving on to that.

@dekellum
Copy link
Contributor Author

Closing in favor of #5886.

@dekellum dekellum closed this Aug 13, 2018
bors added a commit that referenced this pull request Aug 20, 2018
Generate .cargo_vcs_info.json and include in `cargo package` (take 2)

Implements #5629 and supersedes #5786, with the following improvements:

* With an upstream git2-rs fix (tracked #5823, validated and min version update in: #5858), no longer requires windows/unix split tests.

* Per review in #5786, drops file system output and locks for .cargo_vcs_info.json.

* Now uses serde `json!` macro for json production with appropriate escaping.

* Now includes a test of the output json format.

Possible followup:

* Per discussion in #5786, we could improve reliability of both the VCS dirty check and the git head sha1 recording by preferring (and/or warning otherwise) the local repository bytes of each source file, at the same head commit. This makes the process more appropriately like an atomic snapshot, with no sentry file or other locking required.  However given my lack of a window's license and dev setup, as exhibited by troubles of #5823, this feel intuitively like too much to attempt to change in one iteration here.  I accept the "best effort" concept of this feature as suggested in #5786, and think it acceptable progress if merged in this form.

@alexcrichton r?
@joshtriplett cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants