Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Git support for remote genmode #31

Closed
wants to merge 0 commits into from
Closed

Add Git support for remote genmode #31

wants to merge 0 commits into from

Conversation

bspeice
Copy link
Contributor

@bspeice bspeice commented Mar 17, 2018

This enables support for basic Git repositories. A lot of examples are currently broken because we assume the Package.root() is the same as the repository root.

As a specific example, we can't include rand from Git. The rand base repository has a path dependency on rand-core. We correctly resolve the Git repository, but when we resolve the library target path, we subtract out the package root in order to make sure that the BUILD file is sane. However, because the package root is a directory deep in the repository, we effectively are including the rand library twice (once as root rand, once as rand-core) leading to issues because rand-core can't be found.

A similar issue shows up with futures, but I think that one may end up being slightly more complicated because of the Cargo workspace.

@bspeice bspeice changed the title Add Git support for remote Add Git support for remote genmode Mar 17, 2018
Copy link
Member

@acmcarther acmcarther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, will be back soon

src/planning.rs Outdated
@@ -310,6 +311,20 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> {

let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone());

let git_data = if let Some(ref s) = own_package.source {
Copy link
Member

@acmcarther acmcarther Mar 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This could be more idiomatically written something like this

let git_data = own_package.source
  .as_ref()
  .filter(SourceId::is_git)
  .map(|src| GitData {
    remote: src.url().into_string().to_owned(),
    // TODO: Explain why this unwrap is safe
    // UNWRAP: i.e. like this
    commit: src.precise().unwrap().to_owned()
  })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, I like that. Agreed, the current way is ugly.
I need to double-check the unwrap - I couldn't find any examples where Cargo didn't give me a precise commit ID, but I didn't see anything off-hand in the API to guarantee it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the internals use SourceId#precise in this manner:
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/sources/git/source.rs#L33-L36

        let reference = match source_id.precise() {
            Some(s) => GitReference::Rev(s.to_string()),
            None => source_id.git_reference().unwrap().clone(),
        };

This has some kind of relationship to GitReference
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/core/source/source_id.rs#L57-L64

pub enum GitReference {
    /// from a tag
    Tag(String),
    /// from the HEAD of a branch
    Branch(String),
    /// from a specific revision
    Rev(String),
}

That all said, we might not need to handle all of these edge cases? What does cargo-vendor do?

Copy link
Contributor Author

@bspeice bspeice Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure on cargo-vendor, but I'd argue that we can only support Rev() or Tag() under Bazel. The new_git_repository rule doesn't like branches, and is against Bazel philosophy, for not being able to uniquely identify code state somewhere else.

For thoroughness though, VendorSource is the interesting bit, and it appears to largely leave the interesting bits up to Cargo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final note - Option filtering is nightly-only at the moment.

src/metadata.rs Outdated
@@ -54,17 +56,18 @@ pub struct Package {
pub license: Option<String>,
pub license_file: Option<String>,
pub description: Option<String>,
pub source: Option<String>,
pub source: Option<SourceId>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably not a good idea to change this (the definition of Metadata or its components).

Many of the metadata objects are written to match the output of the cargo metadata command, so you can't add or change fields in a way that cargo metadata itself couldn't logically support.

I don't have a better idea at the moment though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another pass at it. The reason it was changed was because Cargo is unable to go SourceId -> URL -> SourceId, so I was trying to preserve the original.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be possible to deserialize it directly back into a SourceId --
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/core/source/source_id.rs#L356

(which would mean you could replace source: String with source: SourceId, though this would be a small avenue for a breakage if Cargo changes how the structure of SourceId).

Copy link
Contributor Author

@bspeice bspeice Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it was a holdover from previous attempts. What I was ultimately after was figuring out some way of identifying where a project was checked out to, and the SourceId wasn't particularly helpful. I can identify if it's a Git dependency later in planning.rs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lied, I did need it. Using serde_json to pass it back and forth, because I need to preserve the precise() to get the Git reference.

@bspeice
Copy link
Contributor Author

bspeice commented Mar 20, 2018

Tests are back to passing, though I'm expecting some issues with the smoke check yet.

@bspeice
Copy link
Contributor Author

bspeice commented Mar 21, 2018

OK, this is turning out to be... Unfortunately interesting, and I could use some advice.
In cargo-vendor, there's a call to package.get() which will ultimately end up calling the download() method to retrieve a package. This is incredibly useful because Cargo handles the cases where repository_root != package_root.
However, it means we have to figure out the path in the event of Git and GenMode::Remote. Current plan of action is to add the SourceId and RazeSettings to planning::produce_targets(), should I be taking a different approach?

Copy link
Member

@acmcarther acmcarther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the current approach in most respects, except that it requires additional stuff to be calculated in Metadata. I'm hoping we can defer that to planning...

Rest of comments are just nits. Thanks for keeping at it!

src/planning.rs Outdated
@@ -202,6 +205,9 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> {
continue;
}

let own_source_id = own_package.source.clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think you can replace source.clone() with source.as_ref() as you don't actually need a clone of the string to deserialize, just a clone of the option. as_ref will do that for you (it gives you a new Option<&T> that you can map/unwrap/whatever.

src/planning.rs Outdated
@@ -310,6 +316,17 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> {

let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone());

let is_git = own_source_id.as_ref().and_then(|s| Some(s.is_git())).unwrap_or(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: and_then(|s| Some(...)) is equivalent to map(|s| ...).

src/planning.rs Outdated
@@ -310,6 +316,17 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> {

let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone());

let is_git = own_source_id.as_ref().and_then(|s| Some(s.is_git())).unwrap_or(false);
let git_data = if is_git {
let s = own_source_id.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is only safe due to a constraint on how is_git is defined. Perhaps explain inline i.e. // UNWRAP: is_git implies own_source_id is present.

Its not as good as a test, but its verifiable later at least.

@@ -6,11 +6,20 @@ DO NOT EDIT! Replaced on runs of cargo-raze

def {{workspace.gen_workspace_prefix}}_fetch_remote_crates():
{% for crate in crates %}
{%- if crate.git_data %}
native.new_git_repository(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we default to clone submodules as well: init_submodules = true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it off just to keep with the default behavior for new_git_repository. Definitely not opposed to putting it in there, and I'm not sure that we could possibly break anything by doing so.

Now that I'm thinking about it, I'll go ahead and add that by default, we can revisit in the future if it turns out to be an issue, though I have no idea under what circumstances that could be the case (since Cargo has already figured out where the manifest directory is).

src/metadata.rs Outdated
@@ -224,7 +226,8 @@ impl<'config> MetadataFetcher for CargoInternalsMetadataFetcher<'config> {
for dependency in package.dependencies().iter() {
dependencies.push(Dependency {
name: dependency.name().to_string(),
source: dependency.source_id().to_string(),
// TODO(bspeice): Justify this unwrap
source: serde_json::to_string(dependency.source_id()).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're relatively confident about this (if so, LGTM)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I am very confident about that, it wouldn't be terrible to serialize in a try!() block before that point. I went ahead and .unwrap()'ed it though because the error message would be something along the lines of "Cargo gave us a SourceId that it can't serialize; this is Cargo's fault."

src/metadata.rs Outdated
@@ -59,6 +60,7 @@ pub struct Package {
pub targets: Vec<Target>,
pub features: HashMap<String, Vec<FeatureOrDependency>>,
pub manifest_path: String,
pub checkout_root: Option<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This going to be a really frustrating comment -- apologies in advance.

I really don't want to touch any of these Metadata structs if we can avoid it -- they're basically identical to the ones cargo metadata is emitting. Modifying them here means that we'll somehow have to synthesize additional data when generating from a cargo metadata invocation, and it blurs the line between generation of Metadata, and actual processing of that data.

Is it possible to defer calculating this until planning?

Its my fault that this is unclear -- I should put comments on these structs explaining how they were derived (e.g. experimentally from the output of cargo metadata).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not frustrating, no worries. I was just trying to locate the data near where it was created (i.e. not trying to re-engineer logic to build cargo_config). I can regenerate the checkout_root as long as the SourceId is preserved and I know what the git checkout root is for Cargo.

The other thing to keep in mind, depending on how purist you're trying to be with matching cargo metadata output, is that I'm technically modifying the SourceId to serialize across to planning.

src/metadata.rs Outdated
@@ -258,18 +261,24 @@ impl<'config> MetadataFetcher for CargoInternalsMetadataFetcher<'config> {
features.insert(feature.clone(), features_or_dependencies.clone());
}

let git_root = self.cargo_config.git_path().into_path_unlocked();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is closely tied into my comment on the Package struct.

Is it at all possible to do this without the cargo_config? I ask because we might need an alternative to this for the CargoSubcommandMetadataFetcher if we keep the checkout_root arg in Package -- ideally we can do this entirely without using cargo context and only using Metadata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the only reason I needed to edit Package. Ultimately, the reason you need cargo_config is because otherwise I'm re-writing the logic that Cargo uses for determining the checkout directory without Cargo's input.

What would potentially work instead is something where I shell out to the system cargo and that echos back the git_root, but I'm hesitant to try and re-engineer Cargo.

@bspeice
Copy link
Contributor Author

bspeice commented Mar 22, 2018

OK, cleaned up the nits, etc. Thanks for the assistance there, I'm learning quickly what more idiomatic ways of dealing with uncertainty are.

Now the final conundrum (I think) is this: Due to the way Bazel checks out git repositories (checking out the repository root) vs. the way Cargo approaches it (checking out the package root), I need to know during planning:

  1. What the checkout_root is
  2. The SourceId so that I can tell both if this is a Git package, and thus where the final resting place of the lib.rs file is.

The current way I'm doing that is:

  1. Use the cargo_config during metadata stage, which is understandably not ideal, but means that I can rely on Cargo directly to tell me where it will check things out at
  2. Serialize via serde_json, which modifies the output of metadata

I'm open to suggestions, but I'll start the discussion here: I can try and reverse-engineer the logic for figuring out the git_root, but I don't want to get caught off guard because something like rust-lang/cargo#1734 gets merged.
Alternately, if there were a cargo command that printed out the git_root, I'm free to call that during planning and leave the metadata alone.

@bspeice
Copy link
Contributor Author

bspeice commented Mar 22, 2018

OK, cleaned up the nits, etc. Thanks for the assistance there, I'm learning quickly what more idiomatic ways of dealing with uncertainty are.

Now the final conundrum (I think) is this: Due to the way Bazel checks out git repositories (checking out the repository root) vs. the way Cargo approaches it (checking out the package root), I need to know during planning:

  1. What the checkout_root is
  2. The SourceId so that I can tell both if this is a Git package, and thus where the final resting place of the lib.rs file is.

The current way I'm doing that is:

  1. Use the cargo_config during metadata stage, which is understandably not ideal, but means that I can rely on Cargo directly to tell me where it will check things out at
  2. Serialize via serde_json, which modifies the output of metadata

I'm open to suggestions for how to address, but I'll start the discussion here:

I can try and reverse-engineer the logic for figuring out the git_root, but I don't want to get caught off guard because something like rust-lang/cargo#1734 gets merged.

Alternately, if there were a cargo command that printed out the git_root, I'm free to call that during planning and leave the metadata alone. However, this requires going through and adding a new Cargo subcommand, plus assuming everyone uses the minimum version of Cargo as required.

@acmcarther
Copy link
Member

Apologies for the lack of responsiveness from me on this. Its still on my radar, but I need to sit down and think on it to make a solid decision. Stream of consciousness response in the meantime:

Do you have a particular crate in mind that doesn't work right using only the normal metadata? futures-rs seems likely, given its workspace structure. I'm guessing that cargo-vendor doesn't have this issue because they have Cargo clone the crates into ~/.cargo/ or whatever, and copy the files out of that directory.


Ok, I just read up on cargo::util::Config, and I think I understand what you're doing with the paths. For issues like this, my general attitude has been that I'd just copy Cargo's behavior (and regret/handle it after they changed it). I did that once already for this kind_to_kinds function.

From my perspective, rust-lang/cargo#1734 is not likely to cause much harm. In fact, I'm not confident that it is relevant: the particular place that they extract the subdirectory of the .crate file into shouldn't have much bearing on how they determine what the subpath into the .crate file is.

One additional factor that I just realized (but you already may be aware of): They're doing this "determination" at packaging time, not at clone time. For example, if you clone futures-rs, and try to package or publish from the root, you see this:

$ git clone https://github.com/rust-lang-nursery/futures-rs
Cloning into 'futures-rs'...
remote: Counting objects: 25476, done.
remote: Total 25476 (delta 0), reused 0 (delta 0), pack-reused 25475
Receiving objects: 100% (25476/25476), 13.00 MiB | 25.21 MiB/s, done.
Resolving deltas: 100% (17271/17271), done.
$ cd futures-rs
$ cargo package
error: manifest path `/usr/local/google/home/acmcarther/projects/futures-rs/Cargo.toml` is a virtual manifest, but this command requires running against an actual package in this workspace

... But, it does support doing it at build time too if you include it something like

[dependencies]
futures = { git = "https://github.com/rust-lang-nursery/futures-rs" }

My guess is that they're processing the virtual manifest at the root, which is something we could do as well (something we should do if we want to support this correctly).... but I'd be inclined to punt on that (perhaps in a subsequent PR).

@bspeice
Copy link
Contributor Author

bspeice commented Mar 30, 2018

Any crate that has the crate root somewhere that isn't the repository root has to be handled - this shows up for things like rand where rand-core is in the same repo, but a different folder. You are correct though, this isn't an issue when we let Cargo handle the checkout, because all of that is resolved. When Bazel becomes responsible, because it can only check out the repo root (as far as I know), we need to introduce special behavior.

That said, I think futures-rs is definitely a good benchmark for being considered essentially feature complete.

Cargo doesn't use particularly difficult rules for determining the root checkout directory, it was more a "if Cargo makes the information available, get it straight from the horse's mouth."

I'm definitely OK punting on handling the full virtual manifest (i.e. not particularly worrying about git versions of futures-core, etc.).

My plan at this point is to write a separate function for figuring out the Git root that doesn't rely on cargo, but I think after that's done, this is essentially ready? (Also need to get the example moved over from cargo-raze-examples)

@bspeice bspeice closed this Mar 30, 2018
@bspeice
Copy link
Contributor Author

bspeice commented Mar 30, 2018

Accidentally closed, working on fixing and then will re-open.


...and I was an idiot and force-pushed the branch to oblivion. Can recover via git reflog, but will need to open a new PR.

Had a flash of inspiration - to get the "git root" for packages, I can just walk up the directory tree until I find a .git folder. No need to deal with anything Cargo related at all. I have the code prepared, but I built it on top of #35, so will open a PR once that's been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants