-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Git support for remote genmode #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass, will be back soon
src/planning.rs
Outdated
@@ -310,6 +311,20 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> { | |||
|
|||
let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone()); | |||
|
|||
let git_data = if let Some(ref s) = own_package.source { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This could be more idiomatically written something like this
let git_data = own_package.source
.as_ref()
.filter(SourceId::is_git)
.map(|src| GitData {
remote: src.url().into_string().to_owned(),
// TODO: Explain why this unwrap is safe
// UNWRAP: i.e. like this
commit: src.precise().unwrap().to_owned()
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, I like that. Agreed, the current way is ugly.
I need to double-check the unwrap - I couldn't find any examples where Cargo didn't give me a precise commit ID, but I didn't see anything off-hand in the API to guarantee it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the internals use SourceId#precise
in this manner:
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/sources/git/source.rs#L33-L36
let reference = match source_id.precise() {
Some(s) => GitReference::Rev(s.to_string()),
None => source_id.git_reference().unwrap().clone(),
};
This has some kind of relationship to GitReference
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/core/source/source_id.rs#L57-L64
pub enum GitReference {
/// from a tag
Tag(String),
/// from the HEAD of a branch
Branch(String),
/// from a specific revision
Rev(String),
}
That all said, we might not need to handle all of these edge cases? What does cargo-vendor do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure on cargo-vendor, but I'd argue that we can only support Rev() or Tag() under Bazel. The new_git_repository
rule doesn't like branches, and is against Bazel philosophy, for not being able to uniquely identify code state somewhere else.
For thoroughness though, VendorSource
is the interesting bit, and it appears to largely leave the interesting bits up to Cargo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final note - Option filtering is nightly-only at the moment.
src/metadata.rs
Outdated
@@ -54,17 +56,18 @@ pub struct Package { | |||
pub license: Option<String>, | |||
pub license_file: Option<String>, | |||
pub description: Option<String>, | |||
pub source: Option<String>, | |||
pub source: Option<SourceId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably not a good idea to change this (the definition of Metadata or its components).
Many of the metadata objects are written to match the output of the cargo metadata
command, so you can't add or change fields in a way that cargo metadata
itself couldn't logically support.
I don't have a better idea at the moment though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take another pass at it. The reason it was changed was because Cargo is unable to go SourceId -> URL -> SourceId, so I was trying to preserve the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be possible to deserialize it directly back into a SourceId --
https://github.com/rust-lang/cargo/blob/805fbebc830174ad34feb774e019769d6fe7bbe9/src/cargo/core/source/source_id.rs#L356
(which would mean you could replace source: String
with source: SourceId
, though this would be a small avenue for a breakage if Cargo changes how the structure of SourceId).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out it was a holdover from previous attempts. What I was ultimately after was figuring out some way of identifying where a project was checked out to, and the SourceId wasn't particularly helpful. I can identify if it's a Git dependency later in planning.rs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lied, I did need it. Using serde_json to pass it back and forth, because I need to preserve the precise()
to get the Git reference.
Tests are back to passing, though I'm expecting some issues with the smoke check yet. |
OK, this is turning out to be... Unfortunately interesting, and I could use some advice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the current approach in most respects, except that it requires additional stuff to be calculated in Metadata
. I'm hoping we can defer that to planning...
Rest of comments are just nits. Thanks for keeping at it!
src/planning.rs
Outdated
@@ -202,6 +205,9 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> { | |||
continue; | |||
} | |||
|
|||
let own_source_id = own_package.source.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think you can replace source.clone()
with source.as_ref()
as you don't actually need a clone of the string to deserialize, just a clone of the option. as_ref
will do that for you (it gives you a new Option<&T> that you can map/unwrap/whatever.
src/planning.rs
Outdated
@@ -310,6 +316,17 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> { | |||
|
|||
let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone()); | |||
|
|||
let is_git = own_source_id.as_ref().and_then(|s| Some(s.is_git())).unwrap_or(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: and_then(|s| Some(...))
is equivalent to map(|s| ...)
.
src/planning.rs
Outdated
@@ -310,6 +316,17 @@ impl<'fetcher> BuildPlannerImpl<'fetcher> { | |||
|
|||
let data_attr = possible_crate_settings.and_then(|s| s.data_attr.clone()); | |||
|
|||
let is_git = own_source_id.as_ref().and_then(|s| Some(s.is_git())).unwrap_or(false); | |||
let git_data = if is_git { | |||
let s = own_source_id.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is only safe due to a constraint on how is_git
is defined. Perhaps explain inline i.e. // UNWRAP: is_git implies own_source_id is present
.
Its not as good as a test, but its verifiable later at least.
@@ -6,11 +6,20 @@ DO NOT EDIT! Replaced on runs of cargo-raze | |||
|
|||
def {{workspace.gen_workspace_prefix}}_fetch_remote_crates(): | |||
{% for crate in crates %} | |||
{%- if crate.git_data %} | |||
native.new_git_repository( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we default to clone submodules as well: init_submodules = true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it off just to keep with the default behavior for new_git_repository
. Definitely not opposed to putting it in there, and I'm not sure that we could possibly break anything by doing so.
Now that I'm thinking about it, I'll go ahead and add that by default, we can revisit in the future if it turns out to be an issue, though I have no idea under what circumstances that could be the case (since Cargo has already figured out where the manifest directory is).
src/metadata.rs
Outdated
@@ -224,7 +226,8 @@ impl<'config> MetadataFetcher for CargoInternalsMetadataFetcher<'config> { | |||
for dependency in package.dependencies().iter() { | |||
dependencies.push(Dependency { | |||
name: dependency.name().to_string(), | |||
source: dependency.source_id().to_string(), | |||
// TODO(bspeice): Justify this unwrap | |||
source: serde_json::to_string(dependency.source_id()).unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're relatively confident about this (if so, LGTM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I am very confident about that, it wouldn't be terrible to serialize in a try!()
block before that point. I went ahead and .unwrap()
'ed it though because the error message would be something along the lines of "Cargo gave us a SourceId that it can't serialize; this is Cargo's fault."
src/metadata.rs
Outdated
@@ -59,6 +60,7 @@ pub struct Package { | |||
pub targets: Vec<Target>, | |||
pub features: HashMap<String, Vec<FeatureOrDependency>>, | |||
pub manifest_path: String, | |||
pub checkout_root: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This going to be a really frustrating comment -- apologies in advance.
I really don't want to touch any of these Metadata
structs if we can avoid it -- they're basically identical to the ones cargo metadata
is emitting. Modifying them here means that we'll somehow have to synthesize additional data when generating from a cargo metadata
invocation, and it blurs the line between generation of Metadata, and actual processing of that data.
Is it possible to defer calculating this until planning?
Its my fault that this is unclear -- I should put comments on these structs explaining how they were derived (e.g. experimentally from the output of cargo metadata
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not frustrating, no worries. I was just trying to locate the data near where it was created (i.e. not trying to re-engineer logic to build cargo_config
). I can regenerate the checkout_root
as long as the SourceId
is preserved and I know what the git checkout root is for Cargo.
The other thing to keep in mind, depending on how purist you're trying to be with matching cargo metadata
output, is that I'm technically modifying the SourceId
to serialize across to planning.
src/metadata.rs
Outdated
@@ -258,18 +261,24 @@ impl<'config> MetadataFetcher for CargoInternalsMetadataFetcher<'config> { | |||
features.insert(feature.clone(), features_or_dependencies.clone()); | |||
} | |||
|
|||
let git_root = self.cargo_config.git_path().into_path_unlocked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is closely tied into my comment on the Package
struct.
Is it at all possible to do this without the cargo_config
? I ask because we might need an alternative to this for the CargoSubcommandMetadataFetcher
if we keep the checkout_root
arg in Package
-- ideally we can do this entirely without using cargo context and only using Metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is the only reason I needed to edit Package
. Ultimately, the reason you need cargo_config
is because otherwise I'm re-writing the logic that Cargo uses for determining the checkout directory without Cargo's input.
What would potentially work instead is something where I shell out to the system cargo
and that echos back the git_root
, but I'm hesitant to try and re-engineer Cargo.
OK, cleaned up the nits, etc. Thanks for the assistance there, I'm learning quickly what more idiomatic ways of dealing with uncertainty are. Now the final conundrum (I think) is this: Due to the way Bazel checks out git repositories (checking out the repository root) vs. the way Cargo approaches it (checking out the package root), I need to know during planning:
The current way I'm doing that is:
I'm open to suggestions, but I'll start the discussion here: I can try and reverse-engineer the logic for figuring out the |
OK, cleaned up the nits, etc. Thanks for the assistance there, I'm learning quickly what more idiomatic ways of dealing with uncertainty are. Now the final conundrum (I think) is this: Due to the way Bazel checks out git repositories (checking out the repository root) vs. the way Cargo approaches it (checking out the package root), I need to know during planning:
The current way I'm doing that is:
I'm open to suggestions for how to address, but I'll start the discussion here: I can try and reverse-engineer the logic for figuring out the Alternately, if there were a |
Apologies for the lack of responsiveness from me on this. Its still on my radar, but I need to sit down and think on it to make a solid decision. Stream of consciousness response in the meantime: Do you have a particular crate in mind that doesn't work right using only the normal metadata? Ok, I just read up on From my perspective, rust-lang/cargo#1734 is not likely to cause much harm. In fact, I'm not confident that it is relevant: the particular place that they extract the subdirectory of the One additional factor that I just realized (but you already may be aware of): They're doing this "determination" at packaging time, not at clone time. For example, if you clone futures-rs, and try to $ git clone https://github.com/rust-lang-nursery/futures-rs
Cloning into 'futures-rs'...
remote: Counting objects: 25476, done.
remote: Total 25476 (delta 0), reused 0 (delta 0), pack-reused 25475
Receiving objects: 100% (25476/25476), 13.00 MiB | 25.21 MiB/s, done.
Resolving deltas: 100% (17271/17271), done.
$ cd futures-rs
$ cargo package
error: manifest path `/usr/local/google/home/acmcarther/projects/futures-rs/Cargo.toml` is a virtual manifest, but this command requires running against an actual package in this workspace ... But, it does support doing it at build time too if you include it something like [dependencies]
futures = { git = "https://github.com/rust-lang-nursery/futures-rs" } My guess is that they're processing the virtual manifest at the root, which is something we could do as well (something we should do if we want to support this correctly).... but I'd be inclined to punt on that (perhaps in a subsequent PR). |
Any crate that has the crate root somewhere that isn't the repository root has to be handled - this shows up for things like That said, I think Cargo doesn't use particularly difficult rules for determining the root checkout directory, it was more a "if Cargo makes the information available, get it straight from the horse's mouth." I'm definitely OK punting on handling the full virtual manifest (i.e. not particularly worrying about git versions of My plan at this point is to write a separate function for figuring out the Git root that doesn't rely on |
Accidentally closed, working on fixing and then will re-open. ...and I was an idiot and force-pushed the branch to oblivion. Can recover via Had a flash of inspiration - to get the "git root" for packages, I can just walk up the directory tree until I find a |
This enables support for basic Git repositories. A lot of examples are currently broken because we assume the Package.root() is the same as the repository root.
As a specific example, we can't include
rand
from Git. Therand
base repository has apath
dependency onrand-core
. We correctly resolve the Git repository, but when we resolve the library target path, we subtract out the package root in order to make sure that the BUILD file is sane. However, because the package root is a directory deep in the repository, we effectively are including therand
library twice (once as rootrand
, once asrand-core
) leading to issues becauserand-core
can't be found.A similar issue shows up with futures, but I think that one may end up being slightly more complicated because of the Cargo workspace.