-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use gitoxide
for list_files_git
#13592
Conversation
It contains the feature required to get a directory traversal.
014a64b
to
7987665
Compare
7987665
to
6e4c21a
Compare
With a traditional walk, `.git` will be picked up, and so will be ignored directories. This commit also doesn't give submodules special treatment - instead it just tries to open directories as repositories, or walks them if that fails.
This is the case for the git2 implementation naturally, but as `gitoxide` doesn't yet have a dirwalk iterator, it's much less intuitive here.
6e4c21a
to
9bf9149
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once CI has passed, I think this is ready for review. I left plenty of comments in this PR, to associate some of my 'Review Notes' with code.
Overall, I am really excited about this PR and can say that it greatly improved gitoxide
in the process. It's quite amazing to see what Cargo does to list files as well :).
src/cargo/sources/path.rs
Outdated
.gctx | ||
.cli_unstable() | ||
.gitoxide | ||
.map_or(false, |features| features.fetch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I don't know if it's great to above fetch
. For now I left it as you might decide to use gix
by default here. If not, one could add another feature toggle like list
, even though I fear very low adoption/testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate flag might be of use if we want to stabilize piece-meal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is implemented now, and I called the flag 'list-files'.
let rel = rel.replace(r"\", "/"); | ||
match repo.find_submodule(&rel).and_then(|s| s.open()) { | ||
warn!(" found directory {}", file_path.display()); | ||
match git2::Repository::open(&file_path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the intent is to add the directory, submodule or not, it's better to open it as Repository to avoid picking up .git
folders. This fix I applied after noticing this in the gitoxide
implementation.
.emit_ignored(None) | ||
.emit_tracked(true) | ||
.recurse_repositories(false) | ||
.symlinks_to_directories_are_ignored_like_directories(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option I added specifically to support a difference in behaviour when comparing libgit2
to Git. Symlinks to directories are treated as directories by the .gitignore
machinery (on Unix only in libgit2
, all platforms in gitoxide
). Git itself always treats symlinks as files.
src/cargo/sources/path.rs
Outdated
let mut pattern = include.clone(); | ||
pattern | ||
.to_mut() | ||
.insert_str(0, if include.is_empty() { ":!" } else { ":!/" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pathspec prefix is short for 'exclude from top', with the 'top' being required to avoid the CWD to affect the pathspec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the :
syntax and where can I read more about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, is it a path separator? Why are we using strings with separators rather than Vec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pathspec
syntax is document in the Git glossary of all places.
With a top-level Cargo.toml
file, this code produces two pathspecs: :/
and :!/target/
, with a sub-package it would look like :/sub-package
and :!/sub-package/target/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a Rust API using stringly-typed values based on a command-line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see what you are getting at.
I didn't think of innovating here, following the API used by libgit2
and git2
respectively. Having strings for pathspecs runs deep, and they are treated more like paths or regex or globs rather than a data structure. The datastructure itself also isn't particularly well suited to be constructed by hand, as git settings affect it (like case-sensitivity). It's quite complex, but I can imagine to tackle this by adding a trait like TryIntoPathspec
which then works for strings as well as for hand-created pathspecs.
I see how being able to not have to go through the parser here would be beneficial though, so I have put this into the roadmap.
(Since it's not worse than git2
which is also stringly typed, I hope this won't have to block this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, thank you for the explanation!
For me, it is more of trying to make the intent of the code clearer. I'll mark this as resolved but have some other thoughts for improving it with what we have now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, was tempted to encourage pathspec
being added to the variables but the leading let pathspec =
helps. Still easy to miss the structure these variables are intended to uphold. I feel like I could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely quite complex code, with a bunch of variables and inter-dependencies to make it all line up. This could already be improved a lit if format!()
could be used, but since paths are involved, that's not possible :/.
Maybe the pathspec-code can be refactored into something better nonetheless.
// Do not trust what's recorded in the index, enforce checking the disk. | ||
// This traversal is not part of a `status()`, and tracking things in `target/` | ||
// is rare. | ||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a possible source of failures in the git2
version. The index can generally not be trusted to represent the working tree unless a status
call has set the UPTODATE
flag on the entry. This isn't currently happening, and fixing this would be more complicated than it's worth.
Maybe another argument to switch this part over to gix
.
src/cargo/sources/path.rs
Outdated
// This could be a submodule, or a sub-repository. In any case, we prefer to walk | ||
// it with git-support to leverage ignored files and to avoid pulling in entire | ||
// .git repositories. | ||
match gix::open_opts(&file_path, gix::open::Options::isolated()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted for 'isolated' open (to avoid being affected by user or system settings) as these shouldn't be relevant for what we are doing. This makes sure of that, and is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that the user/system level gitignore configuration is ignored? I believe that would be a change from the current model.
Some people put editor-specific ignore configuration in their user-level git configuration. Cargo should respect that when publishing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great catch, I don't know why I wasn't seeing this 🤦♂️.
It's fixed now, which also simplifies the code quite a bit as one can now discover and open the repository without altering the default configuration.
@@ -689,6 +689,7 @@ fn package_symlink_to_submodule() { | |||
project | |||
.cargo("package --no-verify -v") | |||
.with_stderr_contains("[ARCHIVING] submodule/Makefile") | |||
.with_stderr_does_not_contain("[ARCHIVING] submodule-link/.git/config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to assure that in sub-repositories, we never pick up .git
directories.
_ = fs::write(repo.workdir().unwrap().join(".gitignore"), "target/").unwrap(); | ||
git::add(&repo); | ||
git::commit(&repo); | ||
_ = fs::write(p.build_dir().join("untracked.txt"), "").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was added to further stress the gitoxide
implementation, which I could otherwise have implemented wrongly. But even if that wasn't the case, it's good to see that only tracked files in target
will be picked up.
* remove renovate group as it's not needed anymore * repository discovery will open with isolation
For pathspecs, plan to provide a type-safe version of pathspec instantiation, as suggested [in this comment](rust-lang/cargo#13592 (comment)).
@arlosi in case you missed this. |
Just for clarity, from my point of view all issues brought up so far have been addressed. Is this PR blocked on me nonetheless? |
For myself, I don't fully feel confident in answering a lot of the questions you raised (or identifying issues in this code). I can say "this is unstable, just merge" but I worry we might lose track of a lot of this. Would appreciate a look for someone else, especially with a lot of experience with this code. |
I'll find a time reviewing this. |
I confirmed this PR does fix the issue I was having related to #10150 for the missing |
f8290cf
to
7b95523
Compare
It's not necessary, and adds noise.
7b95523
to
ac68aa1
Compare
Co-authored-by: Arlo Siemsen <arkixml@gmail.com>
Top-level pathspecs are needed to assure they are not affected by the CWD. The trailing `/` in `'target` is needed to assure excluded items are in a folder, and that only entries in that folder are extracted from the index.
Thanks! Looking forward to making this the default. @bors r+ |
☀️ Test successful - checks-actions |
Update cargo 13 commits in d438c80c45c24be676ef5867edc79d0a14910efe..a510712d05c6c98f987af24dd73cdfafee8922e6 2024-03-19 16:11:22 +0000 to 2024-03-25 03:45:54 +0000 - Remove unnecessary test (rust-lang/cargo#13637) - Use `gitoxide` for `list_files_git` (rust-lang/cargo#13592) - fix: Warn on -Zlints (rust-lang/cargo#13632) - feat: Add a basic linting system (rust-lang/cargo#13621) - docs: remove untrue TODO for `native_dirs` (rust-lang/cargo#13631) - refactor(testsuite): Rename lints to lints_table (rust-lang/cargo#13627) - Fix debuginfo strip when using `--target` (rust-lang/cargo#13618) - refactor(toml): Push diagnostic complexity on annotate-snippets (rust-lang/cargo#13619) - Fix publish script due to crates.io CDN change (rust-lang/cargo#13614) - fix(alias): dont panic when resolving an empty alias (rust-lang/cargo#13613) - Update annotate snippets (rust-lang/cargo#13609) - refactor(vendor): tiny not important refactors (rust-lang/cargo#13610) - feat: Report some dependency changes on any command (rust-lang/cargo#13561) r? ghost
Update cargo 13 commits in d438c80c45c24be676ef5867edc79d0a14910efe..a510712d05c6c98f987af24dd73cdfafee8922e6 2024-03-19 16:11:22 +0000 to 2024-03-25 03:45:54 +0000 - Remove unnecessary test (rust-lang/cargo#13637) - Use `gitoxide` for `list_files_git` (rust-lang/cargo#13592) - fix: Warn on -Zlints (rust-lang/cargo#13632) - feat: Add a basic linting system (rust-lang/cargo#13621) - docs: remove untrue TODO for `native_dirs` (rust-lang/cargo#13631) - refactor(testsuite): Rename lints to lints_table (rust-lang/cargo#13627) - Fix debuginfo strip when using `--target` (rust-lang/cargo#13618) - refactor(toml): Push diagnostic complexity on annotate-snippets (rust-lang/cargo#13619) - Fix publish script due to crates.io CDN change (rust-lang/cargo#13614) - fix(alias): dont panic when resolving an empty alias (rust-lang/cargo#13613) - Update annotate snippets (rust-lang/cargo#13609) - refactor(vendor): tiny not important refactors (rust-lang/cargo#13610) - feat: Report some dependency changes on any command (rust-lang/cargo#13561) r? ghost
Related to #10150.
Tasks
gix
to v0.60git2
andgitoxide
)list_files_git
to usegitoxide
if it is enabled as feature.gix
with necessary updatesReview Notes
As this PR has come a long way, I decided to keep a few of the steps leading up to the final state, showing the PR's evolution in the hope it helps the review.
gitoxide
for this without a switch? I don't thinkit will cause more trouble than
git2
, and if there is an issue I will fix it with priority..git/*
folder contained in the submodule which is ignored by the walk, i.e.submodule/*
does not contain it, butsubmodule-link/*
does. This is fixed in the gitoxide version, and thegit2
version.Remarks
gitoxide
, which I greatly appreciate.git2
as it's API for many things leads to pretty idiomatic code, and sometimes I really have to work to match it. The example here is the initialdirwalk()
method which requires a delegate as it doesn't just collect into aVec
likegit2
does (for good reason). Turning that into an iterator viadirwalk_iter()
makes it far more usable, and will definitely be good for performance as the dirwalk work is offloaded into its own thread.