-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use git partial clone and worktree to reduce network/file io #5412
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5412 +/- ##
==========================================
+ Coverage 25.74% 25.79% +0.05%
==========================================
Files 449 450 +1
Lines 48452 48606 +154
==========================================
+ Hits 12474 12539 +65
- Misses 35005 35085 +80
- Partials 973 982 +9 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
8f6b6c1
to
e6a8296
Compare
…mplementations Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
…git.Repo Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
5d79c69
to
2a0ee56
Compare
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
… cloning Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
/review |
PR AnalysisMain theme
PR summary
Type of PR
PR Feedback:General suggestions
Code feedback
Security concerns:
|
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
// It is a separate checkout of the repository. | ||
// It is used to make changes to the repository without affecting the main repository. | ||
// Worktree always does checkout with the detached HEAD, so it doesn't affect the main repository. | ||
type Worktree interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to restrict operation at worktree.
Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
// To avoid the git operation on copied source. | ||
// TODO: do not copy the .git directory to improve the performance. | ||
if err := os.RemoveAll(filepath.Join(dest, ".git")); err != nil { | ||
fmt.Fprintf(lw, "Unable to remove the .git directory from the copied deploy source (%v)\n", err) | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a tar
trick to avoid .git
copy and remove. I think tar is available everywhere and fast enough compared to cp
tar -cpP --exclude=.git -C "$source_dir" . | tar -xpP -C "$target_dir"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the alpine as the base image for piped, so I checked the tar command in the piped image.
First, the tar command exists. But, the options differ from the GNU/BSD tar.
For example, short flags -p
and -P
do not exist.
IMO, it's not good to depend on this tar behavior because users may run the piped on a non-alpine-based OS.
WDYT?
Note: I checked tar --help
in the piped container image. I paste the result below.
$ docker run --rm -it --entrypoint sh ghcr.io/pipe-cd/piped:v0.50.0-23-g8f23da4@sha256:88ec60ebf2b7e3131ff38b1af4a043117ef688bc636ddd95afd6bd368293651b
/ $ tar --help
BusyBox v1.36.1 (2024-06-10 07:11:47 UTC) multi-call binary.
Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [LONGOPT]... [FILE]...
Create, extract, or list files from a tar file
c Create
x Extract
t List
-f FILE Name of TARFILE ('-' for stdin/out)
-C DIR Change to DIR before operation
-v Verbose
-O Extract to stdout
-m Don't restore mtime
-o Don't restore user:group
-k Don't replace existing files
-Z (De)compress using compress
-z (De)compress using gzip
-J (De)compress using xz
-j (De)compress using bzip2
--lzma (De)compress using lzma
-a (De)compress based on extension
-h Follow symlinks
-T FILE File with names to include
-X FILE File with glob patterns to exclude
--exclude PATTERN Glob pattern to exclude
--overwrite Replace existing files
--strip-components NUM NUM of leading components to strip
--no-recursion Don't descend in directories
--numeric-owner Use numeric user:group
--no-same-permissions Don't restore access permissions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the version without -P, the more universal solution
tar --exclude='.git' -cf - source_dir | tar -xf - -C destination_dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OKay, thank you for providing this. I'll use this script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried some tar options and finally reached this commit.
I tested it with busybox's tar (the same as alpine's) and GNU tar.
c5216c9
Overall LGTM. Just left a nits, want to give it a shot 👍 |
…rformance Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGreatTM 💯
This could be the patch we're looking for 🥇
@@ -263,6 +263,8 @@ func (d *detector) loadHeadServiceManifest(app *model.Application, repo git.Repo | |||
if err != nil { | |||
return provider.ServiceManifest{}, fmt.Errorf("failed to copy the cloned git repository (%w)", err) | |||
} | |||
defer repo.Clean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Warashi
it seems that the dir
is removed after executing the loadHeadServiceManifest
So we might not call repo.Clean()
. WDYT?
// We have to copy repository into another directory because
// decrypting the sealed secrets or attaching files might change the git repository.
if attachmentUsed || encryptionUsed {
dir, err := os.MkdirTemp("", "detector-git-processing")
if err != nil {
return nil, fmt.Errorf("failed to prepare a temporary directory for git repository (%w)", err)
}
defer os.RemoveAll(dir)
repo, err = repo.Copy(filepath.Join(dir, "repo"))
if err != nil {
return nil, fmt.Errorf("failed to copy the cloned git repository (%w)", err)
}
defer repo.Clean()
repoDir = repo.GetPath()
appDir = filepath.Join(repoDir, app.GitPath.Path)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call repo.Clean()
here because the 'repo.Clean()' not only does the directory deletion.
IMO, we should call some methods like Clean()
or Close()
everywhere because the caller doesn't know what happens with such methods. The caller only knows the cleanup methods exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call repo.Clean() here because the 'repo.Clean()' not only does the directory deletion.
I got it. The git worktree
needs additional operation other than deleting the directory to keep the consistency of the worktree as you said.
IMO, we should call some methods like Clean() or Close() everywhere because the caller doesn't know what happens with such methods. The caller only knows the cleanup methods exist.
Also I agree with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little off topic, but the detector reuses the one repo. loadHeadServiceManifest
also uses it as an arg. So I feel a little bit confused about cleaning at this point.
But I noticed that this section is used temporarily and different from the actual repo passed as arg. 🙏
Copy(dest string) (Worktree, error) | ||
CopyToModify(dest string) (Repo, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[IMO] How about using the Worktree
and Copy
?
Worktree(dest string) (Worktree, error)
Copy(dest string) (Repo, error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds clear, but I want to leave here as-is.
I want to clear when to use the CopyToModify
method with its naming.
Almost all of implementation, we don't need modification of git commits/histories, so we should not use CopyToModify, and we can know from its naming.
If we rename them as Worktree and Copy, some implementation may call Copy method mistakenly even if it's enough to call the Worktree method.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all of implementation, we don't need modification of git commits/histories, so we should not use CopyToModify, and we can know from its naming.
I got your point! I agree with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Use git worktree and partial clone to reduce network io Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Add defer statement to clean up cloned git repositories in detector implementations Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Change the repo.Copy to use worktree and implement CopyToModify Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Add tests for Copy and CopyToModify methods in repo Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * repo.Copy and related methods updated to use git.Worktree instead of git.Repo Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * MockRepo.Copy method updated to return git.Worktree instead of git.Repo Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Update CopyToModify method to clone repository using git clone command Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Test: Update TestCopy to use repo.Copy instead of CopyToModify Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Fix comment in CopyToModify to clarify remote URL setting after local cloning Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Fetch the latest changes from remote after local cloning Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Remove .git directory from copied deploy source to avoid the git ops Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Update TestCopyToModify to use a mock remote directory for testing Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> * Copy deploy source using tar to exclude .git directory and improve performance Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> --------- Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> Signed-off-by: pipecd-bot <pipecd.dev@gmail.com>
…5419) * Use git worktree and partial clone to reduce network io * Add defer statement to clean up cloned git repositories in detector implementations * Change the repo.Copy to use worktree and implement CopyToModify * Add tests for Copy and CopyToModify methods in repo * repo.Copy and related methods updated to use git.Worktree instead of git.Repo * MockRepo.Copy method updated to return git.Worktree instead of git.Repo * Update CopyToModify method to clone repository using git clone command * Test: Update TestCopy to use repo.Copy instead of CopyToModify * Fix comment in CopyToModify to clarify remote URL setting after local cloning * Fetch the latest changes from remote after local cloning * Remove .git directory from copied deploy source to avoid the git ops * Update TestCopyToModify to use a mock remote directory for testing * Copy deploy source using tar to exclude .git directory and improve performance --------- Signed-off-by: Shinnosuke Sawada-Dazai <shin@warashi.dev> Signed-off-by: pipecd-bot <pipecd.dev@gmail.com> Co-authored-by: Shinnosuke Sawada-Dazai <shin@warashi.dev>
What this PR does:
cp -rf
butgit worktree add
to Copy the git.RepoWhy we need it:
I want to reduce network/file io
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: