Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git repo cache #7424

Closed
wants to merge 24 commits into from
Closed

Git repo cache #7424

wants to merge 24 commits into from

Conversation

Globegitter
Copy link

@Globegitter Globegitter commented Feb 13, 2019

This picks up the work mentioned here #5928 (comment) and should fix #5116

I refactored things a bit and made sure that the git cache actually works offline as well as without a cache and added a simple test. I have tested it in our monorepo and it seems to be working well.

Not sure what the TODO needs a proper implementation. in the _hash method refers to maybe you now @aehlig or @siedentop ?

siedentop and others added 9 commits October 11, 2018 17:20
Inspired by the following pull request and done in collaboration
with @aehlig (Klaus Aehlig) at the Bazel Hackathon 2018.

#5928

Lots of open issues remaining:
* Unit tests needs to be adapted to the fact that there is now a cache.
* Klaus would like it to be optional only (I agree).
* 'git worktree add ...' does not work on second run. --> tests failing
because of that.
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@Globegitter
Copy link
Author

@siedentop could you please confirm the CLA - thanks :)

@siedentop
Copy link
Contributor

Hi @Globegitter and @googlebot , I have confirmed the CLA a few years back. This was under my old handle "unapiedra" but still the same GitHub account.

Can I do anything to confirm it again?

@siedentop
Copy link
Contributor

@Globegitter I've just seen your comment regarding the TODO in the _hash function. This has been a while back, but I am pretty sure that @aehlig and I implemented the hashing here through the shell as a shortcut, to quickly get some hash implementation. In my recollection I was of the opinion back then, that the maintainers would not approve the use of the shell for this.

An alternative to hashing would be simply to sanitize the remote URL to work as a file name. This has the benefit that it stays readable.

@jin
Copy link
Member

jin commented Feb 14, 2019

@siedentop I cannot find any Google CLAs signed by this current username. I believe you'll have to sign it again at https://cla.developers.google.com/ while logged in to the email used by this GitHub account, and for the commits.

@siedentop
Copy link
Contributor

@jin I had to update the username. Could you please check again?

@jin
Copy link
Member

jin commented Feb 14, 2019

Looks good!

@jin jin added cla: yes and removed cla: no labels Feb 14, 2019
@googlebot
Copy link

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@jin jin added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Feb 14, 2019
@Globegitter
Copy link
Author

Globegitter commented Feb 14, 2019

I am seeing the tests are just failing on Ubuntu 14.04 - is there any way I can look at the logs @aehlig

@Globegitter
Copy link
Author

Great thanks for that pointer @siedentop

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

Googlers can find more info about SignCLA and this PR by following this link.

@Globegitter
Copy link
Author

The error on 14.04 is git: 'worktree' is not a git command. See 'git --help'. is there any way we can upgrade git on the the image @aehlig? I am not sure what the procedure is for a case like this.

tools/build_defs/repo/git.bzl Outdated Show resolved Hide resolved
tools/build_defs/repo/git.bzl Outdated Show resolved Hide resolved
tools/build_defs/repo/git.bzl Outdated Show resolved Hide resolved
@Globegitter
Copy link
Author

The test is now skipped if git worktree --help does not exist. So it is ready to review now. @aehlig any chance to get a review?

@aehlig
Copy link
Contributor

aehlig commented Apr 16, 2019

The test is now skipped if git worktree --help does not exist.

Is skipping the test entirely the right thing to do? Yes, we only expect caching, if the version of git supports it, but we still should ensure that we can fall back to a normal checkout if an older version of git is installed.

Copy link
Contributor

@aehlig aehlig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pull request and I'm sorry for the long delay. I generally like the idea of having a cache for git repositories. However, I think your patch does not yet provide what one usually expects of a cache; can you please address the comments.

Thanks,
Klaus


# Do not run tests if the git worktree command does not exist.
# Testing without if check - is windows timing out due to that?
# if git worktree --help; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this condition commented out? Should it be there? Or should we actually run the test even in this case, and just skip the verification that the correct hash ended up in the cache.

do_git_repository_test $commit_hash

cache_commit_hash="$(cat $cache_dir/worktrees/pluto/HEAD)"
assert_equals $commit_hash $cache_commit_hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A verification of caching usually verifies that we can do a clone from cache only in an "offline" situation, if we successfully filled the cache first, and attempt to check out a fixed hash (not a branch). I don't see your test verifying that main property of a cache.

_populate_cache(ctx, git_cache, remote_name, ref, shallow)
st = _get_repository_from_cache(ctx, directory, ref, git_cache)

if st.return_code:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail or fall back to using no cache?

st = _get_repository_from_cache(ctx, directory, ref, git_cache)

if st.return_code:
_populate_cache(ctx, git_cache, remote_name, ref, shallow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has the problem, that git fetch is called unconditionally, and git fetch tries to contact the remote site, even if we only need a commit already in cache. That means, we cannot use the cache when being offline, which would be a main use case for a cache.

This can be avoided by first trying to check out the correct commit and only if that fails, trying to update the cache.

@jin
Copy link
Member

jin commented Oct 2, 2019

Closing as activity on this seem to have stalled. PR authors: please address the review comments and rebase off master.

@jin jin closed this Oct 2, 2019
@jin jin reopened this Oct 10, 2019
@Globegitter
Copy link
Author

Globegitter commented Feb 11, 2020

I unfortunately will not have time in the foreseeable future to address the PR comments and bring this in a mergable state. So feel free to close for now. If I do hopefully get time again I am happy to reopen this as a new PR. But would be great if anyone else interested in this could pick this up.

@aiuto
Copy link
Contributor

aiuto commented Feb 14, 2020

Closing again for staleness.
PR authors: please address the review comments and rebase off master.

@aiuto aiuto closed this Feb 14, 2020
@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Jun 15, 2020
@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: no team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should git_repository use repository cache?
8 participants