Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed exception on repeated run of clone_repo_to #180

Merged
merged 6 commits into from
Jun 1, 2021

Conversation

HelgeCPH
Copy link
Contributor

@HelgeCPH HelgeCPH commented Jun 1, 2021

Hej, in this PR I modified the code so that repeated executions of the following form will work:

gh_repo = "https://github.com/ishepard/pydriller.git"
repo_path = "/tmp/pydriller"
for commit in Repository(path_to_repo=gh_repo, clone_repo_to=repo_path).traverse_commits():
    pass
for commit in Repository(path_to_repo=gh_repo, clone_repo_to=repo_path).traverse_commits():
    pass

In its current implementation, the above code works for the first instantiation of Repository, where the repository is correctly cloned into the given directory. However, the second instantiation of Repository panics with an exception similar to the following. In essence, it does not work since Pydriller tries to clone the repository again.

Traceback (most recent call last):
  File "/Home/scripts/collect_gh_repos.py", line 90, in <module>
    main(args.outpath)
  File "/Home/scripts/collect_gh_repos.py", line 40, in main
    commits_df = collect_commits(projs_to_check, outpath)
  File "/Home/scripts/collect_gh_repos.py", line 19, in collect_commits
    for commit in Repository(
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 207, in traverse_commits
    with self._prep_repo(path_repo=path_repo) as git:
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 170, in _prep_repo
    local_path_repo = self._clone_remote_repo(self._clone_folder(), path_repo)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 151, in _clone_remote_repo
    Repo.clone_from(url=repo, to_path=repo_folder)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/repo/base.py", line 1083, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/repo/base.py", line 1021, in _clone
    finalize_process(proc, stderr=stderr)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/util.py", line 369, in finalize_process
    proc.wait(**kwargs)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/cmd.py", line 450, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v https://github.com/apache/ant.git /tmp/pydriller
  stderr: 'fatal: destination path '/tmp/pydriller' already exists and is not an empty directory.

Since I did not expect such a behaviour and since I think it is practical when running repeated analysis on a repository, I implemented this fix with test.
To me, it is especially useful during exploratory analyzes of large repositories for which I do not want to repeatedly write code that checks if I already cloned a repo with another script.

@codecov
Copy link

codecov bot commented Jun 1, 2021

Codecov Report

Merging #180 (63e1d78) into master (f613705) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #180   +/-   ##
=======================================
  Coverage   98.32%   98.32%           
=======================================
  Files          15       15           
  Lines        1015     1017    +2     
=======================================
+ Hits          998     1000    +2     
  Misses         17       17           
Impacted Files Coverage Δ
pydriller/repository.py 100.00% <100.00%> (ø)

Copy link
Owner

@ishepard ishepard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I was thinking, can you do this check in clone_remote_repo? In Line 149. What I had in mind is that when cloning the repo we check whether it already exists.

@HelgeCPH
Copy link
Contributor Author

HelgeCPH commented Jun 1, 2021

Yep, done. It makes more sense there and makes the code actually also shorter :) Thank you for pointing me to the correct place for the change.

Copy link
Owner

@ishepard ishepard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Thanks for the work!
Just a nit comment, will accept anyway!

Comment on lines 149 to 154
repo_folder = os.path.join(tmp_folder, self._get_repo_name_from_url(repo))
if os.path.isdir(repo_folder):
logger.info("Reusing folder %s for %s", repo_folder, repo)
else:
logger.info("Cloning %s in temporary folder %s", repo, repo_folder)
Repo.clone_from(url=repo, to_path=repo_folder)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh beautiful ❤️

Comment on lines 173 to 174
clone_folder = self._clone_folder()
local_path_repo = self._clone_remote_repo(clone_folder, path_repo)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this change? Thanks!

Suggested change
clone_folder = self._clone_folder()
local_path_repo = self._clone_remote_repo(clone_folder, path_repo)
local_path_repo = self._clone_remote_repo(self._clone_folder(), path_repo)

@HelgeCPH
Copy link
Contributor Author

HelgeCPH commented Jun 1, 2021

Yes, I reverted the creation of the clone_folder variable. I did it only since I follow the 80 character rule. Eventually I have to check Pydriller's linter rules before committing :)

@ishepard
Copy link
Owner

ishepard commented Jun 1, 2021

Awesome thank you for the PR!

@ishepard ishepard merged commit 94532fc into ishepard:master Jun 1, 2021
@ishepard
Copy link
Owner

ishepard commented Jun 1, 2021

Yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants