Fixed exception on repeated run of clone_repo_to #180

HelgeCPH · 2021-06-01T11:57:55Z

Hej, in this PR I modified the code so that repeated executions of the following form will work:

gh_repo = "https://github.com/ishepard/pydriller.git"
repo_path = "/tmp/pydriller"
for commit in Repository(path_to_repo=gh_repo, clone_repo_to=repo_path).traverse_commits():
    pass
for commit in Repository(path_to_repo=gh_repo, clone_repo_to=repo_path).traverse_commits():
    pass

In its current implementation, the above code works for the first instantiation of Repository, where the repository is correctly cloned into the given directory. However, the second instantiation of Repository panics with an exception similar to the following. In essence, it does not work since Pydriller tries to clone the repository again.

Traceback (most recent call last):
  File "/Home/scripts/collect_gh_repos.py", line 90, in <module>
    main(args.outpath)
  File "/Home/scripts/collect_gh_repos.py", line 40, in main
    commits_df = collect_commits(projs_to_check, outpath)
  File "/Home/scripts/collect_gh_repos.py", line 19, in collect_commits
    for commit in Repository(
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 207, in traverse_commits
    with self._prep_repo(path_repo=path_repo) as git:
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 170, in _prep_repo
    local_path_repo = self._clone_remote_repo(self._clone_folder(), path_repo)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/pydriller/repository.py", line 151, in _clone_remote_repo
    Repo.clone_from(url=repo, to_path=repo_folder)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/repo/base.py", line 1083, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/repo/base.py", line 1021, in _clone
    finalize_process(proc, stderr=stderr)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/util.py", line 369, in finalize_process
    proc.wait(**kwargs)
  File "/Home-Wp9NZyEL-py3.9/lib/python3.9/site-packages/git/cmd.py", line 450, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v https://github.com/apache/ant.git /tmp/pydriller
  stderr: 'fatal: destination path '/tmp/pydriller' already exists and is not an empty directory.

Since I did not expect such a behaviour and since I think it is practical when running repeated analysis on a repository, I implemented this fix with test.
To me, it is especially useful during exploratory analyzes of large repositories for which I do not want to repeatedly write code that checks if I already cloned a repo with another script.

codecov · 2021-06-01T11:59:12Z

Codecov Report

Merging #180 (63e1d78) into master (f613705) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #180   +/-   ##
=======================================
  Coverage   98.32%   98.32%           
=======================================
  Files          15       15           
  Lines        1015     1017    +2     
=======================================
+ Hits          998     1000    +2     
  Misses         17       17

Impacted Files	Coverage Δ
pydriller/repository.py	`100.00% <100.00%> (ø)`

ishepard

Thanks for the PR!

I was thinking, can you do this check in clone_remote_repo? In Line 149. What I had in mind is that when cloning the repo we check whether it already exists.

HelgeCPH · 2021-06-01T12:47:21Z

Yep, done. It makes more sense there and makes the code actually also shorter :) Thank you for pointing me to the correct place for the change.

ishepard

Looking great! Thanks for the work!
Just a nit comment, will accept anyway!

ishepard · 2021-06-01T12:49:06Z

pydriller/repository.py

+        repo_folder = os.path.join(tmp_folder, self._get_repo_name_from_url(repo))       
+        if os.path.isdir(repo_folder):
+            logger.info("Reusing folder %s for %s", repo_folder, repo)
+        else:
+            logger.info("Cloning %s in temporary folder %s", repo, repo_folder)
+            Repo.clone_from(url=repo, to_path=repo_folder)


Oh beautiful ❤️

ishepard · 2021-06-01T12:51:29Z

pydriller/repository.py

+            clone_folder = self._clone_folder()
+            local_path_repo = self._clone_remote_repo(clone_folder, path_repo)


Can you undo this change? Thanks!

Suggested change

clone_folder = self._clone_folder()

local_path_repo = self._clone_remote_repo(clone_folder, path_repo)

local_path_repo = self._clone_remote_repo(self._clone_folder(), path_repo)

HelgeCPH · 2021-06-01T12:58:13Z

Yes, I reverted the creation of the clone_folder variable. I did it only since I follow the 80 character rule. Eventually I have to check Pydriller's linter rules before committing :)

ishepard · 2021-06-01T16:15:10Z

Awesome thank you for the PR!

ishepard · 2021-06-01T16:17:07Z

HelgeCPH added 3 commits June 1, 2021 13:36

Fixed exception on repeated run of clone_to

33c1e82

Trying to satisfy the linter

4b6e514

Trying to satisfy the linter a bit more

57bace5

ishepard requested changes Jun 1, 2021

View reviewed changes

Moved check of already checked-out repo to appropriate method

dc32600

Trying to satisfy the linter again

0f8e5dc

ishepard reviewed Jun 1, 2021

View reviewed changes

Incorporated feedback on variable use

63e1d78

ishepard approved these changes Jun 1, 2021

View reviewed changes

ishepard merged commit 94532fc into ishepard:master Jun 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed exception on repeated run of clone_repo_to #180

Fixed exception on repeated run of clone_repo_to #180

HelgeCPH commented Jun 1, 2021 •

edited

Loading

codecov bot commented Jun 1, 2021 •

edited

Loading

ishepard left a comment

HelgeCPH commented Jun 1, 2021

ishepard left a comment

ishepard Jun 1, 2021

ishepard Jun 1, 2021

HelgeCPH commented Jun 1, 2021

ishepard commented Jun 1, 2021 •

edited

Loading

ishepard commented Jun 1, 2021

		clone_folder = self._clone_folder()
		local_path_repo = self._clone_remote_repo(clone_folder, path_repo)

Fixed exception on repeated run of clone_repo_to #180

Fixed exception on repeated run of clone_repo_to #180

Conversation

HelgeCPH commented Jun 1, 2021 • edited Loading

codecov bot commented Jun 1, 2021 • edited Loading

Codecov Report

ishepard left a comment

Choose a reason for hiding this comment

HelgeCPH commented Jun 1, 2021

ishepard left a comment

Choose a reason for hiding this comment

ishepard Jun 1, 2021

Choose a reason for hiding this comment

ishepard Jun 1, 2021

Choose a reason for hiding this comment

HelgeCPH commented Jun 1, 2021

ishepard commented Jun 1, 2021 • edited Loading

ishepard commented Jun 1, 2021

HelgeCPH commented Jun 1, 2021 •

edited

Loading

codecov bot commented Jun 1, 2021 •

edited

Loading

ishepard commented Jun 1, 2021 •

edited

Loading