Tracking empty files #248

Open
vincenzorrei opened this issue Dec 16, 2022 · 6 comments
Labels
bug Something isn't working PR welcome Issue is confirmed, but not fixed yet

Comments

@vincenzorrei

vincenzorrei commented Dec 16, 2022

While tracking the history of a file, I noticed that for empty files the ModifiedFile class does not report old_path and new_path correctly: on creation, old_path should be None but is the same as new_path, and on deletion, new_path should be None but keeps the same value as old_path.
This happens often with "__init__.py" files.
I think the problem (if it is one) could be in how the git diff is parsed when there are no modified lines.

Here is what happens when parsing your repo for the creation of 'pydriller/tests/integration/__init__.py':

  • commit: 4be0402
  • ModifiedFile:
    -- old_path = {str} 'pydriller/tests/integration/__init__.py'
    -- source_code_before = {NoneType} None
    -- new_path = {str} 'pydriller/tests/integration/__init__.py'
    -- source_code = {str} ''

with command:
git diff 4be0402~ 4be0402

...
diff --git a/pydriller/tests/integration/__init__.py b/pydriller/tests/integration/__init__.py
new file mode 100644
index 0000000..e69de29
...

Am I missing something?
Thanks for your time.
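
The `e69de29` in the `index` line of the diff above is the abbreviated object ID Git assigns to an empty blob, which is why the diff has a header but no changed lines. A minimal sketch to confirm this, assuming Git's default SHA-1 object format (the helper name `git_blob_sha1` is made up for illustration and is not part of PyDriller or GitPython):

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Abbreviated ID of the empty blob, as it appears in the "index" line.
empty_id = git_blob_sha1(b"")
print(empty_id[:7])
```

Any newly added empty file therefore shows `index 0000000..e69de29`, regardless of its path.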

@ishepard
Owner

Hi!
In the commit you showed, the file in question is renamed, hence there are both the old path and the new path.
A small test:

from pydriller import Repository

# Inspect only the commit in question and print the path info of every modified file.
for commit in Repository('.', single="4be0402d466470ae7274c4244bad2712dfeda3ab").traverse_commits():
    for mod in commit.modified_files:
        print(mod.filename)
        print(mod.old_path)
        print(mod.new_path)
        print(mod.change_type)
        print("-------------------------------")

Result:

__init__.py
domain/__init__.py
pydriller/__init__.py
ModificationType.RENAME
-------------------------------
__init__.py
scm/__init__.py
pydriller/domain/__init__.py
ModificationType.RENAME
-------------------------------
change_set.py
domain/change_set.py
pydriller/domain/change_set.py
ModificationType.RENAME
-------------------------------
commit.py
domain/commit.py
pydriller/domain/commit.py
ModificationType.RENAME
-------------------------------
developer.py
domain/developer.py
pydriller/domain/developer.py
ModificationType.RENAME
-------------------------------
diff_block.py
domain/diff_block.py
pydriller/domain/diff_block.py
ModificationType.RENAME
-------------------------------
modification.py
domain/modification.py
pydriller/domain/modification.py
ModificationType.RENAME
-------------------------------
modification_type.py
domain/modification_type.py
pydriller/domain/modification_type.py
ModificationType.RENAME
-------------------------------
repository_mining.py
repository_mining.py
pydriller/repository_mining.py
ModificationType.RENAME
-------------------------------
__init__.py
tests/__init__.py
pydriller/scm/__init__.py
ModificationType.RENAME
-------------------------------
blamed_line.py
scm/blamed_line.py
pydriller/scm/blamed_line.py
ModificationType.RENAME
-------------------------------
commit_visitor.py
None
pydriller/scm/commit_visitor.py
ModificationType.ADD
-------------------------------
git_repository.py
scm/git_repository.py
pydriller/scm/git_repository.py
ModificationType.RENAME
-------------------------------
persistence_mechanism.py
scm/persistence_mechanism.py
pydriller/scm/persistence_mechanism.py
ModificationType.RENAME
-------------------------------
__init__.py
tests/integration/__init__.py
pydriller/tests/__init__.py
ModificationType.RENAME
-------------------------------
__init__.py
pydriller/tests/integration/__init__.py
pydriller/tests/integration/__init__.py
ModificationType.ADD
-------------------------------
concurrency_visitor_test.py
tests/integration/concurrency_visitor_test.py
pydriller/tests/integration/concurrency_visitor_test.py
ModificationType.RENAME
-------------------------------
test_between_dates.py
tests/integration/test_between_dates.py
pydriller/tests/integration/test_between_dates.py
ModificationType.RENAME
-------------------------------
test_between_tags.py
tests/integration/test_between_tags.py
pydriller/tests/integration/test_between_tags.py
ModificationType.RENAME
-------------------------------
test_commit_filters.py
tests/integration/test_commit_filters.py
pydriller/tests/integration/test_commit_filters.py
ModificationType.RENAME
-------------------------------
test_concurrency.py
tests/integration/test_concurrency.py
pydriller/tests/integration/test_concurrency.py
ModificationType.RENAME
-------------------------------
test_dates_and_timezones.py
tests/integration/test_dates_and_timezones.py
pydriller/tests/integration/test_dates_and_timezones.py
ModificationType.RENAME
-------------------------------
test_only_modification_with_file_type.py
tests/integration/test_only_modification_with_file_type.py
pydriller/tests/integration/test_only_modification_with_file_type.py
ModificationType.RENAME
-------------------------------
test_reverse_order.py
tests/integration/test_reverse_order.py
pydriller/tests/integration/test_reverse_order.py
ModificationType.RENAME
-------------------------------
test_commit.py
tests/test_commit.py
pydriller/tests/test_commit.py
ModificationType.RENAME
-------------------------------
test_git_repository.py
tests/test_git_repository.py
pydriller/tests/test_git_repository.py
ModificationType.RENAME
-------------------------------
test_memory_consumption.py
tests/test_memory_consumption.py
pydriller/tests/test_memory_consumption.py
ModificationType.RENAME
-------------------------------
test_modification.py
tests/test_modification.py
pydriller/tests/test_modification.py
ModificationType.RENAME
-------------------------------
test_ranges.py
tests/test_ranges.py
pydriller/tests/test_ranges.py
ModificationType.RENAME
-------------------------------
visitor_test.py
None
pydriller/tests/visitor_test.py
ModificationType.ADD
-------------------------------
commit_visitor.py
scm/commit_visitor.py
None
ModificationType.DELETE
-------------------------------
setup.py
None
setup.py
ModificationType.ADD
-------------------------------
visitor_test.py
tests/visitor_test.py
None
ModificationType.DELETE
-------------------------------

And indeed, if we run git show 4be0402d466470ae7274c4244bad2712dfeda3ab, the result is:

diff --git a/domain/__init__.py b/pydriller/__init__.py
similarity index 100%
rename from domain/__init__.py
rename to pydriller/__init__.py
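
The extended header lines that git prints (`rename from`, `new file mode`, `deleted file mode`) are enough to tell these cases apart even when a diff has no hunks. The following is only an illustrative sketch of that idea, not how PyDriller or GitPython actually parse diffs; `classify_diff_header` is a hypothetical helper, and copies and mode-only changes are deliberately ignored:

```python
def classify_diff_header(header_lines):
    """Guess the change type from git's extended diff header lines."""
    for line in header_lines:
        if line.startswith("new file mode"):
            return "ADD"
        if line.startswith("deleted file mode"):
            return "DELETE"
        if line.startswith("rename from"):
            return "RENAME"
    # Simplification: anything else is treated as a plain modification.
    return "MODIFY"


# Header lines as they appear in this thread.
rename_entry = [
    "diff --git a/domain/__init__.py b/pydriller/__init__.py",
    "similarity index 100%",
    "rename from domain/__init__.py",
    "rename to pydriller/__init__.py",
]
new_file_entry = [
    "diff --git a/pydriller/tests/integration/__init__.py "
    "b/pydriller/tests/integration/__init__.py",
    "new file mode 100644",
    "index 0000000..e69de29",
]

print(classify_diff_header(rename_entry))
print(classify_diff_header(new_file_entry))
```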

@vincenzorrei
Author

vincenzorrei commented Dec 16, 2022

Sorry, but I don't get it.

Why do I see this?

__init__.py
pydriller/tests/integration/__init__.py               <-- shouldn't it be None?
pydriller/tests/integration/__init__.py
ModificationType.ADD
-------------------------------

And this?

visitor_test.py
None                                                        <-- as here
pydriller/tests/visitor_test.py
ModificationType.ADD
-------------------------------

@ishepard
Owner

Oh thanks, now I see it. Indeed, in this specific case it looks like GitPython returns the wrong information:

%s
==========
lhs: None
rhs: None
file renamed from 'tests/integration/__init__.py'
file renamed to 'pydriller/tests/__init__.py'
pydriller/tests/integration/__init__.py
=======================================================

For some reason GitPython thinks the file is a rename of a file that doesn't even exist. I will need to open an issue on the GitPython side.

@ishepard
Owner

I was able to reproduce locally:

cd /tmp
mkdir test && cd test
git init
touch asd.txt
git add asd.txt
git commit -m "add empty file"

Then run:

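from git import Repo, NULL_TREE

repo = Repo("/tmp/test")

# The only commit in the test repo: it adds the empty file asd.txt.
commit = repo.head.commit

# Diff the commit against the empty tree, asking GitPython to build the patch.
diffs = commit.diff(NULL_TREE, paths=None, create_patch=True)

for diff in diffs:
    print(diff.a_path)
    print(diff.b_path)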

Both paths will be the same.
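
Until this is fixed upstream, one possible workaround on the caller's side is to trust the change type and blank out the path that should not exist. This is only a sketch under the assumption that the change type itself is reported correctly (as in the output earlier in this thread, where the empty file is listed as `ModificationType.ADD`); `normalize_paths` is a hypothetical helper, not a PyDriller or GitPython function:

```python
from typing import Optional, Tuple


def normalize_paths(
    change_type_name: str,
    old_path: Optional[str],
    new_path: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (old_path, new_path) made consistent with the change type."""
    if change_type_name == "ADD":
        # A newly added file has no previous path.
        return None, new_path
    if change_type_name == "DELETE":
        # A deleted file has no path after the commit.
        return old_path, None
    # Renames, modifications, etc. keep both paths untouched.
    return old_path, new_path


# The inconsistent record from this issue: an ADD with old_path == new_path.
path = "pydriller/tests/integration/__init__.py"
print(normalize_paths("ADD", path, path))
```

With PyDriller, the call would presumably look like `normalize_paths(mod.change_type.name, mod.old_path, mod.new_path)` inside the loop over `commit.modified_files`.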

@ishepard
Owner

ishepard commented Dec 16, 2022

I now remember that this kind of problem was raised before. It turns out that 4 years ago I opened a PR about this: gitpython-developers/GitPython#749

I never managed to work on it.

@vincenzorrei
Author

Ok. Thank you so much for your time!

@ishepard ishepard added bug Something isn't working PR welcome Issue is confirmed, but not fixed yet labels Dec 19, 2022