FEAT(CodeChurn): Add method to return lines added and removed per file #299

Merged
12 changes: 9 additions & 3 deletions docs/processmetrics.rst
@@ -67,11 +67,12 @@ Depending on the parametrization, a code churn is the sum of either

 across the analyzed commits.

-The class ``CodeChurn`` has three methods:
+The class ``CodeChurn`` has four methods:

 * ``count()`` to count the *total* size of code churns of a file;
 * ``max()`` to count the *maximum* size of a code churn of a file;
-* ``avg()`` to count the *average* size of a code churn of a file. **Note:** The average value is rounded off to the nearest integer.
+* ``avg()`` to count the *average* size of a code churn of a file. **Note:** The average value is rounded off to the nearest integer;
+* ``get_added_and_removed_lines()`` to retrieve the *exact* number of lines added and removed for each file as a tuple (added_lines, removed_lines).

 For example::

@@ -82,16 +83,21 @@ For example::
     files_count = metric.count()
     files_max = metric.max()
     files_avg = metric.avg()
+    added_removed_lines = metric.get_added_and_removed_lines()

     print('Total code churn for each file: {}'.format(files_count))
     print('Maximum code churn for each file: {}'.format(files_max))
     print('Average code churn for each file: {}'.format(files_avg))
+    print('Lines added and removed for each file: {}'.format(added_removed_lines))

-will print the total, maximum and average number of code churn for each modified file in the evolution period ``[from_commit, to_commit]``.
+will print the total, maximum, and average number of code churns for each modified file, along with the number of lines added and removed, in the evolution period ``[from_commit, to_commit]``.

 The calculation variant (a) or (b) can be configured by setting the ``CodeChurn`` init parameter:

 * ``add_deleted_lines_to_churn``

+To retrieve the added and removed lines for each file directly, the ``get_added_and_removed_lines()`` method can be used, which returns a dictionary with file paths as keys and a tuple (added_lines, removed_lines) as values.
+
+
 Commits Count
 =============
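
The documentation change describes the return value of ``get_added_and_removed_lines()`` as a dictionary mapping file paths to ``(added_lines, removed_lines)`` tuples. A minimal sketch of consuming such a dictionary follows. The sample values are invented for illustration and stand in for a real return value, and the helper name ``net_change_per_file`` is hypothetical, not part of PyDriller.

```python
from typing import Dict, Tuple


def net_change_per_file(added_removed: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Map each file path to added_lines - removed_lines."""
    return {path: added - removed for path, (added, removed) in added_removed.items()}


# Stand-in for metric.get_added_and_removed_lines(); the values are invented.
sample = {
    'domain/commit.py': (12, 3),
    'domain/__init__.py': (0, 5),
}

for path, net in sorted(net_change_per_file(sample).items()):
    print('{}: net {:+d} lines'.format(path, net))
```

Unpacking the tuple in the comprehension keeps the two counts named, which avoids index-based access such as ``value[0]``.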
20 changes: 16 additions & 4 deletions pydriller/metrics/process/code_churn.py
@@ -2,7 +2,7 @@
 Module that calculates the number of hunks made to a commit file.
 """
 import statistics
-from typing import Optional
+from typing import Optional, Dict, Tuple

 from pydriller import ModificationType
 from pydriller.metrics.process.process_metric import ProcessMetric
@@ -35,10 +35,10 @@ def __init__(self, path_to_repo: str,
         super().__init__(path_to_repo, since=since, to=to, from_commit=from_commit, to_commit=to_commit)
         self.ignore_added_files = ignore_added_files
         self.add_deleted_lines_to_churn = add_deleted_lines_to_churn
+        self.added_removed_lines: Dict[str, Tuple[int, int]] = {}
         self._initialize()

     def _initialize(self):
-
         renamed_files = {}
         self.files = {}

@@ -54,13 +54,25 @@ def _initialize(self):
                 if self.ignore_added_files and modified_file.change_type == ModificationType.ADD:
                     continue

+                added_lines = modified_file.added_lines
+                deleted_lines = modified_file.deleted_lines
+                self.added_removed_lines[filepath] = (added_lines, deleted_lines)
+
                 if self.add_deleted_lines_to_churn:
-                    churn = modified_file.added_lines + modified_file.deleted_lines
+                    churn = added_lines + deleted_lines
                 else:
-                    churn = modified_file.added_lines - modified_file.deleted_lines
+                    churn = added_lines - deleted_lines

                 self.files.setdefault(filepath, []).append(churn)

+    def get_added_and_removed_lines(self) -> Dict[str, Tuple[int, int]]:
+        """
+        Returns a dictionary with file paths as keys and a tuple of added and removed lines as values.
+
+        :return: A dictionary where the key is the file path, and the value is a tuple (added_lines, removed_lines).
+        """
+        return self.added_removed_lines
+
     def count(self):
         """
         Return the total number of code churns for each modified file.
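
One point in the ``_initialize`` hunk may deserve a second look. ``self.added_removed_lines[filepath]`` is set with a plain assignment for every modified file encountered, while ``self.files`` appends one churn value per occurrence. If a file is modified in several commits of the analyzed range, the stored tuple would therefore appear to reflect only the last commit processed, not a total. Whether totals are intended is an assumption here, as the PR does not say. If they are, the accumulation could look like the sketch below. The function name and the ``(filepath, added, deleted)`` records are hypothetical stand-ins for the loop over ``modified_file`` objects.

```python
from typing import Dict, Iterable, Tuple


def accumulate_added_removed(
    records: Iterable[Tuple[str, int, int]],
) -> Dict[str, Tuple[int, int]]:
    """Sum added and deleted line counts per file path across all records."""
    totals: Dict[str, Tuple[int, int]] = {}
    for filepath, added, deleted in records:
        # Start from (0, 0) the first time a path is seen, then add to it.
        prev_added, prev_deleted = totals.get(filepath, (0, 0))
        totals[filepath] = (prev_added + added, prev_deleted + deleted)
    return totals


# Invented records: 'a.py' is touched in two commits, 'b.py' in one.
records = [('a.py', 10, 2), ('b.py', 1, 1), ('a.py', 4, 6)]
totals = accumulate_added_removed(records)
print(totals)  # {'a.py': (14, 8), 'b.py': (1, 1)}
```

With plain assignment, the same records would leave ``'a.py'`` at ``(4, 6)``, the values from its last record only.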
11 changes: 11 additions & 0 deletions tests/metrics/process/test_code_churn.py
@@ -85,3 +85,14 @@ def test_with_add_deleted_lines_flag():
     assert len(code_churns) == 18
     assert str(Path('domain/__init__.py')) in code_churns
     assert code_churns[str(Path('domain/commit.py'))] == 40
+
+
+def test_get_added_and_removed_lines():
+    metric = CodeChurn(path_to_repo='test-repos/pydriller',
+                       from_commit='ab36bf45859a210b0eae14e17683f31d19eea041',
+                       to_commit='fdf671856b260aca058e6595a96a7a0fba05454b')
+
+    added_removed_lines = metric.get_added_and_removed_lines()
+
+    assert isinstance(added_removed_lines, dict)
+    assert all(isinstance(value, tuple) and len(value) == 2 for value in added_removed_lines.values())
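
The new test checks only the container types. A somewhat stricter check could also verify that each count is a non-negative integer and that the keys match those returned by ``count()``, since both dictionaries appear to be filled in the same loop iteration. The following is a sketch only: the helper name ``check_added_removed`` is hypothetical, and the dictionaries passed to it are invented stand-ins for the results of a real ``CodeChurn`` instance.

```python
from typing import Dict, Tuple


def check_added_removed(added_removed: Dict[str, Tuple[int, int]],
                        churn_counts: Dict[str, int]) -> None:
    """Raise AssertionError if the result shape looks inconsistent."""
    # Both dictionaries should describe exactly the same set of files.
    assert set(added_removed) == set(churn_counts)
    for path, value in added_removed.items():
        assert isinstance(value, tuple) and len(value) == 2, path
        added, removed = value
        # Line counts are expected to be non-negative integers.
        assert isinstance(added, int) and added >= 0, path
        assert isinstance(removed, int) and removed >= 0, path


# Invented stand-ins for get_added_and_removed_lines() and count().
check_added_removed({'a.py': (3, 1), 'b.py': (0, 2)}, {'a.py': 2, 'b.py': -2})
```

In a real test, the two arguments would come from ``metric.get_added_and_removed_lines()`` and ``metric.count()`` on the same ``CodeChurn`` instance.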