Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop deleting log entries in log_create #559

Merged
merged 1 commit into from
Sep 11, 2023

Conversation

aleh-rymasheuski
Copy link
Contributor

@aleh-rymasheuski aleh-rymasheuski commented Aug 31, 2023

This pull request addresses two issues:

  1. Even if a pk value is reused, we should not delete log entries for old deleted object. Old object's lifetime is easily identifiable by a pair of matching created & deleted log entries.
  2. (why I started to look into this in the first place) log_create spends 22% of its run time calling delete() method in a test run of a medium-sized data loading job on my machine (7% of the total run time of the job).

I assume this is technically a breaking change so I'd like it included in 3.0.0 release too.

Notice the change in the log_update handler. Before my changes, every object creation was producing two log entries: updated log entry, then created log entry; and while creating this second log entry, auditlog was deleting the first one, so it was not noticeable.

@aleh-rymasheuski aleh-rymasheuski added this to the 3.0 milestone Aug 31, 2023
@codecov
Copy link

codecov bot commented Aug 31, 2023

Codecov Report

Merging #559 (9a45bc5) into master (858034b) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #559      +/-   ##
==========================================
- Coverage   94.96%   94.94%   -0.03%     
==========================================
  Files          31       31              
  Lines         994      990       -4     
==========================================
- Hits          944      940       -4     
  Misses         50       50              
Files Changed Coverage Δ
auditlog/models.py 90.22% <ø> (-0.15%) ⬇️
auditlog/receivers.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -52,6 +52,9 @@ def log_update(sender, instance, **kwargs):
if instance.pk is not None:
update_fields = kwargs.get("update_fields", None)
old = sender.objects.filter(pk=instance.pk).first()
if not old:
Copy link
Contributor Author

@aleh-rymasheuski aleh-rymasheuski Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hramezani, should I replace the check on line 52 (if instance.pk is not None) to if instance._state.adding? https://docs.djangoproject.com/en/4.2/ref/models/instances/#django.db.models.Model._state
It will spare us an extra query to the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this for a short while and decided to use the adding flag (it exists for a reason).

@hramezani
Copy link
Member

Thanks @aleh-rymasheuski

@aqeelat it would be great to take a look at this PR if you have time.

@hramezani hramezani requested a review from aqeelat September 1, 2023 17:18
if instance.pk is not None:
if not instance._state.adding:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines -69 to -88
# Delete log entries with the same pk as a newly created model.
# This should only be necessary when a pk is used twice.
if kwargs.get("action", None) is LogEntry.Action.CREATE:
if (
kwargs.get("object_id", None) is not None
and self.filter(
content_type=kwargs.get("content_type"),
object_id=kwargs.get("object_id"),
).exists()
):
self.filter(
content_type=kwargs.get("content_type"),
object_id=kwargs.get("object_id"),
).delete()
else:
self.filter(
content_type=kwargs.get("content_type"),
object_pk=kwargs.get("object_pk", ""),
).delete()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this added in the first place?
https://thoughtbot.com/blog/chestertons-fence

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aqeelat, I tried to find the reason for adding it and couldn't find any solid reason. The first revision of this block of code was added in a commit 81098cd labeled "Big commit: testproject, bugfixes, changed directory structure".

One possible reason is that the bug I fixed in receivers.py in this pull request was adding an invalid "updated" log entry before logging the "created" log entry.
Another possible reason is that @jjkester was using auditlog with MySQL which was known to reuse the values of primary keys at least at the time when auditlog was first implemented.

The comment in the removed code says that primary key reuse "should only happen when all records were deleted / the table was truncated" which is the third possible reason: @jjkester had to truncate the table of an auditlogged model at least once (which shouldn't be viewed as a common use case therefore it shouldn't be the behavior by default).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary key reuse is still a thing in mysql (link). Should we handle this by checking the db engine? Or maybe by a configuration var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aqeelat, I don't think we should ever delete auditlog entries by default. The audit log exists to be immutable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aleh-rymasheuski I agree. However, people who are serious about auditlogs will also be careful to avoid primary-key reuse.
The main issue is that when accessing the logs for a instance, we shouldn't get entries for older instances with the same object id.
I know this is an edge case, which is why I'm trying to find the most inclusive solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hramezani @aleh-rymasheuski
I don't want to stand in the way here. I trust you're judgment. I just wanted to give my 2 cents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aqeelat for your input here.

IMHO, we can merge it.

@hramezani hramezani merged commit ac737fd into jazzband:master Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants