Fix and improve migration 0025
[noissue]

These changes SHOULD have been part of commit:
c2de7cc

The changes do 3 separate things:

* Fix a critical bug where repo_content_to_update batches were not
  cleared after bulk_update!
* Add some extra logging for each batch of PRC that is successfully
  updated. This helps distinguish between cases where the migration is
  taking a long time because there are a lot of PRC to update, and cases
  where the migration is simply stuck.
* Add an order_by to one query to greatly improve the real-world
  efficiency of the migration. The result is that we always keep the
  oldest of a set of colliding ReleaseComponents, since these generally
  have the most packages associated with them, resulting in fewer
  associations that need to be updated. In the case of Debian main
  components, this can mean processing only tens of packages instead
  of around 50k!
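
The first point is easiest to see in isolation. The following is a minimal sketch of the batching pattern in plain Python, not the migration's actual code: `flush_in_batches`, the `bulk_update` callback, and the `BATCH_SIZE` value of 3 are hypothetical stand-ins chosen for illustration. Without the reset, every later flush would re-send all items already written and the pending list would grow without bound.

```python
BATCH_SIZE = 3  # hypothetical value; the migration's real BATCH_SIZE is not shown here


def flush_in_batches(items, bulk_update, batch_size=BATCH_SIZE):
    """Pass items to bulk_update in batches and return the number of flushes."""
    pending = []
    flushed = 0
    for item in items:
        pending.append(item)
        if len(pending) >= batch_size:
            bulk_update(pending)
            # The critical step: start a fresh list so the next batch does
            # not include everything that was already written.
            pending = []
            flushed += 1
    # Handle the remaining items (fewer than batch_size).
    if pending:
        bulk_update(pending)
        flushed += 1
    return flushed


# Record each batch instead of touching a database.
calls = []
count = flush_in_batches(range(7), lambda batch: calls.append(list(batch)))
print(count, calls)
```

Counting the flushes, as done here, is also a natural place to hang the per-batch log message described in the second point.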
quba42 committed Jul 13, 2023
1 parent 66526ba commit cdcf8a2
Showing 1 changed file with 4 additions and 1 deletion.
@@ -135,6 +135,9 @@ def _deduplicate_PRC(duplicate_component, component_to_keep):
 
         if len(repo_content_to_update) >= BATCH_SIZE:
             RepositoryContent.objects.bulk_update(repo_content_to_update, ["content_id"])
+            repo_content_to_update = []
+            message = '{}: Merged PRC batch from duplicate component "{}" into component "{}"!'
+            log.info(message.format(datetime.now(), duplicate_component, component_to_keep))
 
     # Handle remaining content <= BATCH_SIZE:
     if len(repo_content_to_update) > 0:
@@ -210,7 +213,7 @@ def _deduplicate_PRC(duplicate_component, component_to_keep):
     duplicate_component_ids = list(
         ReleaseComponent.objects.filter(
             distribution=distribution, component=component
-        ).values_list('pk', flat=True)
+        ).order_by('-pulp_created').values_list('pk', flat=True)
     )
     if len(duplicate_component_ids) > 1:
         component_to_keep = duplicate_component_ids.pop()
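
Why a descending order_by keeps the oldest component may not be obvious at first glance. It follows from `list.pop()` removing the last element. The sketch below imitates the query with plain Python; the `(pk, pulp_created)` pairs are invented sample data, not values from a real repository.

```python
from datetime import datetime

# Hypothetical (pk, pulp_created) pairs standing in for colliding ReleaseComponents.
components = [
    ("b", datetime(2022, 5, 1)),
    ("a", datetime(2021, 1, 1)),
    ("c", datetime(2023, 3, 1)),
]

# Stand-in for .order_by('-pulp_created').values_list('pk', flat=True):
# newest first, oldest last.
newest_first = sorted(components, key=lambda c: c[1], reverse=True)
duplicate_component_ids = [pk for pk, _created in newest_first]

# list.pop() removes and returns the LAST element, which is the oldest component.
component_to_keep = duplicate_component_ids.pop()

print(component_to_keep, duplicate_component_ids)
```

With an ascending order, `pop()` would instead keep the newest component, which by the commit message's reasoning tends to have the fewest packages associated with it.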
