Phase 1 + 2: Disable DISTINCT
This change adds `.distinct(false)` to the AQL queries. Because results are already filtered for uniqueness by repository, path, and name, duplicate results cannot occur. My experimentation with a directory housing 2 million artifacts revealed significant differences: with `distinct(false)`, the query consistently completed within 11 seconds.
Phase 2: Sort by name + path instead of by `modifiedBy`
There's a potential issue that could lead to unexpected results within a 15-minute timeframe. The problem lies in sorting by `modifiedBy`, which might not provide sufficient uniqueness. Here's an example scenario: the first query covers offsets 0 to 10,000, and the next query covers offsets 10,001 to 20,000.
Because files 10,000 and 10,001 have identical modification times, their relative order is not guaranteed from one query to the next, so we could encounter file 10,000 in both ranges while missing file 10,001 altogether. The proposed solution is to sort by `name` and `path` instead of `modifiedBy`. Note, however, that this change might come with a performance penalty.
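The pagination problem described above can be reproduced with a small simulation. This is only an illustrative sketch, not the actual transfer code: the `Artifact` type, the `fetch_page` and `fetch_all` helpers, the `modified` timestamp field, and the page size of 3 are all hypothetical, and the backend is assumed to return rows with equal sort keys in an unspecified order.

```python
from typing import NamedTuple


class Artifact(NamedTuple):
    path: str
    name: str
    modified: int  # hypothetical modification timestamp


# Six artifacts; "c.bin" and "d.bin" share the same modification time.
ARTIFACTS = [
    Artifact("dir", "a.bin", 100),
    Artifact("dir", "b.bin", 101),
    Artifact("dir", "c.bin", 102),
    Artifact("dir", "d.bin", 102),
    Artifact("dir", "e.bin", 103),
    Artifact("dir", "f.bin", 104),
]


def fetch_page(sort_key, offset, limit, reverse_ties=False):
    """Return one page, emulating a backend whose tie order is unspecified.

    reverse_ties flips the pre-sort order, so rows with equal sort keys
    may come back in a different relative order on each call.
    """
    rows = sorted(ARTIFACTS, key=lambda a: a.name, reverse=reverse_ties)
    rows.sort(key=sort_key)  # stable sort: ties keep the pre-sort order
    return rows[offset:offset + limit]


def fetch_all(sort_key, page_size=3):
    """Collect artifact names page by page using offset pagination."""
    names = []
    for page_index, offset in enumerate(range(0, len(ARTIFACTS), page_size)):
        # Alternate the tie order between calls to model the worst case.
        page = fetch_page(sort_key, offset, page_size,
                          reverse_ties=(page_index % 2 == 1))
        names.extend(a.name for a in page)
    return names


# Sorting by a non-unique key: "c.bin" is returned twice, "d.bin" never.
by_time = fetch_all(lambda a: a.modified)

# Sorting by a unique key (path, name): every artifact is returned once.
by_path_and_name = fetch_all(lambda a: (a.path, a.name))

print(by_time)
print(by_path_and_name)
```

Because `(path, name)` is unique per artifact, the order is fully determined and page boundaries always fall between the same two rows, which is why the second run neither duplicates nor skips an entry.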