Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transfer - AQL optimizations #1036

Merged
merged 2 commits into from
Nov 20, 2023
Merged

Transfer - AQL optimizations #1036

merged 2 commits into from
Nov 20, 2023

Conversation

yahavi
Copy link
Member

@yahavi yahavi commented Nov 19, 2023

  • All tests passed. If this feature is not already covered by the tests, I added new tests.
  • All static analysis checks passed.
  • This pull request is on the dev branch.
  • I used gofmt for formatting the code before submitting the pull request.

Phase 1 + 2: Disable DISTINCT
Adding .distinct(false) to the AQL queries. Given the existing filtering of uniqueness based on repository, path, and name, the possibility of encountering duplicate results is eliminated. My experimentation with a directory housing 2 million artifacts revealed significant differences:

  • Queries with distinct(false) consistently completed within 11 seconds.
  • Conversely, queries lacking this specification consistently surpassed 26 seconds.

Phase 2: Sort by name + path instead of by modifiedBy
There's a potential issue that could lead to unexpected results within a 15-minute timeframe. The problem lies in sorting by modifiedBy, which might not provide sufficient uniqueness. Here's an example scenario:

Between offsets 0 and 10,000:

  • File 1 modified at time 12
  • File 2 modified at time 133
  • File 10000 modified at time 144

Then, between offsets 10,001 and 20,000:

  • File 10001 shares the same modification time as File 10000 (both at time 144) <- This overlap could cause issues across ranges.
  • File 10002 modified at time 167

As files 10000 and 10001 possess identical modification times, we could encounter file 10,000 in both ranges while potentially missing file 10,001 altogether. The proposed solution involves sorting by name and path instead of modifiedBy. However, it's essential to note that this change might come with a performance penalty.

@yahavi yahavi added the improvement Automatically generated release notes label Nov 19, 2023
@yahavi yahavi requested a review from eyalbe4 November 19, 2023 15:14
@yahavi yahavi self-assigned this Nov 19, 2023
@yahavi yahavi changed the title Transfer files - AQL optimizations Transfer - AQL optimizations Nov 19, 2023
@yahavi yahavi merged commit 3bcedfc into jfrog:dev Nov 20, 2023
6 of 8 checks passed
@yahavi yahavi deleted the aql-optimizations branch November 20, 2023 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Automatically generated release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants