support/datastore: Make resumability robust to unexpected overlaps in adjacent ranges #5326

Merged May 30, 2024 (3 commits)

tamirms commented May 30, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Adjacent ranges may end up overlapping due to the clamping behavior in adjustLedgerRange().

For example, assuming 64 ledgers per file, [2, 100] and [101, 150] are adjusted to [2, 127] and [64, 191], which overlap on [64, 127].
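The arithmetic behind that example can be sketched as follows. This is an illustration only, not the actual adjustLedgerRange() code: `ledgerRange` and `alignToFiles` are hypothetical names, and the sketch assumes the start is rounded down to a file boundary (but never below ledger 2) and the end is rounded up to the last ledger of its file.

```go
package main

import "fmt"

// ledgerRange is a hypothetical closed interval of ledger sequence numbers.
type ledgerRange struct {
	start, end uint32
}

// alignToFiles is an illustrative stand-in for adjustLedgerRange().
// Assumptions: the start is rounded down to a multiple of ledgersPerFile,
// but never below ledger 2, and the end is rounded up to the last ledger
// of the file that contains it.
func alignToFiles(r ledgerRange, ledgersPerFile uint32) ledgerRange {
	start := r.start / ledgersPerFile * ledgersPerFile
	if start < 2 {
		start = 2
	}
	end := (r.end/ledgersPerFile+1)*ledgersPerFile - 1
	return ledgerRange{start: start, end: end}
}

func main() {
	a := alignToFiles(ledgerRange{2, 100}, 64)
	b := alignToFiles(ledgerRange{101, 150}, 64)
	fmt.Printf("[%d, %d] [%d, %d]\n", a.start, a.end, b.start, b.end)
	// prints: [2, 127] [64, 191]
	fmt.Printf("overlap: [%d, %d]\n", b.start, a.end)
	// prints: overlap: [64, 127]
}
```

The two requested ranges do not overlap, but after alignment they share the file covering ledgers 64 through 127.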

If we export [64, 191] and then try to resume on [2, 127], the binary search logic will determine that [2, 127] is fully exported because:

  1. the midpoint of [2, 127] is 64
  2. ledger 64 is present on the data store given that we already exported [64, 191]

The binary search therefore returns an incorrect result whenever ledgers [2, 63] are missing from the data store.
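A minimal reproduction of the failure, under stated assumptions: the data store is modeled as an in-memory set keyed by the first ledger of each file, and `store`, `hasLedger`, and `naiveResume` are hypothetical names rather than the package's real API. The search uses `sort.Search`, which assumes that once a ledger is missing, every later ledger is missing too. The exact midpoint probed here may differ from the real implementation, but the outcome is the same.

```go
package main

import (
	"fmt"
	"sort"
)

const ledgersPerFile = 64 // assumed file size, as in the example above

// store is a hypothetical in-memory stand-in for the data store.
// Keys are the file boundaries (multiples of ledgersPerFile).
type store map[uint32]bool

// hasLedger reports whether the file containing seq is present.
func (s store) hasLedger(seq uint32) bool {
	return s[seq/ledgersPerFile*ledgersPerFile]
}

// naiveResume binary-searches [start, end] for the first missing ledger.
// The second return value is false when the range looks fully exported.
func naiveResume(s store, start, end uint32) (uint32, bool) {
	n := int(end - start + 1)
	i := sort.Search(n, func(i int) bool {
		return !s.hasLedger(start + uint32(i))
	})
	if i == n {
		return 0, false
	}
	return start + uint32(i), true
}

func main() {
	// State after exporting only [64, 191]: the file for [2, 63] is absent.
	s := store{64: true, 128: true}

	_, needsExport := naiveResume(s, 2, 127)
	fmt.Println(needsExport, s.hasLedger(2))
	// prints: false false
	// The search claims nothing is left to export even though ledger 2 is missing.
}
```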

We can fix this issue by first querying the end ledger and, if it is present, running the binary search only over the preceding files. So, in the example above, when exporting [2, 127], we will:

  1. check if ledger 127 is present on the data store
  2. since the ledger is present, do the binary search only on the range [2, 63] instead of [2, 127]
  3. if the ledger file corresponding to [2, 63] is missing, resumability will correctly determine that we must start from ledger 2
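The steps above can be sketched as follows. As before, this is an illustration under assumptions, not the merged code: the data store is an in-memory set keyed by file boundary, and `store`, `firstMissing`, and `resumeFrom` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

const ledgersPerFile = 64 // assumed file size, as in the example above

// store is a hypothetical in-memory stand-in for the data store.
// Keys are the file boundaries (multiples of ledgersPerFile).
type store map[uint32]bool

// hasLedger reports whether the file containing seq is present.
func (s store) hasLedger(seq uint32) bool {
	return s[seq/ledgersPerFile*ledgersPerFile]
}

// firstMissing binary-searches [start, end] for the first missing ledger.
func firstMissing(s store, start, end uint32) (uint32, bool) {
	n := int(end - start + 1)
	i := sort.Search(n, func(i int) bool {
		return !s.hasLedger(start + uint32(i))
	})
	if i == n {
		return 0, false
	}
	return start + uint32(i), true
}

// resumeFrom checks the end ledger first. If its file is present, that file
// may have been written by an overlapping adjacent range, so it is excluded
// and only the preceding files are searched.
func resumeFrom(s store, start, end uint32) (uint32, bool) {
	if s.hasLedger(end) {
		lastFileStart := end / ledgersPerFile * ledgersPerFile
		if lastFileStart <= start {
			return 0, false // the range is a single file and it is present
		}
		end = lastFileStart - 1
	}
	return firstMissing(s, start, end)
}

func main() {
	// Only [64, 191] has been exported so far.
	s := store{64: true, 128: true}
	fmt.Println(resumeFrom(s, 2, 127)) // prints: 2 true

	// Once the file for [2, 63] exists, the range is complete.
	s[0] = true
	fmt.Println(resumeFrom(s, 2, 127)) // prints: 0 false
}
```

When the end ledger is absent, the sketch falls back to searching the whole range, so a partially exported range such as one containing only the file for [2, 63] still resumes at ledger 64.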

Note that any overlap in adjacent ranges caused by adjustLedgerRange() will never be larger than the number of files per partition, which is why it is sufficient to check only whether the end ledger is present.

Known limitations

[N/A]

tamirms requested a review from a team May 30, 2024 12:18
tamirms merged commit 083b7bb into stellar:master May 30, 2024
31 checks passed
tamirms deleted the fix-resumability-adjacent branch May 30, 2024 16:00