-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement pagination for S3 endpoint using continuation tokens #1395
Conversation
String bucket; | ||
String prefix; | ||
String value; | ||
boolean lastPage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final
?
this.start = start; | ||
this.end = end; | ||
this.excludeMetadata = excludeMetadata.or(false); | ||
this.listOnly = listOnly.or(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're doing .or(false)
it'd be cleaner to just have the type be boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot that the null will default to a false with jackson. Had it in my head that it would error. Updating 👍
private boolean isFinalPageForAllPrefixes(Set<ContinuationToken> continuationTokens) { | ||
boolean finalPage = true; | ||
for (ContinuationToken token : continuationTokens) { | ||
finalPage = finalPage && token.isLastPage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!token.isLastPage()) {
return false
}
would be a little more efficient
this.excludeMetadata = excludeMetadata.or(false); | ||
this.listOnly = listOnly.or(false); | ||
this.maxPerPage = maxPerPage; | ||
this.continuationTokens = Objects.firstNonNull(continuationTokens, Collections.<ContinuationToken>emptySet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the continuation tokens but having to look up in the set seems a little janky to me -- could we map bucket + prefix to the token as a Map<String, ContinuationToken>
?
a4b2c04
to
2c10cfb
Compare
@tpetr got this a bit closer now. It's kind of a trade off of concurrency and fast calls vs respecting the Open to any comments :) |
aba8344
to
12d8a81
Compare
@tpetr got this to a final(ish) state now. The The page size is the -ish part of this still. In order to keep some concurrency/speed, I an utilizing |
b7d4f2e
to
aa4a12f
Compare
need @consumes on POST endpoints more s3 pagination tweaks more attempts at better pagination need >= here fix maxPerPage more robust search options add missing file fix for findbugs continuation token format needs group too use isTruncated for ending re-request to respect page size revert re-request for page size missing tokens gives false positive end of content typo
aa4a12f
to
74e07a9
Compare
With most recent changes this is working well in staging/qa. Going to merge it with the other s3 PR |
This adds additional endpoints for searching s3 logs. This first attempt at pagination utilizes S3's continuation tokens. These tokens are specific to the bucket + prefix being search. So, this pagination would work via the api returning results as well as a list of tokens and their associated prefix/bucket that could be returned to retrieve a subsequent page.
I am not sold that this is the best option, but it is the one that limits the number of calls we make to s3 the best.
An additional option would be to list out all of the object details for everything, but only 'hydrate' (download/get url + extra metadata) those on the current page. This would allow simpler parameters (page + page size) and allow us a total count, but would require us listing all objects for all buckets + prefixes on each new page request.
thoughts @tpetr @darcatron ?
note - this pr is also built on top of the existing s3_rework branch