
Implement pagination for S3 endpoint using continuation tokens #1395

Merged
merged 1 commit into s3_rework on Jan 11, 2017

Conversation

@ssalinas (Member) commented Jan 9, 2017

This adds additional endpoints for searching S3 logs. This first attempt at pagination utilizes S3's continuation tokens. These tokens are specific to the bucket + prefix being searched, so pagination works by having the API return results along with a list of tokens and their associated bucket/prefix, which can be sent back to retrieve a subsequent page.
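The flow described above mirrors S3's ListObjectsV2-style pagination: each response carries a token that, passed back for the same bucket + prefix, yields the next page. A minimal simulation of that loop (the in-memory lister stands in for S3; class and token format are illustrative, not the AWS SDK):

```java
import java.util.ArrayList;
import java.util.List;

// Simulated pager: mimics S3 ListObjectsV2-style continuation tokens over
// an in-memory list of keys. Illustrative only, not the AWS SDK.
class FakeObjectLister {
  private final List<String> keys;
  private final int pageSize;

  FakeObjectLister(List<String> keys, int pageSize) {
    this.keys = keys;
    this.pageSize = pageSize;
  }

  // Here the "token" is just a numeric offset encoded as a string;
  // real S3 tokens are opaque. A null token means "start from the beginning".
  static class Page {
    final List<String> contents;
    final String nextToken; // null when this is the last page

    Page(List<String> contents, String nextToken) {
      this.contents = contents;
      this.nextToken = nextToken;
    }
  }

  Page list(String token) {
    int start = token == null ? 0 : Integer.parseInt(token);
    int end = Math.min(start + pageSize, keys.size());
    String next = end < keys.size() ? Integer.toString(end) : null;
    return new Page(new ArrayList<>(keys.subList(start, end)), next);
  }
}
```

Feeding each page's `nextToken` back into `list` walks the full key listing one page at a time, which is the round trip the endpoint exposes to callers.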

I am not sold that this is the best option, but it is the one that best limits the number of calls we make to S3.

An additional option would be to list all of the object details for everything, but only 'hydrate' (download/get URL + extra metadata) those on the current page. This would allow simpler parameters (page + page size) and give us a total count, but would require listing all objects for all buckets + prefixes on each new page request.

thoughts @tpetr @darcatron ?

Note: this PR is also built on top of the existing s3_rework branch.

String bucket;
String prefix;
String value;
boolean lastPage;
Contributor

private final ?

this.start = start;
this.end = end;
this.excludeMetadata = excludeMetadata.or(false);
this.listOnly = listOnly.or(false);
Contributor

if you're doing .or(false) it'd be cleaner to just have the type be boolean

Member Author

Forgot that null will default to false with Jackson. Had it in my head that it would error. Updating 👍

private boolean isFinalPageForAllPrefixes(Set<ContinuationToken> continuationTokens) {
  boolean finalPage = true;
  for (ContinuationToken token : continuationTokens) {
    finalPage = finalPage && token.isLastPage();
Contributor

if (!token.isLastPage()) {
    return false;
}

would be a little more efficient
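The reviewer's early-return version, as a compiling sketch (the helper class name and the stand-in `ContinuationToken` POJO are illustrative, not from the PR):

```java
import java.util.Set;

// Illustrative stand-in for the PR's ContinuationToken POJO
class ContinuationToken {
  private final boolean lastPage;

  ContinuationToken(boolean lastPage) {
    this.lastPage = lastPage;
  }

  boolean isLastPage() {
    return lastPage;
  }
}

class PaginationHelper {
  // Short-circuits as soon as any prefix still has pages remaining,
  // instead of AND-ing across the whole set.
  static boolean isFinalPageForAllPrefixes(Set<ContinuationToken> continuationTokens) {
    for (ContinuationToken token : continuationTokens) {
      if (!token.isLastPage()) {
        return false;
      }
    }
    return true;
  }
}
```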

this.excludeMetadata = excludeMetadata.or(false);
this.listOnly = listOnly.or(false);
this.maxPerPage = maxPerPage;
this.continuationTokens = Objects.firstNonNull(continuationTokens, Collections.<ContinuationToken>emptySet());
Contributor

I like the idea of the continuation tokens but having to look up in the set seems a little janky to me -- could we map bucket + prefix to the token as a Map<String, ContinuationToken> ?
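The map-keyed lookup the reviewer suggests could look like this hypothetical sketch (class name and the `bucket|prefix` key format are made up for illustration; the token is kept as a plain string here):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the reviewer's suggestion: key each token by its bucket + prefix
// so the caller can look it up directly instead of scanning a Set.
class TokenIndex {
  private final Map<String, String> tokensByKey = new HashMap<>();

  // "bucket|prefix" is an illustrative composite key, not from the PR
  static String key(String bucket, String prefix) {
    return bucket + "|" + prefix;
  }

  void put(String bucket, String prefix, String token) {
    tokensByKey.put(key(bucket, prefix), token);
  }

  String get(String bucket, String prefix) {
    return tokensByKey.get(key(bucket, prefix));
  }
}
```

With this shape, resuming a paginated search is a constant-time lookup per bucket/prefix rather than a scan over the whole token set.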

@ssalinas (Member Author) commented Jan 9, 2017

@tpetr got this a bit closer now. It's kind of a trade-off of concurrency and fast calls vs. respecting the perPage count. I could probably get it to exactly the right amount, but it would require making a new call, with a different limit set, for whatever the last bucket/prefix was (so that we have a valid continuation token). Seemed silly to be dumping data we already fetched, so I didn't do that yet.

Open to any comments :)

@ssalinas (Member Author)
@tpetr got this to a final(ish) state now.

The /search endpoint takes an object that can have a list of tasks/requests+deploys to search; it will find all needed prefixes, then begin getting pages of data. This way we can more easily search across multiple things without having to make an API call for each.

The page size is the -ish part of this still. In order to keep some concurrency/speed, I am utilizing AtomicInteger's compareAndSet. However, the last chunk of items to get added to the results may push the total count over the max page size. We can't leave any out, because the next continuation token would then not be valid. We could re-request with a different maxKeys, but it seems silly to re-request data we already have.
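The compareAndSet pattern described above can be sketched roughly like this (class and method names are hypothetical, not from the PR): each concurrent lister tries to reserve room in a shared counter for its whole chunk, and an admitted chunk is kept in full even when it pushes past the max, since dropping items would invalidate the next continuation token.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the shared page budget: concurrent listers call
// tryAddChunk, and the final admitted chunk may overshoot maxPerPage.
class PageBudget {
  private final AtomicInteger count = new AtomicInteger(0);
  private final int maxPerPage;

  PageBudget(int maxPerPage) {
    this.maxPerPage = maxPerPage;
  }

  // Returns true if the chunk was admitted (possibly overshooting the max)
  boolean tryAddChunk(int chunkSize) {
    while (true) {
      int current = count.get();
      if (current >= maxPerPage) {
        return false; // page already full, stop fetching more
      }
      if (count.compareAndSet(current, current + chunkSize)) {
        return true; // whole chunk accepted, may exceed maxPerPage slightly
      }
      // another thread updated the count first; retry with the fresh value
    }
  }

  int total() {
    return count.get();
  }
}
```

The CAS retry loop keeps the counter consistent without a lock, which is the concurrency/speed-vs-exact-page-size trade-off the comment describes.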

need @consumes on POST endpoints

more s3 pagination tweaks

more attempts at better pagination

need >= here

fix maxPerPage

more robust search options

add missing file

fix for findbugs

continuation token format needs group too

use isTruncated for ending

re-request to respect page size

revert re-request for page size

missing tokens gives false positive end of content

typo
@ssalinas (Member Author)

With most recent changes this is working well in staging/qa. Going to merge it with the other s3 PR

@ssalinas ssalinas merged commit 8242609 into s3_rework Jan 11, 2017
@ssalinas ssalinas deleted the s3_pagination branch January 11, 2017 15:36