
Implement pagination for S3 endpoint using continuation tokens #1395

Merged
merged 1 commit into s3_rework on Jan 11, 2017

Conversation

@ssalinas (Member) commented Jan 9, 2017

This adds additional endpoints for searching S3 logs. This first attempt at pagination utilizes S3's continuation tokens. These tokens are specific to the bucket + prefix being searched, so pagination works by having the API return results along with a list of tokens and their associated bucket/prefix, which can be sent back to retrieve a subsequent page.
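The flow described above mirrors S3's ListObjectsV2-style pagination: each response carries a token that, passed back for the same bucket + prefix, yields the next page. A minimal simulation of that loop (the in-memory lister stands in for S3; class and token format are illustrative, not the AWS SDK):

```java
import java.util.ArrayList;
import java.util.List;

// Simulated pager: mimics S3 ListObjectsV2-style continuation tokens over
// an in-memory list of keys. Illustrative only, not the AWS SDK.
class FakeObjectLister {
  private final List<String> keys;
  private final int pageSize;

  FakeObjectLister(List<String> keys, int pageSize) {
    this.keys = keys;
    this.pageSize = pageSize;
  }

  // Here the "token" is just a numeric offset encoded as a string;
  // real S3 tokens are opaque. A null token means "start from the beginning".
  static class Page {
    final List<String> contents;
    final String nextToken; // null when this is the last page

    Page(List<String> contents, String nextToken) {
      this.contents = contents;
      this.nextToken = nextToken;
    }
  }

  Page list(String token) {
    int start = token == null ? 0 : Integer.parseInt(token);
    int end = Math.min(start + pageSize, keys.size());
    String next = end < keys.size() ? Integer.toString(end) : null;
    return new Page(new ArrayList<>(keys.subList(start, end)), next);
  }
}
```

Feeding each page's `nextToken` back into `list` walks the full key listing one page at a time, which is the round trip the endpoint exposes to callers.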

I am not sold that this is the best option, but it is the one that best limits the number of calls we make to S3.

An additional option would be to list all of the object details for everything, but only 'hydrate' (download/get URL + extra metadata) those on the current page. This would allow simpler parameters (page + page size) and give us a total count, but would require listing all objects for all buckets + prefixes on each new page request.

thoughts @tpetr @darcatron ?

Note: this PR is also built on top of the existing s3_rework branch.

String bucket;
String prefix;
String value;
boolean lastPage;
Contributor

private final ?

this.start = start;
this.end = end;
this.excludeMetadata = excludeMetadata.or(false);
this.listOnly = listOnly.or(false);
Contributor

if you're doing .or(false) it'd be cleaner to just have the type be boolean

Member Author

Forgot that null will default to false with Jackson. Had it in my head that it would error. Updating 👍

private boolean isFinalPageForAllPrefixes(Set<ContinuationToken> continuationTokens) {
  boolean finalPage = true;
  for (ContinuationToken token : continuationTokens) {
    finalPage = finalPage && token.isLastPage();
Contributor

if (!token.isLastPage()) {
    return false;
}

would be a little more efficient
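The reviewer's early-return version, as a compiling sketch (the helper class name and the stand-in `ContinuationToken` POJO are illustrative, not from the PR):

```java
import java.util.Set;

// Illustrative stand-in for the PR's ContinuationToken POJO
class ContinuationToken {
  private final boolean lastPage;

  ContinuationToken(boolean lastPage) {
    this.lastPage = lastPage;
  }

  boolean isLastPage() {
    return lastPage;
  }
}

class PaginationHelper {
  // Short-circuits as soon as any prefix still has pages remaining,
  // instead of AND-ing across the whole set.
  static boolean isFinalPageForAllPrefixes(Set<ContinuationToken> continuationTokens) {
    for (ContinuationToken token : continuationTokens) {
      if (!token.isLastPage()) {
        return false;
      }
    }
    return true;
  }
}
```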

this.excludeMetadata = excludeMetadata.or(false);
this.listOnly = listOnly.or(false);
this.maxPerPage = maxPerPage;
this.continuationTokens = Objects.firstNonNull(continuationTokens, Collections.<ContinuationToken>emptySet());
Contributor

I like the idea of the continuation tokens but having to look up in the set seems a little janky to me -- could we map bucket + prefix to the token as a Map<String, ContinuationToken> ?
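The map-keyed lookup the reviewer suggests could look like this hypothetical sketch (class name and the `bucket|prefix` key format are made up for illustration; the token is kept as a plain string here):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the reviewer's suggestion: key each token by its bucket + prefix
// so the caller can look it up directly instead of scanning a Set.
class TokenIndex {
  private final Map<String, String> tokensByKey = new HashMap<>();

  // "bucket|prefix" is an illustrative composite key, not from the PR
  static String key(String bucket, String prefix) {
    return bucket + "|" + prefix;
  }

  void put(String bucket, String prefix, String token) {
    tokensByKey.put(key(bucket, prefix), token);
  }

  String get(String bucket, String prefix) {
    return tokensByKey.get(key(bucket, prefix));
  }
}
```

With this shape, resuming a paginated search is a constant-time lookup per bucket/prefix rather than a scan over the whole token set.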

@ssalinas (Member Author) commented Jan 9, 2017

@tpetr got this a bit closer now. It's kind of a trade-off of concurrency and fast calls vs. respecting the perPage count. I could probably get it to exactly the right amount, but it would require making a new call, with a different limit set, for whatever the last bucket/prefix was (so that we have a valid continuation token). Seemed silly to be dumping data we already fetched, so I didn't do that yet.

Open to any comments :)

@ssalinas (Member Author)
@tpetr got this to a final(ish) state now.

The /search endpoint takes an object that can have a list of tasks/requests+deploys to search; it will find all needed prefixes, then begin getting pages of data. This way we can more easily search across multiple things without having to make an API call for each.

The page size is the -ish part of this still. In order to keep some concurrency/speed, I am utilizing AtomicInteger's compareAndSet. However, the last chunk of items to get added to the results may push the total count over the max page size. We can't leave any out, because the next continuation token would then not be valid. We could re-request with a different maxKeys, but it seems silly to re-request data we already have.
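The compareAndSet pattern described above can be sketched roughly like this (class and method names are hypothetical, not from the PR): each concurrent lister tries to reserve room in a shared counter for its whole chunk, and an admitted chunk is kept in full even when it pushes past the max, since dropping items would invalidate the next continuation token.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the shared page budget: concurrent listers call
// tryAddChunk, and the final admitted chunk may overshoot maxPerPage.
class PageBudget {
  private final AtomicInteger count = new AtomicInteger(0);
  private final int maxPerPage;

  PageBudget(int maxPerPage) {
    this.maxPerPage = maxPerPage;
  }

  // Returns true if the chunk was admitted (possibly overshooting the max)
  boolean tryAddChunk(int chunkSize) {
    while (true) {
      int current = count.get();
      if (current >= maxPerPage) {
        return false; // page already full, stop fetching more
      }
      if (count.compareAndSet(current, current + chunkSize)) {
        return true; // whole chunk accepted, may exceed maxPerPage slightly
      }
      // another thread updated the count first; retry with the fresh value
    }
  }

  int total() {
    return count.get();
  }
}
```

The CAS retry loop keeps the counter consistent without a lock, which is the concurrency/speed-vs-exact-page-size trade-off the comment describes.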

need @consumes on POST endpoints

more s3 pagination tweaks

more attempts at better pagination

need >= here

fix maxPerPage

more robust search options

add missing file

fix for findbugs

continuation token format needs group too

use isTruncated for ending

re-request to respect page size

revert re-request for page size

missing tokens gives false positive end of content

typo
@ssalinas (Member Author)

With most recent changes this is working well in staging/qa. Going to merge it with the other s3 PR

@ssalinas ssalinas merged commit 8242609 into s3_rework Jan 11, 2017
@ssalinas ssalinas deleted the s3_pagination branch January 11, 2017 15:36