-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a mechanism to save bucket pagination ranges #5742
Comments
@coryan Hello! I would like to take this as first issue to get involved. |
@Ayrtat nobody is working on this at the moment. If you are interested, I recommend you make sure you can run unit tests and integration tests locally: We can then discuss how the |
@coryan
|
Very cool. I guess at this point we need to start figuring out how these functions would look like. Let me start by saying that my proposal is going to seem very complicated. The additional convolutions are to keep the public API as narrow as possible. Every function and type in the public API is a thing we need to support more-or-less forever, so we try to create as few as possible, sometimes jumping through some hoops to make it so. Also, if you can think of a better design, please do speak up! I have no monopoly on good ideas. Just keep in mind the costs of public APIs. I imagine we would start with something like: #include "google/cloud/storage/internal/complex_options.h"
namespace google::cloud::storage { // C++17 for exposition-only
class PaginationToken;
namespace internal {
std::string GetTokenValue(PaginationToken const&);
PaginationToken MakePaginationToken(std::string value);
}
class PaginationToken {
public:
// This class should be opaque, application developers should not be able to create them, or access the internal state.
// the only way to get one is calling
private:
friend GetTokenValue;
friend MakePaginationToken;
explicit PaginationToken(std::string value)
std::string value_;
};
struct PaginationState : public ComplexOption<PaginationState, std::string> {
// look at any of the other complex options for details
} ;
} Then we would need to add this as a valid option to google-cloud-cpp/google/cloud/storage/internal/bucket_requests.h Lines 37 to 39 in 2214b8c
And use that option to initialize We also need to create the google-cloud-cpp/google/cloud/storage/list_buckets_reader.h Lines 28 to 30 in 2214b8c
defined as: PaginationToken Checkpoint(ListBucketsReader&& reader); // will require move, this is a good thing. Which may require adding another function (in the internal namespace) to extract the token value from: google-cloud-cpp/google/cloud/internal/pagination_range.h Lines 66 to 67 in 2214b8c
Let me know if you have any questions. |
I studied the code and already got few points (I hope so...) So, are you going to incapsulate P.S. Please, tell me, if there is some doc that I can read and get info and not ply you with many question :) |
Great.
I think your reading is correct.
That is the plan.
I think it will be something like this: PaginationToken Checkpoint(ListBucketsReader&& reader) {
return PaginationToken(std::move(reader).CurrentToken());
} But as I write this I realize there are some boundary conditions we need to deal with:
Some of these questions may seem unimportant to you, or not help you. I want to record them here for future reference anyway. Let's tackle each in turn, because I think the are in order of difficulty: What happens to the data already pre-loaded into the reader? That suggests we need to change the return value to something like: std::pair<PaginationToken, std::vector<BucketMetadata>> Checkpoint(ListBucketsReader&& reader); this makes it impossible for the application to "forget" about any data already in the reader. What happens when the reader has not fetched all the data? What is the correct token value. That suggests we need to change class PaginationToken {
public:
// This class should be opaque, application developers should not be able to create them, or access the internal state.
// the only way to get one is calling
private:
friend GetTokenValue;
friend MakePaginationToken;
explicit PaginationToken(std::string value);
explicit PaginationToken(abls::nullopt_t)
abls::optional<std::string> value_;
}; That allows us to use an empty string as the marker for "no page loaded yet" (it always is in the google services), and to use an empty optional as the marker for "all pages loaded, there is no more data". Do we need this extra level of indirection? Why not make I think the answer is "yes", but the rationale is a bit complicated:
This is a new feature. There is no design for a feature that does not exist, you and I are creating the design in this conversation. PS: I apologize in advance if you know most, or some of the stuff I am explaining. I just don't know your level of expertise so I am assuming nothing. If you find that (a) I skipped some steps, please do ask for more details, or (b) if I am telling you stuff you know, please tell me to stop. |
@coryan |
What would I like to propose:
instead
I have already carried some experiments and got some results but I am concerned about design. @coryan, please, can you review this idea and blame/fix me. I hope this idea can be useful! |
It seems to be that it would be easier to just not use That is, instead of this: google-cloud-cpp/google/cloud/internal/pagination_range.h Lines 48 to 49 in 2214b8c
We would write something like: template <typename T>
class PaginationRange { // no longer related to `StreamRange<T>` !!
public:
iterator begin() { ... }
iterator end() { ... }
std::pair<PaginationToken, std::vector<T>> Checkpoint() && { ... }
}; I think this would be a breaking change, but I would like to hear @devjgm's opinion. |
@Ayrtat Are you still working on this? |
Hi! No, I am not. |
Hello @coryan , |
There is no PR for this, though @remyabel expressed an interest. |
I don't mind if someone else was working on it. Just keeping it on the table incase I get around to it and didn't want to duplicate effort. |
Yes @remyabel , |
This may be easier once I introduce a |
We do not have time to work on this, closing. |
This is motivated by #5703. Applications may need to persist the state of pagination iterators, for example, because they are implementing a web service that returns pages of results.
I suggested this API:
I think that this problem applies to any pagination -> iterator mapping.
The text was updated successfully, but these errors were encountered: