Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GetKeysSummary Api Signature/protos #4

Merged
merged 1 commit into from
Mar 15, 2023

Conversation

G8XSU
Copy link
Collaborator

@G8XSU G8XSU commented Dec 22, 2022

Add GetKeysSummary Api Signature/protos

This will auto-generate required request/response objects.

This API can be used for computing version difference of keys b/w client and server-side and lazily load only necessary values.

Some notes from previous discussion on this api:
regarding "return everything with a certain key prefix" => This makes for an unbounded query on server-side hence we should be avoiding it and use pagination.

Also note that we could support storing values upto ~5MB. If we started returning value as well in this response, even a 10 item limit might mean 50MB response. So it is best to only load necessary key-values lazily depending on key-version differences.

Note regarding http/rest+protobuf protocol:
Here protobuf is only being used as a binary serialization format for request and response objects.(instead of writing custom binary serialization or json). This helps both client and server to create requests/response like class instances with ease. This payload will be used as binary body in http-rest post requests.

message GetKeysSummaryResponse {

// Fetched versions along with the corresponding keys in the request.
repeated KeyValue key_versions = 1;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider: creating a separate struct for KeyVersion instead of re-using KeyValue struct here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say here explicitly that the value will be empty. I think it's OK to reuse the struct.

Copy link
Member

@devrandom devrandom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here are my initial thoughts.

we should say somewhere what are the consistency guarantees. for example, we should probably guarantee that the returned KeyValues will have been written at or after the returned globalVersion, and that they may include writes after the returned globalVersion.

// page's GetKeysSummaryResponse.
// It will be used as exclusive_start_key to perform the new query and only keys greater than
// the specified last_evaluated_key will be returned in the response.
optional string last_evaluated_key = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider naming this be pagination_token or similar, to discourage the client from assuming anything about the nature of this field. this will give us flexibility in the future to paginate in any order.

could probably drop the paragraph with exclusive_start_key (it is not defined, and the client doesn't really care)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense,
earlier version of this api was inspired by s3/ddb api.
but this approach is more generic and some of the new aws api's use a similar approach along with google cloud api's.
that best practices guide was also very helpful.

message GetKeysSummaryResponse {

// Fetched versions along with the corresponding keys in the request.
repeated KeyValue key_versions = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say here explicitly that the value will be empty. I think it's OK to reuse the struct.

app/src/main/proto/vss.proto Outdated Show resolved Hide resolved
@devrandom
Copy link
Member

devrandom commented Dec 22, 2022

Also note that we could support storing values upto ~5MB. If we started returning value as well in this response, even a 10 item limit might mean 50MB response. So it is best to only load necessary key-values lazily depending on key-version differences.

I would still provide a paginating Get with values in addition to this, for use cases that have numerous small entries. there may be thousands of them, and it may be pretty slow to get them one at a time due to round-trip time. the pagination could be sensitive to the item size.

// page's GetKeysSummaryResponse.
// It will be used as exclusive_start_key to perform the new query and only keys greater than
// the specified last_evaluated_key will be returned in the response.
optional string last_evaluated_key = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -98,6 +98,64 @@ message PutObjectRequest {
message PutObjectResponse {
}

message GetKeysSummaryRequest {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would name this using List instead of Get if the response can be paginated. Probably something like ListObjectVersionsRequest to go with GetObjectRequest.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ListKeyVersions

@@ -98,6 +98,64 @@ message PutObjectRequest {
message PutObjectResponse {
}

message GetKeysSummaryRequest {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add message-level docs? I don't think it's obvious without them.

// page's GetKeysSummaryResponse.
// It will be used as exclusive_start_key to perform the new query and only keys greater than
// the specified last_evaluated_key will be returned in the response.
optional string last_evaluated_key = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also want to allow users to limit how many to return, as per best practices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense, added a field for this.

// Use this value to start a new GetKeysSummary operation, excluding this value in the new result
// by specifying this last_evaluated_key in the next request.
//
// If last_evaluated_key is empty, then the "last page" of results has been processed and there is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty or not set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some clarification.

Since this is a response object, we want to be as specific as possible, so that it is easier for client to understand. I don't want client to have to think whether it will be empty or not set (even though they are kinda same in protobuf)

Whereas in request structs, we want to be more forgiving of client's usage, both empty and "not set" serve the same purpose and even if client uses one or the other, things should just work and server will handle it.

Comment on lines 153 to 170
// global_version is a sequence-number/version of the whole store.
//
// global_version is only returned in response for the first page of GetKeysSummaryResponse.
optional int64 global_version = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the global version is updated on the server before the call for the next page? Will the request fail?

Copy link
Collaborator Author

@G8XSU G8XSU Dec 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, not really.

So the current correct usage of this api is:

  1. read global_version from first response, store it in local_variable.
  2. apply all the key versions. (load objects lazy or eager)
  3. then apply the global_version from the previously stored local_variable.

i should probably add this in doc on server-side as well, to help in usage.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, not really.

So the current correct usage of this api is:

  1. read global_version from first response, store it in local_variable.
  2. apply all the key versions. (load objects lazy or eager)

Could you describe the lazy case? Is it essentially marking the client-side version as out-of-date, requiring it to be fetched when it needs to read next?

  1. then apply the global_version from the previously stored local_variable.

i should probably add this in doc on server-side as well, to help in usage.

Hmmm... I'm not sure of this addressed my concern around pagination. Couldn't the second page correspond to a different global version if another client updated the global version after fetching the first page but before fetching the second?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify, since there's no way to do a batch-get of the entire store, there's no hope for getting a globally consistent snapshot incrementally, right? so perhaps it doesn't matter that each page may have a different global version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it essentially marking the client-side version as out-of-date, requiring it to be fetched when it needs to read next?

Yes, this is optional but on client side we can store the new-version-number and old-version-number. On get of key, if there is a diff, we can make a service call and fetch latest instead of returning the local version.

Couldn't the second page correspond to a different global version if another client updated the global version

The global version is only returned in first page of results. With that being said, there still exists a scenario where globalVersion/store is updated after first page's fetch. This constitutes a race-condition b/w ListKeyVersions on device-1 and put-operation on device-2.
In such a scenario, it is ensured that the api is used safely if client follows the above protocol.

I.e., the put requests and global-versioning still maintains correct versioning and does not let the primary store to get corrupted. (the api design decisions are made to enforce/encourage correct and safe usage by the client.)

It does not guarantee that client has fully updated view of store in all scenarios after refresh, it makes a best-effort to bring client to fully-updated state. In case of race-conditions, we make sure that versioning scheme is not compromised and subsequent refresh will bring client up-to-date. In case where client has slightly out-of-date view of key-versions, subsequent put-requests will fail and client will have to refresh.

Current Api contract is as following: "Here are the minimum key-versions by this globalVersion. Which is globally consistent snapshot at instance of time when store's globalVersion was this"

I did not want to return global-version in every page, since client could easily end up misusing that api.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be a bit more specific, here is my understanding:

  • each page is internally consistent at a certain global version
  • a later page can have a greater-than-or-equal global version to the previous page (i.e. monotonically increasing)

is that what you had in mind?

Copy link

@jkczyz jkczyz Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not guarantee that client has fully updated view of store in all scenarios after refresh, it makes a best-effort to bring client to fully-updated state. In case of race-conditions, we make sure that versioning scheme is not compromised and subsequent refresh will bring client up-to-date. In case where client has slightly out-of-date view of key-versions, subsequent put-requests will fail and client will have to refresh.

Few things came to mind:

  • Is this robust enough to handle the case where the put-operation on device-2 adds data for a new key that would have been returned to device-1 in the first page?
  • Will the server essentially exclude that item from the second page?
  • For lightning, does that leave us in a state for instance when device-2 may open a new channel but device-1 may not know it exists for making a payment until either a refresh or an incoming payment? Or for the latter would it fail because the channel is unknown (i.e., it won't bother querying the server)?

Copy link
Collaborator Author

@G8XSU G8XSU Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devrandom
Not exactly, only first page contains the globalVersion. And the key-versions are guaranteed to be correct only for that global version. Since client would be using that global version in put requests, it will only work if there were no further updates during or after this api request.

@jkczyz

Is this robust enough to handle the case where the put-operation on device-2 adds data for a new key that would have been returned to device-1 in the first page?

Yes, since client would be using an out-of-date globalVersion, a client-device not having updated view will not be able to write successfully.

Will the server essentially exclude that item from the second page?

Server does not do anything special, globalVersion ensures that client has complete and updated view. If some change has been made during or after this api, then client's globalVersion is rendered out-of-date.

For lightning, does that leave us in a state for instance when device-2 may open a new channel but device-1 may not know it exists for making a payment until either a refresh or an incoming payment? Or for the latter would it fail because the channel is unknown (i.e., it won't bother querying the server)?

The key is, client is not able to perform any changes or updates if they don't have fully-updated view.
If the primary state has had any change, client-device needs to update/refresh to proceed with any operations.
This refresh could be triggered because of unrelated change or some periodic refresh on client-side.

Client-device could also ensure that they have latest state by just querying and comparing the globalVersion.
Depending on implementation, this could be triggered upfront i.e. before doing the actual work/operation.

it won't bother querying the server

This cannot happen because in order to use anything or make any save, client would need to have fully updated view.
Current process enforces client to have up-to-date view of LN state-machine.

Note that this multi-device usage is intended for end-user devices and not for concurrently running routing nodes of same LN-state machine. In these cases, we do expect some contention and overhead but our main goal is to ensure safe state machine transitions. And since these points of parallel use are user-facing, we expect them to be occasional but at the same time not very high frequency.

Expect context switching to be from one device to another, and in worst case back to same device. But we should not expect back and forth again and again b/w same devices within a short interval. (Even though all transitions are ensured to be safe, there will be some overhead when switching and that should be fine)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since client would be using an out-of-date globalVersion, a client-device not having updated view will not be able to write successfully.

Server does not do anything special, globalVersion ensures that client has complete and updated view. If some change has been made during or after this api, then client's globalVersion is rendered out-of-date.

My question here was not around writing but rather reading subsequent pages. Say device-1 gets page-1 with:

  • key-1
  • key-2
  • key-4

Then device-2 puts key-3. Would device-1 get key-3 on when retrieving page-2?

The key is, client is not able to perform any changes or updates if they don't have fully-updated view. If the primary state has had any change, client-device needs to update/refresh to proceed with any operations. This refresh could be triggered because of unrelated change or some periodic refresh on client-side.

But say the client just wants to form a payment path using their channels first hop candidates. No write is involved. So they would have an outdated view of the channels (i.e., missing the newly opened channel). I see that if they actually go through and try to make the payment using the path --which would involve a write -- it would fail because of the global version is out of date. But after fetching the data, would they just go through with making the payment? Or would they start over with path finding now that they are aware of the previously missing channel? This is more of a question for users of VSS rather than how it is implemented, I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would device-1 get key-3 on when retrieving page-2?

It might or it might not, depending on DB implementation being used and key ordering. But it shouldn't matter according to the current process. (In both cases, a refresh is still required.)

Second question

That is an implementation detail and yes it depends on how vss is being used and refreshed.
One way to go about it is, client will have some buffer window for which it will assume that it has the latest view.
If client-state was refreshed within T-x time, assume it is latest and proceed.

// In case of refreshing complete key-version view on the client-side, correct usage for
// the returned global_version is as following:
// 1. Read global_version from the first page of paginated response, store it as local_variable.
// 2. Update all the key_versions on client-side from the paginated response.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also mean calling GetObject for each key? It isn't clear from my reading of this that it is required though it seems implied from my understanding of global versioning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other comment above about globally consistent snapshot

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,
lets say client makes a put operation and it fails due to conflictException. Now client wishes to refresh their view of storage.
This can be done by computing the diff in versions and downloading the updated keys in lazy or eager fashion.

Since only few of the keys need to be updated to ensure fully-updated view, only those keys will be fetched.

Comment on lines 167 to 168
// This ensures that on client-side, we can guarantee that all current key_versions are at least
// from the corresponding global_version. This guarantee is helpful for ensuring the versioning
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "current key_versions" mean only those returned by the first page (as this documentation is outlining)? Or for all pages, which seems to imply the docs are discussion fetching more than the first page? If the latter, seems like the above steps should be explicit about fetching subsequent pages.

Copy link
Collaborator Author

@G8XSU G8XSU Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is for all pages,
since the key versions returned in all pages are atleast from globalVersion returned in first page.

I thought it was explicit in "from the paginated response" but reworded it for clarification.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/response/responses

Since there is more than one response, each with a single page. 🙂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we say paginated, its generally a "paginated response" even though it refers to multiple pages.

@G8XSU G8XSU requested a review from devrandom January 12, 2023 20:31
@@ -98,6 +98,79 @@ message PutObjectRequest {
message PutObjectResponse {
}

message ListKeyVersionsRequest {

// store_id is a keyspace identifier.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: might make sense to move repeating doc comments to the top of the file, and just link to them in each request proto.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i want to do that but there wasn't a clear and nice way to do it.
If we do it the current way, protobuf generated classes have nicely added field docs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i want to do that but there wasn't a clear and nice way to do it. If we do it the current way, protobuf generated classes have nicely added field docs.

Are you saying any top-level docs are not copied anywhere in the generated code? In my past experience, the generated code didn't have any docs at all, so developers would just look at the proto file. But perhaps this has changed.

We should ideally end off in a place where the reader has a high-level summary somewhere and have more detailed explanations as they drill down into messages and fields, IMO. Not sure how this will ultimately look, but I could imagine a few possibilities:

  • High-level summary at top, minimal request/response docs, detailed field docs
  • High-level summary at top, detailed request/response docs, minimal field docs
  • No high-level summary, high-level summary in request/response docs, detailed request/response docs

Regardless of which, we should have some high-level summary somewhere so developers don't have to exhaustively read every field doc to know how everything fits together. Feel free to do in a follow-up, of couse.

Copy link
Collaborator Author

@G8XSU G8XSU Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying any top-level docs are not copied anywhere in the generated code?

Yes, they get omitted, only field and struct level docs get into generated code.

My current plan is
"High-level summary in project docs, No high-level summary in proto, high-level summary in request/response docs, detailed field docs"

In this way, docs on top of fields and structs will get copied to generated code, and high-level project doc serves as reference for usage. It will also contain examples from client-side code.
Will take it as follow-up.

app/src/main/proto/vss.proto Outdated Show resolved Hide resolved
//
// In case of refreshing complete key-version view on the client-side, correct usage for
// the returned global_version is as following:
// 1. Read global_version from the first page of paginated response, store it as local_variable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to give a name to the variable - i.e. can just say "and save it."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 changed it to "and save it as local variable."

app/src/main/proto/vss.proto Outdated Show resolved Hide resolved
// 2. Update all the key_versions on client-side from all the pages of paginated response.
// 3. Update global_version on client_side from the local_variable stored in step-1.
// This ensures that on client-side, we can guarantee that all current key_versions are at least
// from the corresponding global_version. This guarantee is helpful for ensuring the versioning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"are at least from the corresponding global_version" -> "were stored at global_version or later"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

app/src/main/proto/vss.proto Outdated Show resolved Hide resolved
@G8XSU G8XSU requested a review from devrandom January 25, 2023 19:42
@G8XSU
Copy link
Collaborator Author

G8XSU commented Mar 15, 2023

@devrandom can you take an another look at this.

@G8XSU G8XSU merged commit 06a1a81 into lightningdevkit:main Mar 15, 2023
@G8XSU G8XSU mentioned this pull request May 10, 2023
31 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants