-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine API: add getPayloadBodiesByRangeV1
to #146
#218
Conversation
Similar to the consensus p2p spec, a by-range request allows execution clients to store finalized blocks in linear by-number storage instead of relying on by-hash indices, significantly increasing efficiency in fetching them from cold storage. Clients whose database design does not permit efficient by-number lookups may opt to not implement this call, but must then give a well-known error code allowing consensus later clients to fall back to a less efficient method of fetching the blocks. This specification assumes that execution clients know nothing of slot numbers as seen on the consensus layer. Should execution clients later learn about these, the specification may be amended to work with slot numbers instead. Until then, consensus clients must be careful to compute block numbers correctly. Consensus clients must also be careful when this request is used to fetch non-finalized blocks, reverting to by-root requests if an unexpected chain is returned.
getPayloadBodiesByRangeV1
to #146
getPayloadBodiesByRangeV1
to #146getPayloadBodiesByRangeV1
to #146
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments.
One recent addition worth considering is timeouts. I think we should modify Timeouts section to say that if timeout
parameter isn't specified, CL has a liberty to decide how much time it should wait for response. But in this particular case I think that timeouts matter as these requests may be time consuming as they require disk access.
Also, we may consider a limitation on a number of block bodies in the response. Say something like 1024
as a hard cap to prevent buggy CL making EL read and send back a million of blocks.
src/engine/specification.md
Outdated
* result: `Array of ExecutionPayloadBodyV1` - Array of [`ExecutionPayloadBodyV1`](#ExecutionPayloadBodyV1) objects. | ||
* error: code and message set in case an exception happens while processing the method call. | ||
* Clients that don't support fetching bodies by range **MUST** return the error code `-32601 Method not found The method does not exist / is not available.`. Callers may then revert to `engine_getPayloadBodiesByRootV1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving this statement to the Specification section of this method
Co-authored-by: Mikhail Kalinin <noblesse.knight@gmail.com>
Co-authored-by: Mikhail Kalinin <noblesse.knight@gmail.com>
Thanks!
In the consensus layer, we have to start responding respond within 5s, and the entire response must take no longer than 10s in total which effectively puts a few caps on request sizes - if we can't serve a response within this time, it's likely useless. I haven't followed the rationale for adding timeouts to this spec in particular, but if we're going to add it, these are the values to take into consideration.
On the CL side, we "soft limit" requests to 1024 slots, meaning that anyone that requests more will not get a "full" answer. Since the EL/CL is more of trusted connection, we could also opt for a hard limit where the server gives an error if more is requested - I agree it's useful to add this as a coordination point so that clients don't make outrageous requests. I'd probably go with a soft limit - I'll put that and 1024 in the next edit unless someone comes along with an opinion :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Agree with @mkalinin's comments
Also, summoning @paulhauner to ensure that this fits with the deduplication engineering they've been working on in lighthouse
#### Specification | ||
|
||
1. Given a `start` and a `count`, the client software **MUST** respond with array of `ExecutionPayloadBodyV1` objects with the corresponding execution block number respecting the order of blocks in the canonical chain, as selected by the latest `forkChoiceUpdated` call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add an error condition for if the EL does not have the requisite data (either a malformed CL request or some sort of failure/resync on EL side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated response to give nil
when blocks are missing
src/engine/specification.md
Outdated
1. Given array of block hashes client software **MUST** respond with array of `ExecutionPayloadBodyV1` objects with the corresponding hashes respecting the order of block hashes in the input array. | ||
|
||
1. Client software **MUST** skip the payload body in the response array if the data of this body is missing. For instance, if the request is `[A.block_hash, B.block_hash, C.block_hash]` and client software has data of payloads `A` and `C`, but doesn't have data of `B`, the response **MUST** be `[A.body, C.body]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think thats a bit weird, if I request n-blocks I would expect n-blocks even if some blocks are nil.
So I would rather return [A.body, nil, C.body]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this raises a question: should CL:s verify the hash of the data? including nil
here alleviates CL:s of this requirement (which they otherwise have to to, to match response to request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favour of including the nil
s, as it's conceptually cleaner to have one response per request, and I think CL verification should be optional (although we should probably enable it by default, especially initially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not only I'm in favor of including nil
but I think CL software should be mandated to check the hashes or do some verification that the data was not corrupted in the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CL software should be mandated
the only viable way to mandate it is by making it technically infeasible to not check - if we mandate it, we should also reap the benefits of having the hash check there (and use a more compact response that potentially can be reordered) - I see this mostly as a trust decision - if we include nil
, we're saying that the CL fully trusts the EL and at that point, I don't think it's reasonable to mandate verification because it makes serving block requests more expensive for little gain (ie no trust gain, tiny chance of detecting corruption that whoever did the block request on the far end will have to repeat anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil
has been added as a required response for missing blocks - consumers may or may not verify the payload - this is up to their own quality policy - when they are sending the EL contents to someone on the internet, that someone will have to verify it regardless, so it's a bit of a waste to redo it, assuming they trust their EL.
|
||
1. Client software **MUST** skip the payload body in the response array if the data of this body is missing. For instance, if the request is `[A.block_hash, B.block_hash, C.block_hash]` and client software has data of payloads `A` and `C`, but doesn't have data of `B`, the response **MUST** be `[A.body, C.body]`. | ||
|
||
1. Clients that support `engine_getPayloadBodiesByRangeV1` **MAY NOT** respond to requests for finalized blocks by hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because they may have pruned them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as CL really - this allows us to drop hash-based indices and retrieve blocks from linear archival storage - the aim with this clause is to ensure that CL:s use the linear request for prefinality blocks and only "top up" with by-hash requests where forks are possible.
There is a rough consensus to attempt to include these methods into Shanghai, if there is any Shanghai delay related to implementation of proposed changes then we can reconsider these methods for a later inclusion. @arnetheduck would you mind to rebase this PR with the most recent changes from main, it implies adding these new methods into |
Rebased, refined and moved to #352 |
This PR extends #146 to also specify a by-range request.
Similar to the consensus p2p spec, a by-range request allows execution
clients to store finalized blocks in linear by-number storage instead of
relying on by-hash indices, significantly increasing efficiency in
fetching them from cold storage.
Clients whose database design does not permit efficient by-number
lookups may opt to not implement this call, but must then give a
well-known error code allowing consensus later clients to fall back to a
less efficient method of fetching the blocks.
This specification assumes that execution clients know nothing of slot
numbers as seen on the consensus layer. Should execution clients later
learn about these, the specification may be amended to work with slot
numbers instead.
Until then, consensus clients must be careful to compute block numbers
correctly.
Consensus clients must also be careful when this request is used to
fetch non-finalized blocks, reverting to by-root requests if an
unexpected chain is returned.