Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chain_getBlock extrinsics encoding #1024

Merged

Conversation

tadeohepperle
Copy link
Contributor

@tadeohepperle tadeohepperle commented Jun 21, 2023

fixes #1010.

The scale encoded extrinsic bytes are now only decoded when the ExtrinsicDetails are constructed from the raw bytes (so a bit more down the line and not directly in the RPC call).

@tadeohepperle tadeohepperle marked this pull request as ready for review July 6, 2023 11:49
@tadeohepperle tadeohepperle requested a review from a team as a code owner July 6, 2023 11:49
@@ -445,8 +445,7 @@ where
/// and obtain details about it, once it has made it into a block.
pub async fn submit_and_watch(&self) -> Result<TxProgress<T, C>, Error> {
// Get a hash of the extrinsic (we'll need this later).
let ext_hash = T::Hasher::hash_of(&self.encoded);

let ext_hash = T::Hasher::hash_of(&self.encoded.encode());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this is encoded twice?

Copy link
Contributor Author

@tadeohepperle tadeohepperle Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment. I think it is like this: The self.encoded refers to the raw bytes of the extrinsic. Calling the .encode() on these bytes basically just prefixes it with the compact encoded length, IIRC.

It is also not ideal, that we have to call .encode() twice: once here, and once in https://github.com/paritytech/subxt/blob/master/subxt/src/rpc/rpc.rs#L401

But we can only avoid that by changing the interface of the RPC client. (Either returning the hash as well as the Subscription, or having a watch_extrinsic_raw function that takes the already scale encoded data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather add two separate fields raw_bytes and scale_bytes in both SubmittableExtrinsic and ExtrinsicDetails to avoid this encode()

These bytes are already allocated and can be reused later and it makes the code easier to understand.

Regarding https://github.com/paritytech/subxt/blob/master/subxt/src/rpc/rpc.rs#L401, that is encoding it to sp_core::Bytes so it may work just to pass the raw_bytes into there...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about that, but it will not save us allocations, because inside of the Rpc::watch_extrinsic() function, we need to construct an RpcParams struct that needs to take ownership of the scale encoded bytes. So we need to clone them, because we cannot move them out of the SubmittableExtrinsic. What do you think about having the Rpc::watch_extrinsic() function return the hash of the bytes of the RpcParams? This would avoid any unnecessary allocations. I think I need to talk with James about this issue on Monday, I also don't know if the fix I did is desired in this way.

Copy link
Member

@niklasad1 niklasad1 Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care that much of the allocations to be honest but it would make the code much more readable.

Having code as the snippet below is super hard to understand and feels like hack to me:

let ext_hash = T::Hasher::hash_of(&self.encoded.encode());

Instead of

let ext_hash = T::Hasher::hash_of(&self.raw_bytes);

Then regarding Rpc::watch_extrinsic() the rpc_params! doesn't need to take ownership of the bytes FWIW, it just serializes them under the hood.

So you could just do:

    pub async fn watch_extrinsic(
        &self,
        bytes: &[u8]
    ) -> Result<Subscription<types::SubstrateTxStatus<T::Hash, T::Hash>>, Error> {
        let params = rpc_params![bytes];
        let subscription = self
            .client
            .subscribe(
                "author_submitAndWatchExtrinsic",
                params,
                "author_unwatchExtrinsic",
            )
            .await?;
        Ok(subscription)
    }

Copy link
Member

@niklasad1 niklasad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't know this code good enough but I can't really follow this fix.

I guess because you decode the bytes and store them in ExtrinsicDetails might make
other code/names inconsistent and I can't follow it properly...

@@ -417,12 +420,13 @@ where
pub fn from_bytes(client: C, tx_bytes: Vec<u8>) -> Self {
Self {
client,
encoded_prefixed: tx_bytes.encode(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bytes provided from eg line 394 are already prefixed with length, so we shouldn't need to do this again I think?

Comment on lines 451 to 452
// Get a hash of the extrinsic (we'll need this later)
// We use the `encoded_prefixed` to calculate the hash, because it has the same hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think the hash should be as it was before; ie the hash of the exact bytes we're sending to the watch_extrinsic rpc call and not a hash of those bytes with another length prefixed.

To check, the hash returned here should be identical to the hash returned from making the submit call below with the same extrinsic; worth trying that to make sure one way or the other :)

@tadeohepperle tadeohepperle force-pushed the tadeo-hepperle-chain-get-block-extrinsic-decoding branch from d39a45c to ebd7360 Compare July 10, 2023 16:11
Comment on lines 218 to 219
// removing the compact encoded prefix:
let extrinsic_bytes: Vec<u8> = Decode::decode(&mut &extrinsic_bytes[..])?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in the .position(..) thing below, we could avoid the allocating by instead decoding the Compact<u64> off the front of the bytes and then seeing how many bytes were consumed in doing this.
(I say "avoid allocating" but this was copying the bytes into an Arc before, so either way you may allocate, but at least have the chocie of exactly where)

This "compact length" value (or in other words an "extrinsic_start_index" could be saved if useful in ExtrinsicDetails so that it's easy to get the rest of the bytes from it when needed.

I'd have to look at the code in more detail to see what the Arc was used for before and whether it's then good to keep it, or whether perhaps the Vec<ChainBlockExtrinsic> in Extrinsics could be instead something like Arc<[ChainBlockExtrinsic]> so that we don't allocate at all when we iter() over the extrinsics :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also give the ExtrinsicDetails an explicit lifetime that is bound by the lifetime of Extrinsics. This way we can degrade the bytes field of ExtrinsicDetails to bytes: &[u8] instead of bytes: Arc<[u8]> where the slice points to the i.. bytes of the Vec of the respective ChainBlockExtrinsic, where i is the compact encoded length. This also avoids allocation. I am not sure though if the Arc was necessary somewhere.

Comment on lines 392 to 394
let Ok(compact) = <Compact<u32>>::decode(&mut &ext.0[..]) else {
return false;
};
Copy link
Collaborator

@jsdw jsdw Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because size_hint is only a "hint" at the size by the docs, I'd prefer that we have a function, something like this:

/// Decode the compact prefix, returning the remainder of the
/// bytes and the decoded value. 
fn strip_compact_prefix(bytes: &[u8]) -> Result<(u64, &[u8]), codec::Error>
    let cursor = &mut &*bytes;
    let val = <Compact<u64>>::decode(cursor)?;
    Ok((val, *cursor))
}

This decodes the value, consuming any bytes used and then we return the value + rest of the bytes (so we are guaranteed to have stripped the right number off the front).

Then in your code you can:

let Ok((compact, rest)) = strip_compact_prefix(&ext.0) else {
    return false;
}
let hash = T::Hasher::hash_of(rest);
hash == self.ext_hash

You could also use .compact_len from the CompactLen trait for more of a guarantee, though since you want the remaining bytes I think the above is probably more straightrforward :)

@@ -389,7 +389,10 @@ impl<T: Config, C: OnlineClientT<T>> TxInBlock<T, C> {
.iter()
.position(|ext| {
use crate::config::Hasher;
let hash = T::Hasher::hash_of(&ext.0);
let Ok((_,stripped)) = strip_compact_prefix(&ext.0)else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let Ok((_,stripped)) = strip_compact_prefix(&ext.0)else {
let Ok((_,stripped)) = strip_compact_prefix(&ext.0) else {

@@ -35,6 +35,12 @@ impl codec::Encode for Encoded {
}
}

pub(crate) fn strip_compact_prefix(bytes: &[u8]) -> Result<(u64, &[u8]), codec::Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe worth a quick doc commanet like

/// Decodes a compact encoded value from the beginning of the provided bytes,
/// returning the value and any remaining bytes.

Copy link
Collaborator

@jsdw jsdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple of tiny nits!

I guess you decided against using lifetimes in ExtrinsicDetails etc to avoid the cloining; I think that's something we could explore as a future PR though anyway, and I think now that this PR doesnt add any additional allocating over the original (and probably just removes a little as it's no longer also needed when decoding now), so yup it's all good!

Copy link
Collaborator

@lexnv lexnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tadeohepperle tadeohepperle merged commit cd310b9 into master Jul 13, 2023
7 checks passed
@tadeohepperle tadeohepperle deleted the tadeo-hepperle-chain-get-block-extrinsic-decoding branch July 13, 2023 11:35
@jsdw jsdw mentioned this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chain_getBlock extrinsics encoding
4 participants