Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not hex-format primitives when format doesn't require it #454

Merged
merged 12 commits into from
May 24, 2023

Conversation

Dentosal
Copy link
Member

@Dentosal Dentosal commented May 22, 2023

Uses is_human_readable feature of serde. At least postcard which we're currently using for db benefits from this behavior. Works towards FuelLabs/fuel-core#1085

@Dentosal Dentosal added the fuel-types Related to the `fuel-types` crate. label May 22, 2023
@Dentosal Dentosal self-assigned this May 22, 2023
@Dentosal Dentosal requested a review from xgreenx May 22, 2023 22:35
@Dentosal Dentosal marked this pull request as ready for review May 22, 2023 22:35
@Dentosal Dentosal requested a review from a team May 22, 2023 22:55
@Voxelot
Copy link
Member

Voxelot commented May 23, 2023

Should we have unit / snapshot tests to verify that roudtrip serialization works when is_human_readable is enabled / disabled?

@Dentosal
Copy link
Member Author

Should we have unit / snapshot tests to verify that roudtrip serialization works when is_human_readable is enabled / disabled?

Yep. Those are already tested for in fuel-core, but it makes sense to have them tested here as well.

Voxelot
Voxelot previously approved these changes May 23, 2023
serializer.serialize_str(&format!("{:x}", &self))
} else {
let mut seq = serializer.serialize_tuple($s)?;
for elem in self.0 {
Copy link
Collaborator

@xgreenx xgreenx May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here and during deserialization is that you will iterate over each element and serialize/deserialize it separately. It is much slower than copying bytes into/from the slice.

We need to optimize that part for our primitives because we use them everywhere.

Maybe the optimizer already is clever enough to optimize it for arrays, but we need to prove it with benchmarks.

For example, you can reuse implementation form the OptimizedContract to compare with your implementation

Copy link
Member Author

@Dentosal Dentosal May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current way is what serde docs suggests to do with fixed size arrays.

You're right that the iteration here is slower than a slice copy; and indeed switching to serialize_bytes speeds this up quite a bit (4x by my microbenchmark). However, with the currently used postcard format, serialize_bytes prefixes the array with it's length in the serialized form, making it larger than necessary. Some other formats (like bincode) do seem not do this, but bincode is much slower otherwise. I wish there was a serde format that had both of these figured out, but I haven't seen that yet.

In any case, I think having a single extra byte encoded for each of the primitives isn't ideal, but we cannot afford the slow down that comes with doing it right, so I'm updating this to use serialize_bytes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also try to bench it with zerovec, maybe for serialize_bytes, it doesn't use length

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll do that later today

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after reading through zerovec docs, it's not apparent to me how zerovec would help us. zerovec is not a serialization format, but rather a set of types to assist with zero-copy deserialization using serde formats. Converting these types to use zerovec would make them less ergonomic to use, and creating new instances would require allocation, as zerovec doesn't do stack-allocated arrays.

What we could do instead is to use Cow<'a, [u8; SIZE]> for the array inside the type instead. This means that the type takes a lifetime argument. Sadly that breaks over 200 different locations where the types are used, and I don't necessarily feel like spending a whole day just adding lifetimes to them in hopes that it would be useful. That would also break things like AsMut trait impl, which is used in many many places as well.

These types are so cheap to copy that zero-copying them makes almost no sense anyways; I think we should instead focus on using zero-copy for the expensive types like Contract.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SCALE, they don't put additional bytes for arrays=) Because the size is known

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is SCALE?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substrate library for serialization and deserialization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to have open correctness issues, but otherwise looks nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the 0-prefixing is outside of our control from the serde impl perspective. We don't really have a way around this since we use postcard downstream, and so avoiding this overhead would require using an entirely different serde adapter (which is outside the scope of this PR imo)

Copy link
Collaborator

@xgreenx xgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update the CHANGELOG.md, please?

@@ -19,6 +19,7 @@ serde = { version = "1.0", default-features = false, features = ["derive", "allo
bincode = { workspace = true }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe in the other tests we need to use postcard instead of bincode too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we're using both postcard and bincode in our codebase. Tests should use the codec the feature is using, and generally the codec should be an internal implementation detail that's not directly tested.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, where do we use bincode? I can't find any usage in the fuel-core

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh apparently we don't anymore. In that case the tests should be updated as well. I'll do a follow-up PR, since that's out-of-scope for this.

@Dentosal
Copy link
Member Author

Changelog updated.

@Dentosal Dentosal added this pull request to the merge queue May 24, 2023
Merged via the queue into master with commit 9b91a19 May 24, 2023
@Dentosal Dentosal deleted the dento/serde-human-readability branch May 24, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuel-types Related to the `fuel-types` crate.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants