Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[De]serialize ZId in human readable form #1330

Merged
merged 1 commit into from
Mar 4, 2022

Conversation

SomeoneToIgnore
Copy link
Contributor

@SomeoneToIgnore SomeoneToIgnore commented Feb 24, 2022

Attempts to replace the opt_display_serde stub with something more generic, that would work for HashMap<...> to make zenith part from #1286 more serializable

@SomeoneToIgnore SomeoneToIgnore force-pushed the kb/generic-zid-serialize branch 2 times, most recently from 806f56c to 0ee44e8 Compare February 24, 2022 23:35
Copy link
Contributor

@LizardWizzard LizardWizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SomeoneToIgnore SomeoneToIgnore marked this pull request as draft February 25, 2022 22:55
@SomeoneToIgnore SomeoneToIgnore force-pushed the kb/generic-zid-serialize branch 2 times, most recently from 5ac9690 to 656ba10 Compare March 1, 2022 20:36
@SomeoneToIgnore
Copy link
Contributor Author

I've tried to generify opt_display_serde approach multiple different ways:

  1. initial change in this PR, by implementing a new deserializer and serializer for the ZId directly.
    This did not work so well since

    • there are places where we operate bytes and those places get less effective since hex string bytes occupies more than just hex bytes
    • there's no simple way to choose which form the type will serialize into: hex bytes or hex string bytes. For that, you have to write your own serializer. That means that the "hex string" default serialization will be hard to override in the future.
    • current deserialization Visitor implementation cannot deserialize both hex bytes and hex string bytes, breaking with overflow (during safekeeper's deserialization of the compute greeting message). That had caused the initial CI failures and I did not find a good way around it
  2. Previous approach, but with no automatic (de)serialization attempts: a dedicated newtype for the hex byte (de)serialization, while having the hex string as a default.
    Main concerns are the same as in the previous approach and all related to the hex string bytes as a default (de)serialization strategy breaks defaults, is suboptimal and fragile

  3. A dedicated type for hex (de)serialization, used sparingly only where regular [serde(with = "hex")] does not work.
    This is what I've implemented eventually.

Could be so, that the Visitor I have in the PR is easy to implement to deserialize both hex bytes and hex string bytes.
In this case, it's possible to go on with this option, since we don't really serialize hex bytes anywhere, it's compute who does it and we can work with hex string bytes everywhere else.

@SomeoneToIgnore SomeoneToIgnore marked this pull request as ready for review March 1, 2022 20:53
@dhammika
Copy link
Contributor

dhammika commented Mar 3, 2022

Is this a limitation of serde crate? I wonder how other types with fixed byte arrays handle it.

@SomeoneToIgnore
Copy link
Contributor Author

SomeoneToIgnore commented Mar 3, 2022

Sorry, should indeed have put some more context in the issue.
So far, to serialize our ids into hex strings, we've used #[serde(with = "hex")] for plain ZTimelineId, a special opt_display_serde for Option<ZTimelineId> and all other cases were silently serializing those ids as byte arrays.

In the PR, I came to the requirement to (de)serialize a struct that has a field branch_name_mappings: HashMap<String, Vec<(ZTenantId, ZTimelineId)>>, into something human readable such as hex strings, so that type has to be somehow united with Option<ZTimelineId> in the opt_display_serde or its analogue.

I did not want to write another copy-pasted opt_display_serde for another type and did not like the situation enough to decide to fix it somehow better in a more generic fashion.

Here's a way of limitations I've crossed:

  • There's a limitation on serde macro usages, in cases like
#[derive(serde::Serialize, serde::Deserialize)]
struct Something {
    #[serde(with = "foo")]
    foo: Bar,
}

you're tied to the Bar type with the

The given function must be callable as fn<S>(&T, S) -> Result<S::Ok, S::Error> where S: Serializer, although it may also be generic over T

requirement and all the logic for "recursively descending into types" you'll have to write yourself somehow.
It could change eventually: serde-rs/serde#723

That's why we now have opt_display_serde and why nothing current will for for my HashMap case: it would either not compile (if I annotate the field with the map with serde use opt_display_serde or hex), or serialize it as hex bytes which is not readable:

ad50847381e248feaac9876cc71ae418 vs [173,80,132,115,129,226,72,254,170,201,135,108,199,26,228,24]

  • There's a way for Serde to recursively traverse types when (de)serializing, but that is also limited:
  1. Deserialize provides the mechanism to recursively traverse the complex types such as HashMap<String, Vec<..>> and get to the bottom of the ones we're interested at and deserialize it however we like.

This is what I've tried to do first, the Visitor impl in this PR is identical to the one used in that try.
Yet the magic of the Visitor is backed by Deserializer (note -er on the end: Deserialize is something you add on your types in your app, the -er one is usually provided by the libraries such as bincode or serde_json) that knows which logic to use when dealing with complex types.
I could not make my Visitor and Deserialize impl good for both bincode and serde_json: it worked perfectly for hex strings, (de)serializing those, but it could not, at the same time, deserialize hex bytes with bincode and just overflowed.
More fun is in the fact that Safekeeper's ProposerGreeting gets deserialized from bytes that come from compute (the C code).

  1. Serialize, on the other hand, does not have the same Visitor concept and is not generic: you cannot easily tell it to serialize to either hex bytes or hex string, so all id types implicitly are now emitting "different" bytes for bincode if we change the ids to serialize into hex strings instead of hex bytes: bincode will use hex string bytes.
    Very luckily, compute does the serialization for us, but the trap will be there implicitly for somebody to add Serialize onto the struct with such ids.
    In theory, I could start implementing "generic" Serializer's, but that's IMO too painful for such task.

So feels like even if I dig into and find a way to fix the Deserialize impl, this change could bring more confusion than convenience, especially considering that A portion of C code is involved into this.

  • Newtypes are always around, and we can implement whatever Serialize and Deserialize we want for them, so all we need is somewhat ergonomical way to substitute and transfer from one to another.
    At cost of some boilerplate, now we have a Hex* id variants with From impls both sides. The ids are Copy and I doubt that is feasible to change at this point, so feels like a tolerable compromise for a person who wants to get hex strings in arbitrary Rust types.

I might have missed some other ways or could approaching my Serialize/Deserialize crusade wrong, so if you have any better ideas to solve the issue, please share.

Copy link
Contributor

@dhammika dhammika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rust-fu is pretty basic, still in rust rewiring my head phase 😃
That said I find this abstraction a bit confusing. Just looking at HexZid types, it's not immediately clear why they're needed. I gather we're trying to workaround some serde limitation. But I don't know much serde internals to make any meaningful comments here.
Is it possible to add custom serialize/deserialize for specific collection types? I'm not sure what's better.

Copy link
Contributor

@dhammika dhammika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@SomeoneToIgnore
Copy link
Contributor Author

Is it possible to add custom serialize/deserialize for specific collection types? I'm not sure what's better.

It is possible, similar way it's currently done for Option<Id>:
https://github.com/zenithdb/zenith/blob/1d90b1b205023c3bd404de8b361dda69cef6a502/zenith_utils/src/zid.rs#L172-L196

Now for every type you have to write the same two functions, one set for HashMap<String, Id>, another set for HashMap<i32, Id>, third set for HashMap<String, Option<Id>>, ....
New types will again silently fall back to existing serialization (Id -> hex bytes), old annotations won't work (compile) on them.
I'm not aware of the way to express the recursive rule on type level, so not sure how else can we generalize this approach.
The serde issue mentioned above hints me that maybe there's no sane way and it's serde's fault.

And if there's no way to write those [serde(with = functions, I'm not sure it's a good way to address the issue now.
IMO, HexId is better (while not being a good one still): it works for every type and is more explicit then the (de)serializer dealing with two different formats at once.
It might not be convenient to do HexId::from(id) in some cases, but luckily it's more or less ok on our codebase now.

I've tried to document those new types, will do a bit better.

@SomeoneToIgnore SomeoneToIgnore force-pushed the kb/generic-zid-serialize branch from 656ba10 to e9f46e0 Compare March 4, 2022 08:17
@SomeoneToIgnore SomeoneToIgnore force-pushed the kb/generic-zid-serialize branch from e9f46e0 to 835bd1a Compare March 4, 2022 08:22
@SomeoneToIgnore SomeoneToIgnore merged commit 9424bfa into main Mar 4, 2022
@SomeoneToIgnore SomeoneToIgnore deleted the kb/generic-zid-serialize branch March 4, 2022 08:58
@dhammika
Copy link
Contributor

dhammika commented Mar 4, 2022

I'm having a hard time following serde thread. In either case we don't have a backwards compatible migration path right?

@SomeoneToIgnore
Copy link
Contributor Author

Noting is broken now, besides the whole api being wrecked in #1286
Hex wrappers are standalone, all existing code continues to deal with hex bytes, only the places where a hex string are required to be displayed in complex types.
Nothing is broken, in my perception in this PR and the other PR makes all those considerations rather futile 🙂

@dhammika
Copy link
Contributor

dhammika commented Mar 4, 2022

No, I was referring to serde fixing this but that's unknown at this point anyway.

@SomeoneToIgnore
Copy link
Contributor Author

Yes, alas it's safe to assume that's not going to happen reasonably soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants