Feat: New version of entity_key serDe #4283

Closed
HaoXuAI opened this issue Jun 16, 2024 · 0 comments
Labels
kind/feature New feature or request

HaoXuAI commented Jun 16, 2024

Is your feature request related to a problem? Please describe.

The current entity_key serDe (version 2) is below:

def serialize_entity_key(
    entity_key: EntityKeyProto, entity_key_serialization_version=1
) -> bytes:
    """
    Serialize entity key to a bytestring so it can be used as a lookup key in a hash table.

    We need this encoding to be stable; therefore we cannot just use protobuf serialization
    here since it does not guarantee that two proto messages containing the same data will
    serialize to the same byte string[1].

    [1] https://developers.google.com/protocol-buffers/docs/encoding
    """
    sorted_keys, sorted_values = zip(
        *sorted(zip(entity_key.join_keys, entity_key.entity_values))
    )

    output: List[bytes] = []
    for k in sorted_keys:
        output.append(struct.pack("<I", ValueType.STRING))
        output.append(k.encode("utf8"))
    for v in sorted_values:
        val_bytes, value_type = _serialize_val(
            v.WhichOneof("val"),
            v,
            entity_key_serialization_version=entity_key_serialization_version,
        )

        output.append(struct.pack("<I", value_type))

        output.append(struct.pack("<I", len(val_bytes)))
        output.append(val_bytes)

    return b"".join(output)

For example, with sorted_keys = ("item_id",) and sorted_values = (int64_val: 1,), the output list is:
[b'\x02\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']
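The byte layout above can be reproduced without Feast using only `struct`. This is a minimal sketch, not the Feast implementation: `serialize_current`, `STRING_TAG`, and `INT64_TAG` are hypothetical names, the tag values 2 and 4 are read off the example output, and the value is assumed to be packed as a little-endian 8-byte integer, which matches the 8 bytes shown.

```python
import struct

# Type tags read off the example output above (assumption: 2 = string, 4 = int64).
STRING_TAG = 2
INT64_TAG = 4


def serialize_current(key: str, value: int) -> bytes:
    """Hypothetical stand-in for the current layout with one int64 entity."""
    val_bytes = struct.pack("<q", value)  # 8-byte little-endian integer
    parts = [
        struct.pack("<I", STRING_TAG),      # type tag of the join key
        key.encode("utf8"),                 # key bytes, with no length prefix
        struct.pack("<I", INT64_TAG),       # type tag of the value
        struct.pack("<I", len(val_bytes)),  # byte length of the value
        val_bytes,
    ]
    return b"".join(parts)


blob = serialize_current("item_id", 1)
print(blob)
```

In the joined bytes nothing marks where `item_id` ends, so a reader that does not already know the key cannot locate the value's type tag.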

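Because the join key is written without its length, a reader cannot tell where the key ends and the value's type tag begins, so these bytes cannot be deserialized. To make deserialization possible, we can write the byte length of each join key immediately before the key itself. For the same test key and value, the output becomes: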
[b'\x02\x00\x00\x00', b'\x07\x00\x00\x00', b'item_id', b'\x04\x00\x00\x00', b'\x08\x00\x00\x00', b'\x01\x00\x00\x00\x00\x00\x00\x00']

Then we can deserialize the bytes to proto.
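A round trip with the length-prefixed layout might look like the following. This is a sketch under stated assumptions rather than the proposed Feast code: it uses a plain dict of string join keys to int64 values instead of `EntityKeyProto`, and the names `serialize_with_key_length`, `deserialize_with_key_length`, `STRING_TAG`, and `INT64_TAG` are hypothetical.

```python
import struct
from typing import Dict, List

# Type tags read off the example output above (assumption: 2 = string, 4 = int64).
STRING_TAG = 2
INT64_TAG = 4


def serialize_with_key_length(entity: Dict[str, int]) -> bytes:
    """Write every field as: type tag, byte length, payload."""
    out: List[bytes] = []
    keys = sorted(entity)  # stable ordering, as in the original function
    for k in keys:
        key_bytes = k.encode("utf8")
        out.append(struct.pack("<I", STRING_TAG))
        out.append(struct.pack("<I", len(key_bytes)))  # the new length prefix
        out.append(key_bytes)
    for k in keys:
        val_bytes = struct.pack("<q", entity[k])
        out.append(struct.pack("<I", INT64_TAG))
        out.append(struct.pack("<I", len(val_bytes)))
        out.append(val_bytes)
    return b"".join(out)


def deserialize_with_key_length(blob: bytes) -> Dict[str, int]:
    """Walk the buffer field by field using the length prefixes."""
    offset = 0
    keys: List[str] = []
    values: List[int] = []
    while offset < len(blob):
        tag, length = struct.unpack_from("<II", blob, offset)
        offset += 8
        payload = blob[offset:offset + length]
        offset += length
        if tag == STRING_TAG:
            keys.append(payload.decode("utf8"))
        elif tag == INT64_TAG:
            values.append(struct.unpack("<q", payload)[0])
        else:
            raise ValueError(f"unsupported type tag: {tag}")
    return dict(zip(keys, values))


blob = serialize_with_key_length({"item_id": 1})
print(deserialize_with_key_length(blob))
```

The sketch separates keys from values by type tag, which only works because every value here is an int64. A format that also allows string values would need another way to mark the boundary, for example a key count written up front; that detail is an assumption beyond what the issue describes.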


@HaoXuAI HaoXuAI added the kind/feature New feature or request label Jun 16, 2024
@tokoko tokoko closed this as completed Jun 27, 2024